Currently, the ACE3P extension has basic support for connecting to the Cori supercomputer at NERSC to run a simulation (Job) and to report the list of jobs that have been run. However, the existing functionality is limited in a number of ways. With the advent of the new Projects system, it is time to re-engineer Job management and enable some key features. Most important is the concept of job provenance: the capacity to immediately save all the information needed to reproduce the results of a given simulation (Job). The current implementation strategy is as follows:
- Define a Job class which will be implemented as a thin wrapper around a JSON object. The JSON object will store all metadata associated with a Job.
- Add a list (std::vector for now) of Job objects to the Project class, which will be serialized something like this:
{
  "Jobs": [
    {
      "SLURM_ID": "023411",
      "CUMULUS_ID": "5e239ab2341082398021c",
      "MACHINE": "Cori",
      "JOB_NAME": "Run 10",
      "ANALYSIS_STEP": "Omega3P",
      "NODES": 6,
      "PROCESSES": 144,
      "RUNTIME": 28800,
      "NOTES": "Here is where we type notes",
      "ANALYSIS_UUID": "b886560a-e609-4dbb-a7c5-1b582f9e2773",
      "ANALYSIS_URL": "path/to/the/saved/analysis.smtk"
    },
    {
      "SLURM_ID": "3741238",
      "CUMULUS_ID": "5e239ab2318947109802d",
      "MACHINE": "Cori",
      "JOB_NAME": "Rent",
      "ANALYSIS_STEP": "Tmp3P",
      "NODES": 12,
      "PROCESSES": 288,
      "RUNTIME": 525600,
      "NOTES": "Added more cups of coffee",
      "ANALYSIS_UUID": "75ed5514-daae-4775-96db-3f0916174d15",
      "ANALYSIS_URL": "path/to/the/saved/other_analysis.smtk"
    }
  ]
}
to a standalone file saved at the top level of the project folder. The metadata listed here is minimal and will be expanded as needed; fortunately, the JSON object under the hood of the Job class will make extensibility easy.
- When a Job is created (by exporting and submitting a job to an HPC resource), it will be appended to this list, and a snapshot of the progenitor Analysis (represented internally as an Attribute Resource) will be saved under a unique name to a "Jobs Data" subdirectory of the project folder. This provides provenance: if an Analysis is edited after a job has been submitted from it, we will still have a copy of all the data required to recreate that job. The Analysis artifact can then be stored for later use. Saving this artifact will be optional.
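The Job wrapper and the Project-side serialization described above might be sketched as below. This is only a sketch: the names (`Job`, `Project::serializeJobs`) are hypothetical, and the real Job class would wrap a JSON object from a library such as nlohmann::json rather than holding plain members and formatting strings by hand.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the Job class. In the real implementation the
// fields would live inside a wrapped JSON object, not as plain members.
struct Job
{
  std::string slurmId;
  std::string cumulusId;
  std::string machine;
  std::string jobName;
  std::string analysisStep;
  int nodes = 0;
  int processes = 0;
  int runtime = 0;
  std::string notes;
  std::string analysisUuid;
  std::string analysisUrl;

  // Serialize this Job as one entry of the "Jobs" array.
  std::string toJson() const
  {
    std::ostringstream out;
    out << "{"
        << "\"SLURM_ID\": \"" << slurmId << "\", "
        << "\"CUMULUS_ID\": \"" << cumulusId << "\", "
        << "\"MACHINE\": \"" << machine << "\", "
        << "\"JOB_NAME\": \"" << jobName << "\", "
        << "\"ANALYSIS_STEP\": \"" << analysisStep << "\", "
        << "\"NODES\": " << nodes << ", "
        << "\"PROCESSES\": " << processes << ", "
        << "\"RUNTIME\": " << runtime << ", "
        << "\"NOTES\": \"" << notes << "\", "
        << "\"ANALYSIS_UUID\": \"" << analysisUuid << "\", "
        << "\"ANALYSIS_URL\": \"" << analysisUrl << "\""
        << "}";
    return out.str();
  }
};

// Hypothetical Project fragment: owns the job list and produces the
// standalone "Jobs" document saved at the top level of the project folder.
struct Project
{
  std::vector<Job> jobs;

  std::string serializeJobs() const
  {
    std::ostringstream out;
    out << "{\"Jobs\": [";
    for (std::size_t i = 0; i < jobs.size(); ++i)
    {
      if (i > 0)
      {
        out << ", ";
      }
      out << jobs[i].toJson();
    }
    out << "]}";
    return out.str();
  }
};
```

Because the wrapper owns the whole JSON object, adding a new metadata field later would not require changing the serialization path, which is the extensibility point noted above.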
This topic has many hooks into the ongoing conversation on Resource versioning, and this proposed implementation certainly feels brute force, potentially saving entire Attribute Resources to file over and over. On the other hand, having direct access to the exact data used to generate a job would massively streamline simulation provenance.