Using the Recovery Dataset for MD Floes

The Recovery Dataset (default name recovery_dataset) is an output that the MD Affinity Floes can use to continue unfinished jobs. It is created once the first cycle (iteration) of an MD floe has completed. The length of a cycle is set by the Cube Max Run Time parameter, which defaults to one hour. After each cycle, the MD steps completed in that cycle are written to a file named cycle_n_traj, and the recovery dataset is updated with the latest MD data. Lowering this parameter updates the recovery dataset more frequently and writes more cycle_n_traj files, but fewer MD steps are performed per cycle. Raising it above 11 hours carries a significant risk of job failure because of the AWS 12-hour time limit.
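The trade-off between cycle length and the number of cycle files can be estimated with simple arithmetic. The following is a minimal sketch and not part of the floes: the function names and the throughput figure (ns_per_hour) are hypothetical, and it assumes a constant throughput with every cycle using the full Cube Max Run Time. The floe calculates its own cycle count from the system size and other parameters.

```python
import math

# From the text: values above 11 hours risk hitting the AWS 12-hour limit.
SAFE_LIMIT_HOURS = 11.0


def check_cube_max_run_time(hours):
    """Return True if the per-cycle run time is positive and at most 11 hours."""
    return 0 < hours <= SAFE_LIMIT_HOURS


def estimate_cycles(total_ns, ns_per_hour, cube_max_run_time_hours=1.0):
    """Estimate how many cycles (and cycle_n_traj files) a run would need.

    Hypothetical model: constant throughput, and each cycle runs for the
    whole Cube Max Run Time.
    """
    if not check_cube_max_run_time(cube_max_run_time_hours):
        raise ValueError("Cube Max Run Time should be above 0 and at most 11 hours")
    ns_per_cycle = ns_per_hour * cube_max_run_time_hours
    return math.ceil(total_ns / ns_per_cycle)
```

Under these assumptions, a 50 ns run at a hypothetical 1 ns per hour needs 50 cycles with the default one-hour setting, and 100 cycles if the parameter is lowered to 0.5 hours.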

Figure 1. Recovery dataset in Orion.

Note

The run time and number of MD steps per cycle depend heavily on the size of your flask (system). If the Cube Max Run Time parameter is left at its default value (1 hour) and your flask contains fewer than 200,000 atoms, the per-cycle run time is automatically set to less than 1 hour.

When a job is nearing completion (that is, the calculated number of cycles for the requested number of MD steps has been completed), the data in the recovery dataset is written to an output dataset (e.g., the md_bound_output dataset). Once the Parallel Iteration Checker and MD Stage Concatenation Cube has finished running, the recovery dataset is no longer useful. However, if a job is canceled, is terminated, or fails before this cube has finished running, the recovery dataset can be used as described below.

Note

While the recovery dataset is being written to the output dataset, the cycle_n_traj files may also be combined into a single comb_traj file, provided their combined size does not exceed 10 GB. This ensures that all files are easy to download.
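The size rule in this note can be expressed as a one-line check. This is only an illustrative sketch: the function name is hypothetical, and it assumes "10 GB" means 10 × 1024³ bytes, whereas the floe may use a different definition or apply the check differently.

```python
# Assumption: "10 GB" is interpreted as 10 GiB (10 * 1024**3 bytes).
TEN_GB_BYTES = 10 * 1024**3


def should_combine(cycle_file_sizes_bytes):
    """Return True if the cycle_n_traj files together do not exceed 10 GB."""
    return sum(cycle_file_sizes_bytes) <= TEN_GB_BYTES
```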

The recovery dataset can be used as input for the Plain Molecular Dynamics Floe (Plain MD Floe) to continue MD simulations. All parameters set in the floe that created the recovery dataset are carried over to the Plain MD Floe.

Note

The recovery dataset output from the STMD Floe currently only holds information for the bound MD simulation. It does not contain information for the unbound simulation.

Example case:

A job is submitted to run for 50 ns. Based on the size of the system and other parameters, the number of cycles (iterations) needed to complete the MD simulation is calculated to be 50. After 1 cycle, the recovery dataset has been built, and it continues to grow after each iteration. The job reaches a cost limit after 30 cycles and is terminated. The recovery dataset can then be used as input for the Plain MD Floe; all of the original job settings are applied and override any default settings of the Plain MD Floe. The job resumes from the 30th iteration and runs until all 50 iterations are complete, producing an MD output dataset. If an additional 50 ns is desired, this MD output dataset can be used as input for the Plain MD Floe. In that case, all settings must be specified again, because MD parameters and settings are not retained in the md_bound_output dataset.
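The arithmetic behind this example can be sketched as follows. The helper is hypothetical and assumes every cycle covers the same amount of simulation time, which the floe does not guarantee.

```python
def resume_summary(total_ns, total_cycles, completed_cycles):
    """Summarize how much simulation is done and how much remains.

    Assumes (for illustration only) that all cycles are of equal length.
    """
    if not 0 <= completed_cycles <= total_cycles:
        raise ValueError("completed_cycles must be between 0 and total_cycles")
    ns_per_cycle = total_ns / total_cycles
    remaining_cycles = total_cycles - completed_cycles
    return {
        "ns_per_cycle": ns_per_cycle,
        "completed_ns": completed_cycles * ns_per_cycle,
        "remaining_cycles": remaining_cycles,
        "remaining_ns": remaining_cycles * ns_per_cycle,
    }


# The example case: 50 ns over 50 cycles, terminated after 30 cycles.
summary = resume_summary(total_ns=50, total_cycles=50, completed_cycles=30)
```

For the example case, this gives 1 ns per cycle, 30 ns already simulated, and 20 cycles (20 ns) left for the Plain MD Floe to complete.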