Automated WEMD Simulation and Best Structure Search Guided By Eigen CryoEM Maps
Description
This floe sets up and starts a new weighted ensemble molecular dynamics (WEMD) simulation to sample the diverse conformational states of a protein. It uses projection coefficients onto cryo-EM eigenmaps as progress coordinates; the eigenmaps can come from any conformational heterogeneity analysis method based on principal component analysis (PCA). After the simulation, the floe searches for the best structures for each input cryo-EM map, based on the agreement between the simulated structures and the input maps, which are obtained from clustering and heterogeneity analyses of 2D images. The floe runs the WEMD simulation and the analysis as a single automated process. For longer simulations with more advanced options, or to continue a stopped simulation, use the Continue WEMD Simulation Guided by Cryo-EM Maps Floe. Caution: This floe can be expensive for large systems, even with the default settings (more than $1,000 for systems with more than 1,000 residues).
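To make the idea of eigenmap projection coefficients concrete, here is a minimal sketch. It assumes that a simulated density map and the cryo-EM maps share one voxel grid, that the eigenmaps are mutually orthogonal (as PCA components are), and that each coefficient is the least-squares projection of the deviation from the mean map onto one eigenmap. The helper name `eigenmap_coefficients` is hypothetical, and the floe's own implementation may mask or normalize the maps before projecting.

```python
import numpy as np


def eigenmap_coefficients(sim_map, mean_map, eigen_maps):
    """Project a simulated density map onto cryo-EM eigenmaps (hypothetical helper).

    Assumes every map is a NumPy array on the same voxel grid and that the
    eigenmaps are mutually orthogonal, as PCA components are.
    """
    delta = (sim_map - mean_map).ravel()
    coeffs = []
    for eig in eigen_maps:
        e = eig.ravel()
        # Least-squares coefficient of the deviation along this eigenmap.
        coeffs.append(float(delta @ e / (e @ e)))
    return np.array(coeffs)


# Toy example: two orthogonal "eigenmaps" on a 2x2x2 grid.
mean_map = np.zeros((2, 2, 2))
e1 = np.zeros((2, 2, 2))
e1[0, 0, 0] = 1.0
e2 = np.zeros((2, 2, 2))
e2[1, 1, 1] = 2.0
sim_map = mean_map + 3.0 * e1 - 0.5 * e2
print(eigenmap_coefficients(sim_map, mean_map, [e1, e2]).tolist())  # [3.0, -0.5]
```

Each coefficient becomes one progress-coordinate dimension, so two eigenmaps give a 2D progress coordinate.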
Promoted Parameters
Title in user interface (promoted name)
Input Data for Simulation
Solvated and Equilibrated Design Unit Input Dataset (prep_data_in): The output dataset from WEMD Preparation containing all the information necessary to start a production WEMD simulation. This comes from the Cryptic Pocket Detection package.
Required
Type: data_source
Output Data for Simulation
Output Dataset (simulation_data_out): Name of the output dataset. This single-record dataset contains the current iteration number, the simulation settings, and the reference design unit. It can be viewed on the Orion Analyze page to track the total number of completed iterations.
Required
Type: dataset_out
Default: wemd_simulation_dataset
Collection Name (wemd_collection_name): Name of the collection created to store MD segments and other related files. The same collection will be used for continuing a WEMD simulation and to perform analysis.
Required
Type: collection_sink
Default: wemd_simulation_collection
Progress Coordinate Inputs
Cryo-EM Map Resolution (pcoord_resolution): Resolution of the input cryo-EM map, in angstroms, used to construct the progress coordinate(s).
Required
Type: decimal
Default: 2.0
Reference Protein Dataset after SPRUCE Preparation (pcoord_ref_dataset): Input dataset of reference protein for constructing progress coordinate(s). (Structure must be fit to the input cryo-EM map.)
Required
Type: data_source
Superposition method (pcoord_superpose_method): Superposition method used to fit the simulation structure to the reference structure of the cryo-EM map when constructing the progress coordinate(s).
Required
Type: string
Default: GlobalSequence
Choices: [‘GlobalSequence’, ‘SiteSequence’, ‘DDMatrix’, ‘SSE’, ‘SiteHopper’]
Selected Components for Simulation Map (pcoord_mask_components): Structure components selected to create the simulation map for constructing the progress coordinate(s).
Required
Type: string
Default: [‘protein’, ‘nucleic’, ‘ligand’]
Choices: [‘protein’, ‘nucleic’, ‘ligand’, ‘solvent’, ‘metals’, ‘counter_ions’, ‘lipids’, ‘packing_residues’, ‘sugars’, ‘undefined’, ‘cofactors’, ‘excipients’, ‘polymers’, ‘post_translational’, ‘other_proteins’, ‘other_nucleics’, ‘other_ligands’, ‘other_cofactors’]
Resize Cryo-EM Map(s) (pcoord_resize_map): Toggle on to resize cryo-EM map(s) using reference structure(s) with 10 angstroms of padding.
Type: boolean
Default: True
Choices: [True, False]
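One way to picture the resize option is as cropping a map to the bounding box of the reference structure plus 10 angstroms of padding. The sketch below is an assumption about what "resize" means here, not the floe's code. It also assumes an [x, y, z] array layout, cubic voxels, and a known grid origin; real .mrc headers can store axes in a different order. The helper name `crop_map_to_structure` is hypothetical.

```python
import numpy as np


def crop_map_to_structure(density, origin, voxel_size, coords, padding=10.0):
    """Crop a density grid to the atoms' bounding box plus padding (hypothetical helper).

    density:    3D array indexed [x, y, z] (axis order is an assumption)
    origin:     (x, y, z) position of voxel [0, 0, 0], in angstroms
    voxel_size: edge length of a cubic voxel, in angstroms
    coords:     (N, 3) atom coordinates, in angstroms
    Returns the cropped array and the origin of its first voxel.
    """
    origin = np.asarray(origin, dtype=float)
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    # Convert the padded box to voxel indices, clamped to the existing grid.
    start = np.clip(np.floor((lo - origin) / voxel_size).astype(int), 0, None)
    stop = np.minimum(np.ceil((hi - origin) / voxel_size).astype(int) + 1,
                      density.shape)
    cropped = density[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]
    return cropped, origin + start * voxel_size


density = np.zeros((100, 100, 100))
coords = np.array([[40.0, 40.0, 40.0], [50.0, 50.0, 50.0]])
cropped, new_origin = crop_map_to_structure(density, (0.0, 0.0, 0.0), 1.0, coords)
print(cropped.shape, new_origin.tolist())  # (31, 31, 31) [30.0, 30.0, 30.0]
```

Cropping shrinks the voxel count that every simulated-map comparison has to touch, which is one plausible motivation for making this option the default.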
Mean Cryo-EM Map (mean_map): Input file for the mean map (in .mrc file format).
Required
Type: file_in
Eigen Map(s) (eigen_maps): Input file(s) for the eigen map(s) (in .mrc file format).
Required
Type: file_in
Masking Control (pcoord_masking_control): Toggle on to apply masking to the input cryo-EM maps when constructing the progress coordinates.
Type: boolean
Default: True
Choices: [True, False]
Percentile of CryoEM Densities for Normalization (pcoord_masking_threshold): Top percentile of densities used to normalize the mean and standard deviation of the simulated maps to those of the experimental mean map when constructing the progress coordinates.
Type: decimal
Default: 98
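A minimal sketch of percentile-based normalization, assuming that the statistics are taken over voxels at or above the given percentile of each map and that a linear rescaling a*x + b is applied to the simulated map. Both the selection rule and the linear form are assumptions; the helper names are hypothetical.

```python
import numpy as np


def top_percentile_stats(density, percentile=98.0):
    """Mean and standard deviation of voxels at or above the given percentile."""
    cutoff = np.percentile(density, percentile)
    top = density[density >= cutoff]
    return float(top.mean()), float(top.std())


def normalize_to_reference(sim_map, ref_map, percentile=98.0):
    """Linearly rescale sim_map so its high-density statistics match ref_map's."""
    sim_mean, sim_std = top_percentile_stats(sim_map, percentile)
    ref_mean, ref_std = top_percentile_stats(ref_map, percentile)
    scale = ref_std / sim_std if sim_std > 0 else 1.0
    # A positive scale keeps voxel ranking, so the same voxels stay in the top set.
    return (sim_map - sim_mean) * scale + ref_mean


rng = np.random.default_rng(0)
sim = rng.random((10, 10, 10))
ref = 3.0 * rng.random((10, 10, 10)) + 1.0
normalized = normalize_to_reference(sim, ref)
```

Restricting the statistics to the highest densities keeps low-density solvent or noise voxels from dominating the scale factor.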
Weighted Ensemble Parameters
Iteration Interval (Tau) (iteration_interval): Length of each WE iteration in picoseconds.
Type: decimal
Default: 10
Iterations (iterations): Total cumulative number of iterations for the WE simulation.
Required
Type: integer
Default: 100
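A quick estimate puts the defaults and the cost caution above in perspective. The sketch uses the defaults of Iteration Interval (10 ps), Iterations (100), MAB Default Bin Number (10), and Trajectories Per Bin (5), and it assumes two progress-coordinate dimensions (two eigenmaps). It treats every bin as occupied, so the aggregate figure is a rough upper bound that ignores any extra bins the binning scheme may add. The function name is hypothetical.

```python
def we_time_budget(iteration_interval_ps=10.0, iterations=100,
                   bins_per_dim=10, n_dims=2, walkers_per_bin=5):
    """Rough simulated-time figures for a WE run (assumptions noted inline)."""
    # One continuous trajectory advances by one interval per iteration.
    per_trajectory_ns = iteration_interval_ps * iterations / 1000.0
    # Upper bound: every bin holds exactly walkers_per_bin walkers.
    max_walkers = bins_per_dim ** n_dims * walkers_per_bin
    max_aggregate_ns = per_trajectory_ns * max_walkers
    return per_trajectory_ns, max_walkers, max_aggregate_ns


print(we_time_budget())  # (1.0, 500, 500.0)
```

The aggregate sampling, and therefore the cost, grows with the number of bins raised to the number of dimensions, which is one reason large systems become expensive quickly.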
WEMD Bin Settings
Binning Options (bins_in): YAML file specifying the binning options. The default configuration uses Minimal Adaptive Binning (MAB) with 10 bins along each progress coordinate dimension.
Type: file_in
MAB Default Bin Number (mab_nbins): When the default MAB binning scheme is applied, this parameter determines the number of bins along each progress coordinate dimension. This parameter is ignored when a “Binning Options” file is provided.
Required
Type: integer
Default: 10
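Minimal Adaptive Binning places bins evenly between the lowest and highest progress-coordinate values reached by the current walkers, so the bin grid stretches as sampling expands. The sketch shows only that core idea; the function names are hypothetical, and full MAB implementations (such as the one in WESTPA) may add special bins for extreme or bottleneck walkers that are omitted here.

```python
import numpy as np


def mab_bin_edges(pcoords, nbins=10):
    """Evenly spaced bin edges spanning the current walkers in each dimension.

    pcoords: (n_walkers, n_dims) progress-coordinate values of the walkers.
    """
    pcoords = np.atleast_2d(pcoords)
    edges = []
    for dim in range(pcoords.shape[1]):
        lo, hi = pcoords[:, dim].min(), pcoords[:, dim].max()
        edges.append(np.linspace(lo, hi, nbins + 1))
    return edges


def assign_bins(pcoords, edges):
    """Bin index per dimension for every walker (the top edge is inclusive)."""
    pcoords = np.atleast_2d(pcoords)
    idx = [np.clip(np.searchsorted(e, pcoords[:, d], side="right") - 1, 0, len(e) - 2)
           for d, e in enumerate(edges)]
    return np.stack(idx, axis=1)


walkers = np.array([[0.0, 1.0], [1.0, 3.0], [0.5, 2.0]])
edges = mab_bin_edges(walkers, nbins=4)
print(assign_bins(walkers, edges).tolist())  # [[0, 0], [3, 3], [2, 2]]
```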
Advanced Weighted Ensemble Parameters
Number of Frames Per Iteration (frames_per_iteration): Number of frames saved in each WE iteration.
Type: integer
Default: 5
Trajectories Per Bin (walkers_per_bin): Number of trajectories per bin for the WEMD simulation.
Required
Type: integer
Default: 5
Smallest Allowed Weight (Log) (smallest_allowed_log_weight): The smallest weight allowed for splitting, on a logarithmic scale.
Required
Type: decimal
Default: -310.0
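The two advanced parameters above govern resampling in a weighted ensemble run: walkers in each bin are split or merged toward the target count, and a walker is not split if its children would fall below the smallest allowed weight. This is a simplified sketch of that idea, not the floe's resampler. It assumes the log weight is base 10 (so -310.0 corresponds to 1e-310; the documentation does not state the base), always splits the heaviest walker and merges the two lightest, and tracks weights only. The function names are hypothetical.

```python
import math

LOG10_2 = math.log10(2.0)


def _log10_add(a, b):
    """Return log10(10**a + 10**b) without underflow."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log10(1.0 + 10.0 ** (lo - hi))


def resample_bin(log_weights, target=5, smallest_log_weight=-310.0):
    """Split heavy and merge light walkers in one bin toward `target` walkers.

    Weights are base-10 logarithms (an assumption). Only weights are tracked;
    a real WE code also chooses which walker's coordinates survive a merge.
    """
    walkers = sorted(log_weights)  # ascending, so the heaviest is last
    if not walkers:
        return walkers
    while len(walkers) < target:
        heaviest = walkers.pop()
        half = heaviest - LOG10_2  # each child carries half the weight
        if half < smallest_log_weight:
            walkers.append(heaviest)  # splitting would cross the weight floor
            break
        walkers.extend([half, half])
        walkers.sort()
    while len(walkers) > target:
        a, b = walkers.pop(0), walkers.pop(0)  # the two lightest walkers
        walkers.append(_log10_add(a, b))
        walkers.sort()
    return walkers


split = resample_bin([math.log10(0.5)], target=4)
print(len(split), sum(10.0 ** w for w in split))  # 4 walkers, total weight 0.5
```

Splitting and merging conserve total weight, which is what keeps WE estimates of populations and free energies unbiased.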
Orion Settings for Cost-optimization of WEMD
Spot policy (spot_policy): Control cube placement on spot market instances
Type: string
Default: Required
Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
Instance Type (instance_type): The type of instance that this cube needs to be run on
Type: string
Maximum number of Parallel Cubes for MD (max_parallel): The maximum number of concurrently running copies of this cube
Type: integer
Default: 1000
Simulation Report Settings
Floe Report (report_out): The title for the output floe report.
Required
Type: string
Default: Start WEMD Simulation and Structure Search Report
Number of iterations for averaged distributions (average_window): Number of iterations over which density distributions are averaged when calculating the Kullback-Leibler (KL) divergence in each dimension.
Type: integer
Default: 2
Number of Bins for Histogram (n_bins): Number of bins for plotting density distribution in each dimension.
Type: integer
Default: 50
Option for reference PDF (ref_type): Type of reference probability distribution function (PDF) used to calculate the KL divergence.
Type: string
Default: Accumulated
Choices: [‘Accumulated’, ‘Averaged’]
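The report's convergence check can be sketched as a histogram-based KL divergence along one progress-coordinate dimension, using the Number of Bins for Histogram (50 by default). In one plausible reading of the options, the current distribution pools the last `average_window` iterations, while the reference pools all iterations so far ("Accumulated") or another averaged window ("Averaged"). The pooling rule, the small `eps` for empty bins, and the function name are assumptions.

```python
import numpy as np


def kl_divergence_1d(current, reference, n_bins=50,
                     current_weights=None, reference_weights=None, eps=1e-12):
    """KL(P_current || P_reference) along one progress-coordinate dimension.

    Optional weights let WE walker weights shape the histograms. Both
    histograms share one set of bin edges; eps keeps empty bins out of log(0).
    """
    lo = min(current.min(), reference.min())
    hi = max(current.max(), reference.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(current, bins=edges, weights=current_weights)
    q, _ = np.histogram(reference, bins=edges, weights=reference_weights)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


rng = np.random.default_rng(0)
recent = rng.normal(0.0, 1.0, 5000)
accumulated = rng.normal(2.0, 1.0, 5000)
print(kl_divergence_1d(recent, accumulated))  # large when the distributions differ
```

A KL divergence that settles near zero from one window to the next suggests the sampled distribution along that dimension has stopped changing.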
Input Cryo-EM Maps and Options for Best Structures Search
Cryo-EM Map Resolution (resolution): Resolution of the input cryo-EM maps (angstroms).
Required
Type: decimal
Default: 2.0
Cryo-EM Map Files (map_files): Input cryo-EM map files for best structure search (in .mrc format).
Required
Type: file_in
Reference Protein Dataset after SPRUCE Preparation (ref_dataset): Input dataset of the reference protein (structure must be fit to the input cryo-EM maps).
Required
Type: data_source
Superposition method (superpose_method): Superposition method used to fit the simulation structure to the reference structure of the cryo-EM map.
Required
Type: string
Default: GlobalSequence
Choices: [‘GlobalSequence’, ‘SiteSequence’, ‘DDMatrix’, ‘SSE’, ‘SiteHopper’]
Selected Components for Cryo-EM Map (mask_components): Selected components to generate simulated maps from simulation trajectories.
Required
Type: string
Default: [‘protein’, ‘nucleic’, ‘ligand’]
Choices: [‘protein’, ‘nucleic’, ‘ligand’, ‘solvent’, ‘metals’, ‘counter_ions’, ‘lipids’, ‘packing_residues’, ‘sugars’, ‘undefined’, ‘cofactors’, ‘excipients’, ‘polymers’, ‘post_translational’, ‘other_proteins’, ‘other_nucleics’, ‘other_ligands’, ‘other_cofactors’]
Resize Cryo-EM Map(s) (resize_map): Toggle on to resize cryo-EM map(s) using reference structure(s) with 10 angstroms of padding.
Type: boolean
Default: True
Choices: [True, False]
Masking options for RSCC
Masking Control (masking_control): Toggle on to mask simulated and reference cryo-EM maps.
Type: boolean
Default: True
Choices: [True, False]
Masking Type for Cryo-EM Maps (masking_type): Options for how to combine the masks for the trajectory and reference cryo-EM maps: “union”, use the union of the two masks; “overlap”, use the overlap of the two masks; “reference”, use only the mask from the reference map; “trajectory”, use only the masks from the trajectory.
Type: string
Default: trajectory
Choices: [‘union’, ‘overlap’, ‘reference’, ‘trajectory’]
Masking Scheme for Reference Cryo-EM Map (masking_scheme_ref): Options for masking the reference cryo-EM map: “all”, include everything; “threshold”, density values below the specified value are masked out; “std”, density values below the specified number of standard deviations above the mean are masked out.
Type: string
Default: threshold
Choices: [‘all’, ‘threshold’, ‘std’]
Masking Scheme for Simulated Cryo-EM Map (masking_scheme_traj): Options for masking the simulated cryo-EM map: “all”, include everything; “threshold”, density values below the specified value are masked out; “std”, density values below the specified number of standard deviations above the mean are masked out.
Type: string
Default: threshold
Choices: [‘all’, ‘threshold’, ‘std’]
Masking Value for Reference Cryo-EM Map (masking_level_ref): Input value for the mask definition: for “threshold”, the density value below which to mask out; for “std”, the number of standard deviations above the mean below which to mask out.
Type: decimal
Default: 0.0
Masking Value for Simulated Cryo-EM Map (masking_level_traj): Input value for the mask definition: for “threshold”, the density value below which to mask out; for “std”, the number of standard deviations above the mean below which to mask out.
Type: decimal
Default: 0.005
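The masking options above can be read as three steps: build a mask for each map with its scheme and level, combine the two masks with the masking type, then compute the real-space correlation coefficient (RSCC) over the kept voxels. The sketch computes RSCC as a centered (Pearson) correlation, which is one common variant; other RSCC definitions exist, and the floe's exact formula is not stated here. Function names are hypothetical; the defaults mirror the documented ones.

```python
import numpy as np


def make_mask(density, scheme="threshold", level=0.0):
    """Boolean mask of the voxels kept under a masking scheme."""
    if scheme == "all":
        return np.ones(density.shape, dtype=bool)
    if scheme == "threshold":
        return density >= level
    if scheme == "std":
        return density >= density.mean() + level * density.std()
    raise ValueError(f"unknown masking scheme: {scheme}")


def combine_masks(ref_mask, traj_mask, masking_type="trajectory"):
    """Combine reference and trajectory masks as described for masking_type."""
    combos = {
        "union": ref_mask | traj_mask,
        "overlap": ref_mask & traj_mask,
        "reference": ref_mask,
        "trajectory": traj_mask,
    }
    return combos[masking_type]


def rscc(ref_map, sim_map, mask):
    """Centered correlation of the two maps over the masked voxels."""
    a = ref_map[mask] - ref_map[mask].mean()
    b = sim_map[mask] - sim_map[mask].mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


rng = np.random.default_rng(1)
ref = rng.random((4, 4, 4))
sim = 2.0 * ref + 1.0  # a perfectly correlated "simulated" map
mask = combine_masks(make_mask(ref, "threshold", 0.0),
                     make_mask(sim, "threshold", 0.005), "trajectory")
print(rscc(ref, sim, mask))  # close to 1.0
```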
Selections for Output Best Structures
Superpose Control (report_superpose_ctrl): Toggle on to superpose trajectory onto input reference structure.
Required
Type: boolean
Default: True
Choices: [True, False]
Number of Top Candidates for Saving (ntop_out): Number of top candidates for each map to save to dataset.
Required
Type: integer
Default: 5
Output Best Structures for Input Cryo-EM Maps (struct_data_out): Output dataset that stores the top N simulation structures for each input cryo-EM map, where N is set by Number of Top Candidates for Saving.
Required
Type: dataset_out
Default: best_structures_dataset
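Selecting the saved candidates amounts to ranking every analyzed frame by its agreement score for each map and keeping the top `ntop_out`. The sketch assumes the score is an RSCC-like value where higher is better, and the nested-dictionary layout and function name are hypothetical.

```python
import heapq


def top_candidates(scores, ntop=5):
    """Return the ntop (frame_id, score) pairs with the highest score for each map.

    scores: {map_name: {frame_id: score}} (hypothetical layout; higher is better)
    """
    return {
        map_name: heapq.nlargest(ntop, per_map.items(), key=lambda kv: kv[1])
        for map_name, per_map in scores.items()
    }


scores = {
    "map_A": {"s1": 0.7, "s2": 0.9, "s3": 0.8},
    "map_B": {"s1": 0.5, "s2": 0.4},
}
print(top_candidates(scores, ntop=2))
# {'map_A': [('s2', 0.9), ('s3', 0.8)], 'map_B': [('s1', 0.5), ('s2', 0.4)]}
```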
Selection Range for Trajectories
Start Iteration (first_iter): The start iteration of WE segments to be extracted. Leave empty to extract from the first iteration that is available.
Type: integer
End Iteration (last_iter): The end iteration of WE segments to be extracted. Leave empty to stop extracting at the last iteration that is available.
Type: integer
Endpoint Only (endpoint_only): With the default settings, five conformers are saved for each WEMD segment (see Number of Frames Per Iteration). If True, only the last conformer is used; if False, all saved conformers are used.
Required
Type: boolean
Default: True
Choices: [True, False]
Analysis Stride (N) (stride): Analyze every Nth frame; the default of 1 uses every frame. Not applicable when Endpoint Only is on.
Type: integer
Default: 1
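The selection options above can be summarized in a short sketch: pick the iterations within the requested range, then pick frames within each segment, either the endpoint alone or every Nth frame. Counting strided frames from the first saved frame is an assumption, as are the helper names.

```python
def iterations_to_analyze(available, first_iter=None, last_iter=None):
    """Available iteration numbers within [first_iter, last_iter]; None is open-ended."""
    lo = min(available) if first_iter is None else first_iter
    hi = max(available) if last_iter is None else last_iter
    return [i for i in sorted(available) if lo <= i <= hi]


def frames_to_analyze(n_frames=5, endpoint_only=True, stride=1):
    """Indices of the saved frames in one WEMD segment that enter the analysis."""
    if endpoint_only:
        return [n_frames - 1]  # last frame only; stride does not apply
    return list(range(0, n_frames, stride))


print(iterations_to_analyze(range(1, 101), first_iter=95))  # [95, 96, 97, 98, 99, 100]
print(frames_to_analyze())                                    # [4]
print(frames_to_analyze(endpoint_only=False, stride=2))       # [0, 2, 4]
```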
Parameters for Estimating Free Energy Maps/Surfaces
Weight Free Energy Calculation (weight_option): Toggle on to apply WEMD weights when generating free energy maps.
Type: boolean
Default: True
Choices: [True, False]
Density Distribution Bins (number_bins): Number of bins for plotting density distribution in each dimension.
Type: integer
Default: 50
Maximum value of free energy (max_free_energy): Maximum free energy value shown on the color scale of the estimated 2D free energy map.
Type: decimal
Default: 50.0
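A 2D free energy map is commonly estimated as F = -ln P on a histogram of two progress coordinates, with each frame contributing its WE weight when weighting is on. The sketch reports F in units of kT, shifted so the minimum is 0, marks empty bins as NaN, and caps values at `max_free_energy` for display. The kT unit, the shift, and the function name are assumptions; the floe's report may use other units.

```python
import numpy as np


def free_energy_2d(x, y, weights=None, number_bins=50, max_free_energy=50.0):
    """Free energy surface -ln P(x, y) in kT, shifted so the minimum is 0.

    weights: per-frame WE weights (weighting on) or None for uniform counts.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=number_bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        fe = -np.log(prob)
    fe[np.isinf(fe)] = np.nan  # empty bins have no free energy estimate
    fe -= np.nanmin(fe)
    return np.minimum(fe, max_free_energy), xedges, yedges


x = np.array([0.1, 0.1, 0.9])
y = np.array([0.1, 0.1, 0.9])
fe, _, _ = free_energy_2d(x, y, number_bins=2)
print(fe)  # 0 in the doubly populated bin, ln 2 in the other, NaN where empty
```

Applying the WE weights matters because WE deliberately oversamples rare regions; unweighted counts would make those regions look far more stable than they are.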
Outputs for Cryo-EM Map Analysis
Failure Report (fail_report): Output report to generate upon failure.
Type: string
Default: Protein Sampling Failure Report
None (analysis_report_title): Title for the analysis report.
Type: string
Default: Cryo-EM Map Match Report
Output Features for Each WEMD Segment (analysis_data_out): Output dataset that stores the calculated features from all WEMD segments within the selected ranges.
Required
Type: dataset_out
Default: output_features_dataset