Automated WEMD Simulation and Best Structure Search Guided By Eigen CryoEM Maps

Description

This floe sets up and starts a new WEMD simulation to sample the diverse conformational states of a protein, using projection coefficients onto cryo-EM eigenmaps (from any principal component analysis (PCA) based conformational heterogeneity analysis method) as progress coordinates. After the simulation, it also searches for the best structures for each of the input cryo-EM maps, based on the agreement between the simulated structures and the input maps, which are obtained from clustering and heterogeneity analyses of 2D images. This is an automated floe that runs the WEMD simulation and analysis in an integrated process. To run longer simulations with more advanced options, or to continue a stopped simulation, use the Continue WEMD Simulation Guided by Cryo-EM Maps Floe. Caution: This floe can be expensive for large systems, even when the default settings are used (more than $1000 for systems with greater than 1000 residues).
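
As a rough illustration of the progress coordinate described above, the following Python sketch projects a simulated density map onto a set of eigenmaps after subtracting the mean map. The function name, the array shapes, and the division by each eigenmap's squared norm are assumptions made for illustration; this is not the floe's actual implementation.

```python
import numpy as np

def projection_coefficients(sim_map, mean_map, eigen_maps):
    """Project (sim_map - mean_map) onto each eigenmap (hypothetical helper).

    All maps are assumed to be sampled on the same grid. Dividing by the
    squared norm of the eigenmap is an assumption; it makes each coefficient
    independent of the eigenmap's overall scale.
    """
    diff = (np.asarray(sim_map, dtype=float) - np.asarray(mean_map, dtype=float)).ravel()
    coeffs = []
    for eig in eigen_maps:
        e = np.asarray(eig, dtype=float).ravel()
        coeffs.append(float(diff @ e) / float(e @ e))
    return np.array(coeffs)

# Toy 2x2x2 maps with two orthogonal eigenmaps (disjoint support).
mean_map = np.full((2, 2, 2), 0.5)
eig1 = np.zeros((2, 2, 2))
eig1[0, 0, 0], eig1[1, 1, 1] = 1.0, -1.0
eig2 = np.zeros((2, 2, 2))
eig2[0, 1, 0], eig2[1, 0, 1] = 1.0, 1.0

# A "simulated" map built from known coefficients along each eigenmap.
sim_map = mean_map + 2.0 * eig1 - 0.5 * eig2
pcoords = projection_coefficients(sim_map, mean_map, [eig1, eig2])
```

In this toy case the recovered coefficients match the ones used to build the map, so each simulated structure is reduced to one number per eigenmap.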

Promoted Parameters

Title in user interface (promoted name)

Input Data for Simulation

Solvated and Equilibrated Design Unit Input Dataset (prep_data_in): The output dataset from WEMD Preparation containing all the information necessary to start a production WEMD simulation. This comes from the Cryptic Pocket Detection package.

  • Required

  • Type: data_source

Output Data for Simulation

Output Dataset (simulation_data_out): Name of the output dataset. This dataset contains the current iteration number, simulation settings, and the reference design unit. This dataset contains only one record and can be viewed on the Orion Analyze page to track the total number of iterations that have been completed.

  • Required

  • Type: dataset_out

  • Default: wemd_simulation_dataset

Collection Name (wemd_collection_name): Name of the collection created to store MD segments and other related files. The same collection will be used for continuing a WEMD simulation and to perform analysis.

  • Required

  • Type: collection_sink

  • Default: wemd_simulation_collection

Progress Coordinate Inputs

Cryo-EM Map Resolution (pcoord_resolution): Resolution of input cryo-EM map (angstroms) for constructing progress coordinate(s).

  • Required

  • Type: decimal

  • Default: 2.0

Reference Protein Dataset after SPRUCE Preparation (pcoord_ref_dataset): Input dataset of reference protein for constructing progress coordinate(s). (Structure must be fit to the input cryo-EM map.)

  • Required

  • Type: data_source

Superposition method (pcoord_superpose_method): Superposition method to fit simulation structure to reference structure of cryo-EM map for constructing progress coordinate(s).

  • Required

  • Type: string

  • Default: GlobalSequence

  • Choices: [‘GlobalSequence’, ‘SiteSequence’, ‘DDMatrix’, ‘SSE’, ‘SiteHopper’]

Selected Components for Simulation Map (pcoord_mask_components): Selected structure components to create simulation map for constructing progress coordinate(s).

  • Required

  • Type: string

  • Default: [‘protein’, ‘nucleic’, ‘ligand’]

  • Choices: [‘protein’, ‘nucleic’, ‘ligand’, ‘solvent’, ‘metals’, ‘counter_ions’, ‘lipids’, ‘packing_residues’, ‘sugars’, ‘undefined’, ‘cofactors’, ‘excipients’, ‘polymers’, ‘post_translational’, ‘other_proteins’, ‘other_nucleics’, ‘other_ligands’, ‘other_cofactors’]

Resize Cryo-EM Map(s) (pcoord_resize_map): Toggle on to resize cryo-EM map(s) using reference structure(s) with 10 angstroms of padding.

  • Type: boolean

  • Default: True

  • Choices: [True, False]
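
The resizing step can be pictured as cropping the map to the bounding box of the reference structure plus 10 angstroms on every side. The sketch below is only one way to do this; the function name, the orthogonal grid, and the (x, y, z) axis order are assumptions, and real .mrc files may store axes in a different order.

```python
import numpy as np

def crop_map_to_structure(density, origin, voxel_size, coords, padding=10.0):
    """Crop a density grid to the structure's bounding box plus padding.

    Hypothetical helper. Assumes an orthogonal grid with cubic voxels whose
    axis order matches the coordinate order (x, y, z); distances in angstroms.
    """
    coords = np.asarray(coords, dtype=float)
    origin = np.asarray(origin, dtype=float)
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    # Convert the padded box to voxel indices, clamped to the grid extent.
    start = np.floor((lo - origin) / voxel_size).astype(int)
    stop = np.ceil((hi - origin) / voxel_size).astype(int) + 1
    start = np.clip(start, 0, density.shape)
    stop = np.clip(stop, 0, density.shape)
    cropped = density[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]
    new_origin = origin + start * voxel_size
    return cropped, new_origin

# Toy example: 100^3 grid at 1 angstrom per voxel, two atoms spanning 40..60.
density = np.zeros((100, 100, 100))
atoms = [[40.0, 40.0, 40.0], [60.0, 60.0, 60.0]]
cropped, new_origin = crop_map_to_structure(density, (0.0, 0.0, 0.0), 1.0, atoms)
```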

Mean Cryo-EM Map (mean_map): Input file for the mean map (in .mrc file format).

  • Required

  • Type: file_in

Eigen Map(s) (eigen_maps): Input file(s) for the eigen map(s) (in .mrc file format).

  • Required

  • Type: file_in

Masking Control (pcoord_masking_control): Select to add masking to input cryo-EM maps for construction of progress coordinates.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Percentile of CryoEM Densities for Normalization (pcoord_masking_threshold): Top percentile of densities used to normalize the mean and standard deviation of simulation maps to those of the experimental mean map when constructing progress coordinates.

  • Type: decimal

  • Default: 98
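
One possible reading of this normalization is sketched below: voxels at or above the given percentile are selected in each map, and the simulation map is shifted and scaled so that the mean and standard deviation of its selected voxels match those of the experimental map. Both the interpretation of the percentile (here, the 98th percentile value, so the top 2% of voxels) and the function name are assumptions.

```python
import numpy as np

def normalize_to_reference(sim_map, ref_map, percentile=98.0):
    """Match mean/std of high-density voxels in sim_map to those in ref_map.

    Hypothetical helper; treats `percentile` as the cutoff above which
    voxels are used to compute the statistics (an assumption).
    """
    sim = np.asarray(sim_map, dtype=float)
    ref = np.asarray(ref_map, dtype=float)
    sim_sel = sim[sim >= np.percentile(sim, percentile)]
    ref_sel = ref[ref >= np.percentile(ref, percentile)]
    scale = ref_sel.std() / sim_sel.std()
    return (sim - sim_sel.mean()) * scale + ref_sel.mean()

# Toy example: the simulation map is an affine rescaling of the reference,
# so normalization should recover the reference map.
rng = np.random.default_rng(0)
ref_map = rng.random((10, 10, 10))
sim_map = 3.0 * ref_map + 5.0
normalized = normalize_to_reference(sim_map, ref_map, percentile=98.0)
```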

Weighted Ensemble Parameters

Iteration Interval (Tau) (iteration_interval): Length of each WE iteration in picoseconds.

  • Type: decimal

  • Default: 10

Iterations (iterations): Total cumulative number of iterations for the WE simulation.

  • Required

  • Type: integer

  • Default: 100

WEMD Bin Settings

Binning Options (bins_in): YAML file specifying the binning options. The default configuration uses Minimal Adaptive Binning (MAB) with 10 bins along each progress coordinate dimension.

  • Type: file_in

MAB Default Bin Number (mab_nbins): When the default MAB binning scheme is applied, this parameter determines the number of bins along each progress coordinate dimension. This parameter is ignored when a “Binning Options” file is provided.

  • Required

  • Type: integer

  • Default: 10
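
The core idea of MAB is that bin boundaries are repositioned each iteration so that a fixed number of evenly spaced bins spans the range currently occupied by the walkers along each dimension. The sketch below shows only that linear placement for one dimension; it omits the separate bins that MAB reserves for leading, trailing, and bottleneck walkers, and the function name is hypothetical.

```python
import numpy as np

def mab_linear_edges(pcoords, nbins=10):
    """Evenly spaced bin edges between the trailing and leading walkers.

    Simplified illustration of the adaptive placement only; not a full
    MAB implementation.
    """
    pcoords = np.asarray(pcoords, dtype=float)
    return np.linspace(pcoords.min(), pcoords.max(), nbins + 1)

# Progress coordinate values of the current walkers along one dimension.
walkers = [0.2, 0.35, 0.5, 1.1, 1.2]
edges = mab_linear_edges(walkers, nbins=10)

# Assign each walker to a bin; the leading walker sits on the last edge,
# so indices are clipped into the valid range 0..nbins-1.
bin_index = np.clip(np.digitize(walkers, edges) - 1, 0, 9)
```

With a two-dimensional progress coordinate and 10 bins per dimension, the same placement would be applied independently along each axis.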

Advanced Weighted Ensemble Parameters

Number of Frames Per Iteration (frames_per_iteration): Number of frames saved in each WE iteration.

  • Type: integer

  • Default: 5
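
To make the default values concrete, the short calculation below combines the iteration interval, the number of iterations, and the number of frames per iteration. It assumes frames are saved at evenly spaced times within each iteration, which is not stated in this document.

```python
# Default values taken from the parameter list above.
iteration_interval_ps = 10.0   # iteration_interval (tau)
iterations = 100               # iterations
frames_per_iteration = 5       # frames_per_iteration

# Simulated time along one continuous trajectory path from start to end.
path_length_ps = iteration_interval_ps * iterations

# Time between saved frames, assuming even spacing within an iteration.
frame_spacing_ps = iteration_interval_ps / frames_per_iteration

# Number of saved frames along one continuous path.
frames_per_path = iterations * frames_per_iteration
```

With the defaults, a single path covers 1000 ps (1 ns) with a frame every 2 ps. The aggregate simulation time is larger, because many trajectories run in parallel in each iteration.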

Trajectories Per Bin (walkers_per_bin): Number of trajectories per bin for the WEMD simulation.

  • Required

  • Type: integer

  • Default: 5

Smallest Allowed Weight (Log) (smallest_allowed_log_weight): The smallest allowed weight for splitting in logarithmic scale.

  • Required

  • Type: decimal

  • Default: -310.0

Orion Settings for Cost-optimization of WEMD

Spot policy (spot_policy): Controls cube placement on spot market instances.

  • Type: string

  • Default: Required

  • Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]

Instance Type (instance_type): The type of instance that this cube needs to be run on.

  • Type: string

Maximum number of Parallel Cubes for MD (max_parallel): The maximum number of concurrently running copies of this cube.

  • Type: integer

  • Default: 1000

Simulation Report Settings

Floe Report (report_out): The title for the output floe report.

  • Required

  • Type: string

  • Default: Start WEMD Simulation and Structure Search Report

Number of iterations for averaged distributions (average_window): Number of iterations to generate averaged density distributions for calculating the Kullback-Leibler divergence in each dimension.

  • Type: integer

  • Default: 2

Number of Bins for Histogram (n_bins): Number of bins for plotting density distribution in each dimension.

  • Type: integer

  • Default: 50

Option for reference PDF (ref_type): Set type of reference probability distribution function for KL divergence.

  • Type: string

  • Default: Accumulated

  • Choices: [‘Accumulated’, ‘Averaged’]
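
The convergence measure described by these settings can be sketched as a Kullback-Leibler divergence between two histograms along one progress coordinate dimension. In the example below, which distribution plays the role of P and which of Q, the small pseudo-count used to avoid empty bins, and the function name are all assumptions.

```python
import numpy as np

def kl_divergence(samples_p, samples_q, n_bins=50, eps=1e-12):
    """D_KL(P || Q) between histograms of two 1D sample sets.

    Hypothetical helper. Both histograms share the same bin edges, and a
    tiny pseudo-count `eps` keeps empty bins from producing log(0).
    """
    samples_p = np.asarray(samples_p, dtype=float)
    samples_q = np.asarray(samples_q, dtype=float)
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=n_bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Toy data: values from the last few iterations versus all iterations so far.
rng = np.random.default_rng(1)
accumulated = rng.normal(0.0, 1.0, 5000)
recent = rng.normal(1.0, 1.0, 1000)

kl_same = kl_divergence(accumulated, accumulated)
kl_shift = kl_divergence(recent, accumulated)
```

A divergence that falls toward zero as iterations accumulate suggests that the sampled distribution along that dimension has stopped changing.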

Input Cryo-EM Maps and Options for Best Structures Search

Cryo-EM Map Resolution (resolution): Resolution of the input cryo-EM maps (angstroms).

  • Required

  • Type: decimal

  • Default: 2.0

Cryo-EM Map Files (map_files): Input cryo-EM map files for best structure search (in .mrc format).

  • Required

  • Type: file_in

Reference Protein Dataset after SPRUCE Preparation (ref_dataset): Input dataset for the reference protein. (Structure must be fit to the input cryo-EM maps.)

  • Required

  • Type: data_source

Superposition method (superpose_method): Superposition method to fit simulation structure to reference structure of cryo-EM map.

  • Required

  • Type: string

  • Default: GlobalSequence

  • Choices: [‘GlobalSequence’, ‘SiteSequence’, ‘DDMatrix’, ‘SSE’, ‘SiteHopper’]

Selected Components for Cryo-EM Map (mask_components): Selected components to generate simulated maps from simulation trajectories.

  • Required

  • Type: string

  • Default: [‘protein’, ‘nucleic’, ‘ligand’]

  • Choices: [‘protein’, ‘nucleic’, ‘ligand’, ‘solvent’, ‘metals’, ‘counter_ions’, ‘lipids’, ‘packing_residues’, ‘sugars’, ‘undefined’, ‘cofactors’, ‘excipients’, ‘polymers’, ‘post_translational’, ‘other_proteins’, ‘other_nucleics’, ‘other_ligands’, ‘other_cofactors’]

Resize Cryo-EM Map(s) (resize_map): Toggle on to resize cryo-EM map(s) using reference structure(s) with 10 angstroms of padding.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Masking options for RSCC

Masking Control (masking_control): Toggle on to mask simulated and reference cryo-EM maps.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Masking Type for Cryo-EM Maps (masking_type): Options for combining the masks of the trajectory and reference cryo-EM maps: “union”, use the union of the two masks; “overlap”, use the overlap between the two masks; “reference”, use only the mask from the reference map; “trajectory”, use only the mask from the trajectory map.

  • Type: string

  • Default: trajectory

  • Choices: [‘union’, ‘overlap’, ‘reference’, ‘trajectory’]

Masking Scheme for Reference Cryo-EM Map (masking_scheme_ref): Options for masking the reference cryo-EM map: “all”, include everything; “threshold”, density values below the specified value are masked out; “std”, density values below the specified number of standard deviations above the mean are masked out.

  • Type: string

  • Default: threshold

  • Choices: [‘all’, ‘threshold’, ‘std’]

Masking Scheme for Simulated Cryo-EM Map (masking_scheme_traj): Options for masking the simulated cryo-EM map: “all”, include everything; “threshold”, density values below the specified value are masked out; “std”, density values below the specified number of standard deviations above the mean are masked out.

  • Type: string

  • Default: threshold

  • Choices: [‘all’, ‘threshold’, ‘std’]

Masking Value for Reference Cryo-EM Map (masking_level_ref): Input value for mask definition: for “threshold”, value below which to mask out; for “std”, standard deviations above the mean below which to mask out.

  • Type: decimal

  • Default: 0.0

Masking Value for Simulated Cryo-EM Map (masking_level_traj): Input value for mask definition: for “threshold”, value below which to mask out; for “std”, standard deviations above the mean below which to mask out.

  • Type: decimal

  • Default: 0.005
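
The masking options above can be summarized in a short sketch that builds a mask for each map, combines the two masks, and computes a correlation coefficient over the remaining voxels. The function names are hypothetical, and using a mean-subtracted (Pearson) correlation for the RSCC is an assumption rather than a statement about the floe's internals.

```python
import numpy as np

def build_mask(density, scheme="threshold", level=0.0):
    """Boolean mask of voxels to keep under one masking scheme."""
    if scheme == "all":
        return np.ones(density.shape, dtype=bool)
    if scheme == "threshold":
        return density >= level
    if scheme == "std":
        return density >= density.mean() + level * density.std()
    raise ValueError(f"unknown masking scheme: {scheme}")

def combine_masks(mask_traj, mask_ref, masking_type="trajectory"):
    """Combine trajectory and reference masks according to masking_type."""
    if masking_type == "union":
        return mask_traj | mask_ref
    if masking_type == "overlap":
        return mask_traj & mask_ref
    if masking_type == "reference":
        return mask_ref
    if masking_type == "trajectory":
        return mask_traj
    raise ValueError(f"unknown masking type: {masking_type}")

def rscc(sim_map, ref_map, mask):
    """Correlation coefficient over masked voxels (mask must not be empty)."""
    a = sim_map[mask] - sim_map[mask].mean()
    b = ref_map[mask] - ref_map[mask].mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Toy maps: the simulated map is a rescaled copy of the reference.
rng = np.random.default_rng(2)
ref_map = rng.random((8, 8, 8))
sim_map = 2.0 * ref_map + 0.1

# Default settings from the parameter list above.
mask_traj = build_mask(sim_map, scheme="threshold", level=0.005)
mask_ref = build_mask(ref_map, scheme="threshold", level=0.0)
mask = combine_masks(mask_traj, mask_ref, masking_type="trajectory")
cc = rscc(sim_map, ref_map, mask)
```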

Selections for Output Best Structures

Superpose Control (report_superpose_ctrl): Toggle on to superpose trajectory onto input reference structure.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Number of Top Candidates for Saving (ntop_out): Number of top candidates for each map to save to dataset.

  • Required

  • Type: integer

  • Default: 5

Output Best Structures for Input Cryo-EM Maps (struct_data_out): Output dataset that saves the top N simulation structures for each input cryo-EM map.

  • Required

  • Type: dataset_out

  • Default: best_structures_dataset

Selection Range for Trajectories

Start Iteration (first_iter): The start iteration of WE segments to be extracted. Leave empty to extract from the first iteration that is available.

  • Type: integer

End Iteration (last_iter): The end iteration of WE segments to be extracted. Leave empty to stop extracting at the last iteration that is available.

  • Type: integer

Endpoint Only (endpoint_only): By default, each WEMD segment has five conformers saved. If True, only the last one is used; if False, all five are used.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Analysis Stride (N) (stride): Skip every N frames during analysis. Not valid when the endpoint-only option is on.

  • Type: integer

  • Default: 1
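
The interaction between the iteration range, the endpoint-only option, and the stride can be sketched as follows. Iteration numbering starting at 1, zero-based frame indices, and reading the stride as "keep every Nth frame starting from the first" are assumptions, as is the function name.

```python
def select_frames(n_iterations, frames_per_segment=5, first_iter=None,
                  last_iter=None, endpoint_only=True, stride=1):
    """Return (iteration, frame_index) pairs selected for analysis.

    Hypothetical helper illustrating the selection logic for one segment
    per iteration.
    """
    first = 1 if first_iter is None else first_iter
    last = n_iterations if last_iter is None else last_iter
    selected = []
    for iteration in range(first, last + 1):
        if endpoint_only:
            # Only the final conformer of each segment; stride is ignored.
            selected.append((iteration, frames_per_segment - 1))
        else:
            selected.extend(
                (iteration, frame)
                for frame in range(0, frames_per_segment, stride)
            )
    return selected

endpoints = select_frames(3)
strided = select_frames(3, endpoint_only=False, stride=2)
```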

Parameters for Estimating Free Energy Maps/Surfaces

Weight Free Energy Calculation (weight_option): Toggle on to apply WEMD weights when generating free energy maps (default=True).

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Density Distribution Bins (number_bins): Number of bins for plotting density distribution in each dimension.

  • Type: integer

  • Default: 50

Maximum value of free energy (max_free_energy): Maximum value for showing estimated 2D free energy map in colors.

  • Type: decimal

  • Default: 50.0
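
A common way to estimate a 2D free energy map from weighted samples is to histogram the two progress coordinates, convert probabilities with F = -ln(P), shift the minimum to zero, and cap the result for display. The sketch below follows that recipe in units of kT; the units, the function name, and the treatment of empty bins are assumptions.

```python
import numpy as np

def free_energy_map(x, y, weights=None, number_bins=50, max_free_energy=50.0):
    """Estimate a 2D free energy map (in units of kT) from samples.

    Hypothetical helper. Pass the WEMD weights to apply them, or None for an
    unweighted estimate. Empty bins are set to max_free_energy.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=number_bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        fe = -np.log(prob)          # empty bins become +inf
    fe -= fe[np.isfinite(fe)].min() # shift the global minimum to zero
    return np.minimum(fe, max_free_energy), xedges, yedges

# Toy data: two progress coordinates with normalized random weights.
rng = np.random.default_rng(3)
pc1 = rng.normal(0.0, 1.0, 2000)
pc2 = rng.normal(0.0, 1.0, 2000)
weights = rng.random(2000)
weights /= weights.sum()

fe, xedges, yedges = free_energy_map(pc1, pc2, weights=weights)
```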

Outputs for Cryo-EM Map Analysis

Failure Report (fail_report): Output report to generate upon failure.

  • Type: string

  • Default: Protein Sampling Failure Report

None (analysis_report_title): Title for the analysis report.

  • Type: string

  • Default: Cryo-EM Map Match Report

Output Features for Each WEMD Segment (analysis_data_out): Output dataset saves calculated features from all WEMD segments within selected ranges.

  • Required

  • Type: dataset_out

  • Default: output_features_dataset