Automated Preparation for Molecule Search Databases
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Description
This floe automatically prepares FastROCS collections, Gigadock collections, molecule search collections, and molecule search databases. It starts either from one or more SMILES, CXSMILES, CSV, or SDF files, or from intermediate collections when you want to begin at a later point. It can also end the job before the molecule search databases are fully built.
Promoted Parameters
Title in user interface (promoted name)
Prep Input
Input file (in): Input molecule file. If included, this will be used as the starting point for both the 2D and 3D prep, unless an alternative input is provided for one of them, or they are explicitly told not to run.
Type: file_in
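The rule described for the input file can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not the floe's implementation: the function and the `alt_2d` and `alt_3d` arguments are hypothetical names for the alternative inputs, while `prep_2d` and `prep_3d` are the promoted parameters documented below.

```python
def resolve_starting_points(input_file=None, alt_2d=None, alt_3d=None,
                            prep_2d=True, prep_3d=True):
    """Return the starting point for each prep, or None if that prep is off.

    alt_2d and alt_3d are hypothetical names for an alternative input
    (for example, an intermediate collection) supplied for one prep only.
    """
    def pick(enabled, alternative):
        if not enabled:
            # The prep was explicitly told not to run.
            return None
        # An alternative input takes precedence over the shared input file.
        return alternative if alternative is not None else input_file

    return {"2d": pick(prep_2d, alt_2d), "3d": pick(prep_3d, alt_3d)}
```

For example, calling `resolve_starting_points("mols.sdf", prep_2d=False)` leaves the 2D prep without a starting point and starts the 3D prep from `mols.sdf`.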
Prep Automation Options
Ending point for 2D prep (end_2d): If you want to end the 2D prep at an earlier point than setting up the search DB, modify this.
Type: string
Default: 2d search DB
Choices: ['Chunked collection', '2d search collection', '2d search DB']
Ending point for 3D prep (end_3d): If you want to end the 3D prep at an earlier point than setting up the search DB, modify this.
Type: string
Default: 3d search DB
Choices: ['FastROCS collection', '3d search collection', '3d search DB']
Execute 2D prep (prep_2d): Whether to run the 2D collection and database prep floes.
Type: boolean
Default: True
Choices: [True, False]
Execute 3D prep (prep_3d): Whether to run the 3D collection and database prep floes.
Type: boolean
Default: True
Choices: [True, False]
Keep 2D DB loaded (loaded_2d): Whether to keep the 2D Molecule Search database in the LOADED state.
Type: boolean
Default: False
Choices: [True, False]
Keep 3D DB loaded (loaded_3d): Whether to keep the 3D Molecule Search database in the LOADED state.
Type: boolean
Default: False
Choices: [True, False]
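As a quick way to check a set of option values before launching a job, the defaults and choices listed above can be encoded in a small table. The sketch below is a plain-Python illustration only: the promoted names, defaults, and choices come from this page, while the `PREP_OPTIONS` table and the `validate_prep_options` helper are hypothetical and not part of Orion.

```python
# Promoted name -> (documented default, documented choices).
PREP_OPTIONS = {
    "end_2d": ("2d search DB",
               ["Chunked collection", "2d search collection", "2d search DB"]),
    "end_3d": ("3d search DB",
               ["FastROCS collection", "3d search collection", "3d search DB"]),
    "prep_2d": (True, [True, False]),
    "prep_3d": (True, [True, False]),
    "loaded_2d": (False, [True, False]),
    "loaded_3d": (False, [True, False]),
}


def validate_prep_options(overrides=None):
    """Return all six options, using documented defaults where not overridden."""
    overrides = overrides or {}
    unknown = set(overrides) - set(PREP_OPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    resolved = {}
    for name, (default, choices) in PREP_OPTIONS.items():
        value = overrides.get(name, default)
        # The type check stops 1 or 0 from passing as a boolean choice.
        if type(value) is not type(default) or value not in choices:
            raise ValueError(f"{name}={value!r} is not one of {choices}")
        resolved[name] = value
    return resolved
```

For example, `validate_prep_options({"end_3d": "FastROCS collection", "prep_2d": False})` returns the requested 3D ending point, disables the 2D prep, and keeps every other option at its documented default.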
Identifier Strings
Custom Library Name (vendor_library): Name of the custom library. This name appears on the database.
Required
Type: string
Custom Library Version (cust_version): Versioning for the custom library. This parameter is required.
Required
Type: string
Gigaprep Parameter Group
Molecule title (gprep_moltitle): Title of the column in the .csv file that contains the molecule title. This value is passed as an input argument to the Gigaprep floe.
Type: string
Default:
Use GPU Omega? (gprep_useGPUOmega):
Type: boolean
Default: True
Choices: [True, False]
GPU Omega Instance Types (gprep_GPUOmega_hardware): Currently, cdns is turned off by default because of a bandwidth issue with larger input data. Turn it on for small inputs.
Type: string
Default: !cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge
OEFilter Type (gprep_OEFilterType):
Type: string
Default: None
Delete Gigadock Collection (delete_gigadock_coll): If your only goal is to build a molecule search database, and you do not plan to use the intermediate Gigadock collection, leave this on to delete it.
Type: boolean
Default: True
Choices: [True, False]
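The default value of GPU Omega Instance Types (gprep_GPUOmega_hardware) is a comma-separated string in which every entry starts with `!`. The sketch below assumes that a leading `!` marks an instance type as excluded, which matches the note that cdns is turned off by default but is not stated explicitly on this page. It also assumes that removing an entry is how an instance type is turned back on. Both helper functions are hypothetical.

```python
# Default copied from the parameter list above.
DEFAULT_HARDWARE = "!cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge"


def split_instance_types(spec):
    """Split a spec string into (allowed, excluded) instance type names.

    Assumption: a leading "!" marks an excluded instance type.
    """
    allowed, excluded = [], []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("!"):
            excluded.append(entry[1:])
        else:
            allowed.append(entry)
    return allowed, excluded


def drop_exclusion(spec, name):
    """Return the spec without the "!name" entry, keeping the other entries."""
    entries = [e.strip() for e in spec.split(",") if e.strip()]
    return ",".join(e for e in entries if e != "!" + name)
```

Under these assumptions, `drop_exclusion(DEFAULT_HARDWARE, "cdns-g1")` produces a value that no longer excludes cdns for a small input, while the other four exclusions stay in place.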
2D Database Parameters
2D Database CPUs (cpu_2d):
Type: integer
2D Database GPUs (gpu_2d):
Type: integer
2D Database Memory (2d_memory):
Type: decimal
2D Database Disk Space (diskspace_2d):
Type: decimal
2D Database Instance (2d_instance):
Type: string
3D Database Parameters
3D Database CPUs (cpu_3d):
Type: integer
3D Database GPUs (gpu_3d):
Type: integer
3D Database Memory (3d_memory):
Type: decimal
3D Database Disk Space (diskspace_3d):
Type: decimal
3D Database Instance (3d_instance):
Type: string
Logging Datasets
Failed Dataset (data_out1): Output dataset of failed calculations.
Required
Type: dataset_out
Default: Failed Output for Automated Preparation for Molecule Search Databases
Output Dataset (data_out2): Output dataset of successful calculations.
Required
Type: dataset_out
Default: Output for Automated Preparation for Molecule Search Databases