Automated Preparation for Molecule Search Databases
Description
Automatically prepares FastROCS collections, Gigadock collections, Molecule Search collections, and Molecule Search databases from an input file in any of the following formats: .smi, .ism, .cxsmiles, .csv, or .sdf. Preparation can also start from an intermediate collection, and it can stop before the Molecule Search databases are fully prepared.
Promoted Parameters
Title in user interface (promoted name)
Prep input
Input file (in): Input molecule file. Must be in one of the following formats: .smi, .ism, .cxsmiles, .csv, or .sdf. If provided, this file is the starting point for both the 2D and 3D prep, unless an alternative input is provided for one of them or that prep is explicitly disabled.
Type: file_in
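A quick local check of the file extension can catch an unsupported input before a job is submitted. The sketch below is a hypothetical helper (is_supported_input is not part of the floe); it only compares the file suffix against the formats listed above. Treating the comparison as case-insensitive is an assumption.

```python
from pathlib import Path

# Formats listed for the "Input file" parameter.
SUPPORTED_SUFFIXES = {".smi", ".ism", ".cxsmiles", ".csv", ".sdf"}


def is_supported_input(filename: str) -> bool:
    """Return True if the file suffix is one of the listed input formats.

    Hypothetical helper: the case-insensitive comparison is an assumption,
    and compressed variants (for example .sdf.gz) are rejected because this
    section does not list them.
    """
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES
```

For example, is_supported_input("library.sdf") passes the check, while is_supported_input("library.mol2") does not.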
Prep Automation Options
Ending point for 2D prep (end_2d): Change this to end the 2D prep at an earlier point than setting up the search DB.
Type: string
Default: 2D search DB
Choices: [‘Chunked collection’, ‘2D search collection’, ‘2D search DB’]
Ending point for 3D prep (end_3d): Change this to end the 3D prep at an earlier point than setting up the search DB.
Type: string
Default: 3D search DB
Choices: [‘FastROCS collection’, ‘3D search collection’, ‘3D search DB’]
Execute 2D prep (prep_2d): Whether or not to run the 2D collection and database prep floes.
Type: boolean
Default: True
Choices: [True, False]
Execute 3D prep (prep_3d): Whether or not to run the 3D collection and database prep floes.
Type: boolean
Default: True
Choices: [True, False]
Keep 2D DB loaded (loaded_2d): Whether or not to keep the 2D Molecule Search database in the LOADED state.
Type: boolean
Default: False
Choices: [True, False]
Keep 3D DB loaded (loaded_3d): Whether or not to keep the 3D Molecule Search database in the LOADED state.
Type: boolean
Default: False
Choices: [True, False]
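The options in this group can be checked locally before submission. The following is a minimal sketch that keeps the options in a plain Python dict keyed by promoted name. The defaults and choices are copied from the listing above; check_prep_options is a hypothetical helper and does not call any Orion API.

```python
# Defaults and allowed values copied from the parameter listing above.
DEFAULTS = {
    "end_2d": "2D search DB",
    "end_3d": "3D search DB",
    "prep_2d": True,
    "prep_3d": True,
    "loaded_2d": False,
    "loaded_3d": False,
}
CHOICES = {
    "end_2d": ["Chunked collection", "2D search collection", "2D search DB"],
    "end_3d": ["FastROCS collection", "3D search collection", "3D search DB"],
}


def check_prep_options(overrides: dict) -> dict:
    """Merge overrides into the defaults and reject unknown names or values.

    Hypothetical local helper; it only validates the dict it is given.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    for name, value in options.items():
        if name in CHOICES:
            # String options must match one of the listed choices exactly.
            if value not in CHOICES[name]:
                raise ValueError(f"{name} must be one of {CHOICES[name]}")
        elif not isinstance(value, bool):
            # All remaining options in this group are booleans.
            raise TypeError(f"{name} must be True or False")
    return options
```

For instance, check_prep_options({"end_3d": "FastROCS collection", "prep_2d": False}) returns the full option set for a run that skips the 2D prep and stops the 3D prep at the FastROCS collection.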
Identifier strings
Custom Library Name (vendor_library): Name of custom library. This is the name that will appear on the database.
Required
Type: string
Custom Library Version (cust_version): Version of the custom library.
Required
Type: string
Gigaprep parameter group
Title Field (mol_title): For .csv inputs. Name of the column that holds the titles of the input molecules. If left blank, titles are assumed to be in the default column.
Type: string
Use GPU Omega? (gprep_useGPUOmega):
Type: boolean
Default: True
Choices: [True, False]
GPU Omega Instance Types (gprep_GPUOmega_hardware): Validated list of GPU instances for Omega.
Type: string
Default: !cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge
OEFilter Type (gprep_OEFilterType):
Type: string
Default: BlockBuster
Choices: [‘BlockBuster’, ‘Lead’, ‘Drug’, ‘PAINS’, ‘None’]
Delete Gigadock Collection (delete_gigadock_coll): If your only goal is to build a molecule search database, and you do not plan to use the intermediate Gigadock collection, leave this on to delete it.
Type: boolean
Default: True
Choices: [True, False]
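For .csv inputs, the Title Field (mol_title) value in this group has to name a column in the file, so it can be worth confirming that before submission. The sketch below is a hypothetical pre-check (title_column_present is not part of the floe). It assumes the first row of the file is the header and that the column name must match exactly. A blank value is accepted because the floe then uses its default column, which this section does not name.

```python
import csv
import io


def title_column_present(csv_text: str, mol_title: str) -> bool:
    """Check that `mol_title` names a column in the CSV header row.

    Hypothetical pre-check. A blank `mol_title` returns True because the
    floe then falls back to its default column. Assumes the first row of
    the file is the header and that names are matched exactly.
    """
    if not mol_title:
        return True
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return mol_title in header
```

With a file whose header row is SMILES,Name, the check passes for mol_title set to Name and fails for Title.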
2D Database Parameters
2D Database CPUs (cpu_2d):
Type: integer
2D Database GPUs (gpu_2d):
Type: integer
2D Database Memory (2d_memory):
Type: decimal
2D Database Disk Space (diskspace_2d):
Type: decimal
2D Database Instance (2d_instance):
Type: string
3D Database Parameters
3D Database CPUs (cpu_3d):
Type: integer
3D Database GPUs (gpu_3d):
Type: integer
3D Database Memory (3d_memory):
Type: decimal
3D Database Disk Space (diskspace_3d):
Type: decimal
3D Database Instance (3d_instance):
Type: string
Logging datasets
Failed Dataset (data_out1): Output dataset of failed calculations.
Required
Type: dataset_out
Default: Failed Output for Automated Preparation for Molecule Search Databases
Output Dataset (data_out2): Output dataset of successful calculations.
Required
Type: dataset_out
Default: Output for Automated Preparation for Molecule Search Databases