Automated Preparation for Molecule Search Databases

Description

Automatically prepares FastROCS collections, Gigadock collections, Molecule Search collections, and Molecule Search databases from an input file in any of the following formats: .smi, .ism, .cxsmiles, .csv, or .sdf. Intermediate collections can also be supplied as input to start the preparation from a later stage. Preparation can also be stopped before the Molecule Search databases are fully built.

Promoted Parameters

Title in user interface (promoted name)

Prep input

Input file (in): Input molecule file. Must be in one of the following formats: .smi, .ism, .cxsmiles, .csv, or .sdf. If provided, this file is the starting point for both the 2D and 3D prep. The exceptions are a prep that has its own alternative input and a prep that is disabled (see prep_2d and prep_3d).

  • Type: file_in
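Before uploading, it can help to confirm locally that a file has one of the accepted extensions. The sketch below is a hypothetical pre-upload check and is not part of the floe. It assumes the extension comparison is case-insensitive, and it looks only at the final suffix.

```python
from pathlib import Path

# Extensions listed for the Input file (in) parameter.
ACCEPTED_SUFFIXES = {".smi", ".ism", ".cxsmiles", ".csv", ".sdf"}


def is_supported_input(filename: str) -> bool:
    """Return True if the file's final extension is an accepted input format.

    Hypothetical helper; lower-casing the suffix is an assumption about
    how the floe treats extensions.
    """
    return Path(filename).suffix.lower() in ACCEPTED_SUFFIXES


for name in ["library.smi", "LIBRARY.SDF", "library.mol2"]:
    print(name, is_supported_input(name))
```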

Prep Automation Options

Ending point for 2D prep (end_2d): Stage at which the 2D prep stops. Change this to stop the 2D prep before the 2D search DB is set up.

  • Type: string

  • Default: 2D search DB

  • Choices: [‘Chunked collection’, ‘2D search collection’, ‘2D search DB’]

Ending point for 3D prep (end_3d): Stage at which the 3D prep stops. Change this to stop the 3D prep before the 3D search DB is set up.

  • Type: string

  • Default: 3D search DB

  • Choices: [‘FastROCS collection’, ‘3D search collection’, ‘3D search DB’]
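The effect of an ending point can be pictured as truncating an ordered list of stages. The sketch below assumes the stages run in the order their choices are listed, which this reference does not state outright. The function stages_to_run is hypothetical, and it does not model starting from an intermediate collection.

```python
from typing import List

# Stage order assumed from the order of the documented choices.
STAGES_2D = ["Chunked collection", "2D search collection", "2D search DB"]
STAGES_3D = ["FastROCS collection", "3D search collection", "3D search DB"]


def stages_to_run(stages: List[str], end_point: str) -> List[str]:
    """Return the stages executed when the prep stops at end_point."""
    if end_point not in stages:
        raise ValueError(f"{end_point!r} is not one of {stages}")
    # Keep every stage up to and including the chosen ending point.
    return stages[: stages.index(end_point) + 1]


print(stages_to_run(STAGES_2D, "2D search collection"))
print(stages_to_run(STAGES_3D, "3D search DB"))
```

With end_2d set to "2D search collection", this sketch runs the chunking and collection stages and skips the search DB setup.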

Execute 2D prep (prep_2d): Whether to run the 2D collection and database prep floes.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Execute 3D prep (prep_3d): Whether to run the 3D collection and database prep floes.

  • Type: boolean

  • Default: True

  • Choices: [True, False]
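The Input file (in) description, together with prep_2d and prep_3d, implies a simple rule for which input each prep starts from. The sketch below encodes that rule under stated assumptions. The alternative inputs alt_input_2d and alt_input_3d are hypothetical stand-ins, because this reference does not name the parameters that supply them. resolve_inputs is also hypothetical.

```python
from typing import Dict, Optional


def resolve_inputs(
    shared_input: Optional[str],
    prep_2d: bool = True,
    prep_3d: bool = True,
    alt_input_2d: Optional[str] = None,  # hypothetical alternative 2D input
    alt_input_3d: Optional[str] = None,  # hypothetical alternative 3D input
) -> Dict[str, str]:
    """Map each enabled prep ("2d", "3d") to the input it starts from.

    An alternative input overrides the shared input file; a disabled
    prep gets no entry at all.
    """
    plan: Dict[str, str] = {}
    branches = (("2d", prep_2d, alt_input_2d), ("3d", prep_3d, alt_input_3d))
    for branch, enabled, alt in branches:
        if not enabled:
            continue
        source = alt or shared_input
        if source is None:
            raise ValueError(f"{branch} prep is enabled but has no input")
        plan[branch] = source
    return plan


print(resolve_inputs("library.smi"))
print(resolve_inputs("library.smi", prep_3d=False))
print(resolve_inputs("library.smi", alt_input_3d="my_intermediate_3d_collection"))
```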

Keep 2D DB loaded (loaded_2d): Whether to keep the 2D Molecule Search database in the LOADED state.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Keep 3D DB loaded (loaded_3d): Whether to keep the 3D Molecule Search database in the LOADED state.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Identifier strings

Custom Library Name (vendor_library): Name of the custom library. This name appears on the database.

  • Required

  • Type: string

Custom Library Version (cust_version): Version string for the custom library.

  • Required

  • Type: string

Gigaprep parameter group

Title Field (mol_title): For .csv inputs only. Name of the column that holds the molecule titles. If left blank, titles are read from the default column.

  • Type: string
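The column lookup described for mol_title can be sketched with Python's standard csv module. The default column name "title" is a hypothetical placeholder, because this reference does not say which column the floe uses by default. read_titles is likewise hypothetical. A blank mol_title (an empty string) falls back to the default, as the description says.

```python
import csv
import io
from typing import List, Optional

# Hypothetical placeholder; the floe's actual default column is not documented here.
DEFAULT_TITLE_COLUMN = "title"


def read_titles(csv_text: str, mol_title: Optional[str] = None) -> List[str]:
    """Return one title per row, from mol_title if given, else the default column."""
    column = mol_title or DEFAULT_TITLE_COLUMN  # blank or None -> default
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"title column {column!r} not found")
    return [row[column] for row in reader]


data = "smiles,title,compound_id\nCCO,ethanol,CMP-1\nc1ccccc1,benzene,CMP-2\n"
print(read_titles(data))                 # uses the default column
print(read_titles(data, "compound_id"))  # uses the named column
```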

Use GPU Omega? (gprep_useGPUOmega):

  • Type: boolean

  • Default: True

  • Choices: [True, False]

GPU Omega Instance Types (gprep_GPUOmega_hardware): Validated list of GPU instances for Omega.

  • Type: string

  • Default: !cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge

OEFilter Type (gprep_OEFilterType):

  • Type: string

  • Default: BlockBuster

  • Choices: [‘BlockBuster’, ‘Lead’, ‘Drug’, ‘PAINS’, ‘None’]

Delete Gigadock Collection (delete_gigadock_coll): Whether to delete the intermediate Gigadock collection. Leave this on if your only goal is to build a Molecule Search database and you do not plan to use the Gigadock collection.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

2D Database Parameters

2D Database CPUs (cpu_2d):

  • Type: integer

2D Database GPUs (gpu_2d):

  • Type: integer

2D Database Memory (2d_memory):

  • Type: decimal

2D Database Disk Space (diskspace_2d):

  • Type: decimal

2D Database Instance (2d_instance):

  • Type: string

3D Database Parameters

3D Database CPUs (cpu_3d):

  • Type: integer

3D Database GPUs (gpu_3d):

  • Type: integer

3D Database Memory (3d_memory):

  • Type: decimal

3D Database Disk Space (diskspace_3d):

  • Type: decimal

3D Database Instance (3d_instance):

  • Type: string
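The 2D and 3D database groups expose the same five settings, but their parameter names follow two patterns (for example cpu_2d but 2d_memory). The sketch below is a hypothetical helper that keeps one resource description per database and emits the documented names. It assumes that leaving a field unset lets the floe apply its own default. Units for memory and disk space are not stated in this reference, so the sketch passes the values through unchanged.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DatabaseResources:
    """Hypothetical container mirroring the 2D/3D Database Parameters groups."""

    cpu: Optional[int] = None
    gpu: Optional[int] = None
    memory: Optional[float] = None      # units not stated in this reference
    disk_space: Optional[float] = None  # units not stated in this reference
    instance: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject negative numeric settings early.
        for name in ("cpu", "gpu", "memory", "disk_space"):
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")

    def to_parameters(self, dim: str) -> dict:
        """Map set fields to the documented parameter names for "2d" or "3d"."""
        if dim not in ("2d", "3d"):
            raise ValueError("dim must be '2d' or '3d'")
        names = {
            "cpu": f"cpu_{dim}",
            "gpu": f"gpu_{dim}",
            "memory": f"{dim}_memory",
            "disk_space": f"diskspace_{dim}",
            "instance": f"{dim}_instance",
        }
        # Unset fields are omitted (assumed to fall back to floe defaults).
        return {names[k]: v for k, v in asdict(self).items() if v is not None}


print(DatabaseResources(cpu=8, memory=64.0).to_parameters("2d"))
```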

Logging datasets

Failed Dataset (data_out1): Output dataset of failed calculations.

  • Required

  • Type: dataset_out

  • Default: Failed Output for Automated Preparation for Molecule Search Databases

Output Dataset (data_out2): Output dataset of successful calculations.

  • Required

  • Type: dataset_out

  • Default: Output for Automated Preparation for Molecule Search Databases
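The two logging datasets follow a common pattern: successful records go to one output and failed records to another. The sketch below shows that split in plain Python. It does not use the Orion dataset API. split_results and the example check are hypothetical, and attaching the error message to each failed record is an illustrative choice, not documented floe behavior.

```python
from typing import Callable, Iterable, List, Tuple


def split_results(
    records: Iterable[dict], process: Callable[[dict], dict]
) -> Tuple[List[dict], List[dict]]:
    """Apply process to each record, collecting successes and failures separately."""
    succeeded: List[dict] = []
    failed: List[dict] = []
    for record in records:
        try:
            succeeded.append(process(record))
        except Exception as exc:  # any failure routes the record to the failed list
            failed.append({**record, "error": str(exc)})
    return succeeded, failed


def check(rec: dict) -> dict:
    # Hypothetical per-record step that rejects empty SMILES.
    if not rec["smiles"]:
        raise ValueError("empty SMILES")
    return {**rec, "ok": True}


recs = [{"title": "a", "smiles": "CCO"}, {"title": "b", "smiles": ""}]
ok, bad = split_results(recs, check)
print(ok)
print(bad)
```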