Automated Preparation for Molecule Search Databases

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Description

This floe automatically prepares FastROCS collections, Gigadock collections, molecule search collections, and molecule search databases from one or more SMILES, CXSMILES, CSV, or SDF files. Alternatively, it can start from intermediate collections to resume at a later point. The job can also be ended before the molecule search databases are fully built.

Promoted Parameters

Title in user interface (promoted name)

Prep Input

Input file (in): Input molecule file. If included, this will be used as the starting point for both the 2D and 3D prep, unless an alternative input is provided for one of them, or they are explicitly told not to run.

  • Type: file_in

Prep Automation Options

Ending point for 2D prep (end_2d): Change this to end the 2D prep at an earlier point than setting up the search DB.

  • Type: string

  • Default: 2d search DB

  • Choices: ['Chunked collection', '2d search collection', '2d search DB']

Ending point for 3D prep (end_3d): Change this to end the 3D prep at an earlier point than setting up the search DB.

  • Type: string

  • Default: 3d search DB

  • Choices: ['FastROCS collection', '3d search collection', '3d search DB']

Execute 2D prep (prep_2d): Whether to run the 2D collection and database prep floes.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Execute 3D prep (prep_3d): Whether to run the 3D collection and database prep floes.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Keep 2D DB loaded (loaded_2d): Whether to keep the 2D Molecule Search database in the LOADED state.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Keep 3D DB loaded (loaded_3d): Whether to keep the 3D Molecule Search database in the LOADED state.

  • Type: boolean

  • Default: False

  • Choices: [True, False]
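
The automation options above accept only the listed choices, so a small check before launching a job can catch typos. The following is a minimal sketch in plain Python, not part of the floe or the Orion API. It assumes settings are held in a dict keyed by promoted name and that the choice strings are matched case-sensitively; both are assumptions made for illustration.

```python
# Choices copied from the parameter listing above; keys are the promoted names.
CHOICES = {
    "end_2d": ["Chunked collection", "2d search collection", "2d search DB"],
    "end_3d": ["FastROCS collection", "3d search collection", "3d search DB"],
}
BOOLEAN_PARAMS = ["prep_2d", "prep_3d", "loaded_2d", "loaded_3d"]


def check_automation_options(params):
    """Return a list of problems found in a dict of promoted name -> value."""
    problems = []
    for name, allowed in CHOICES.items():
        # Assumption: choice strings are compared exactly (case-sensitive).
        if name in params and params[name] not in allowed:
            problems.append(f"{name}: {params[name]!r} is not one of {allowed}")
    for name in BOOLEAN_PARAMS:
        if name in params and not isinstance(params[name], bool):
            problems.append(f"{name}: expected True or False, got {params[name]!r}")
    return problems


# Hypothetical settings: stop the 2D prep early and skip the 3D prep.
settings = {"end_2d": "2d search collection", "prep_3d": False}
print(check_automation_options(settings))  # prints []
```

Parameters left out of the dict are not checked, on the assumption that they fall back to the defaults listed above.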

Identifier Strings

Custom Library Name (vendor_library): Name of custom library. This is the name that will appear on the database.

  • Required

  • Type: string

Custom Library Version (cust_version): Versioning for the custom library. This parameter is required.

  • Required

  • Type: string

Gigaprep Parameter Group

Molecule title (gprep_moltitle): Title of the column in the .csv file that contains the molecule title. It is passed as an input argument to the Gigaprep floe.

  • Type: string

  • Default:

Use GPU Omega? (gprep_useGPUOmega):

  • Type: boolean

  • Default: True

  • Choices: [True, False]

GPU Omega Instance Types (gprep_GPUOmega_hardware): Currently, cdns is turned off by default due to a bandwidth issue with larger input data. Turn it on for small inputs.

  • Type: string

  • Default: !cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge
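
To turn cdns on for a small input, the default string has to be edited by hand. The sketch below shows one way to do that in plain Python. It assumes that a leading "!" marks an excluded instance type and that "cdns-g1" is the entry the description calls cdns; neither is stated explicitly above. The helper name allow_instance is hypothetical.

```python
# Default value copied from the parameter listing above.
DEFAULT_HARDWARE = "!cdns-g1,!g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge"


def allow_instance(spec, instance):
    """Drop the '!instance' entry from a comma-separated spec, keeping the rest in order."""
    entries = [e.strip() for e in spec.split(",") if e.strip()]
    # Assumption: '!' prefixes an excluded instance type, so removing the
    # entry lifts the exclusion.
    return ",".join(e for e in entries if e != "!" + instance)


print(allow_instance(DEFAULT_HARDWARE, "cdns-g1"))
# prints !g4dn.12xlarge,!g5.12xlarge,!g6.12xlarge,!g6e.12xlarge
```

The other exclusions are left untouched, so only the cdns entry changes relative to the default.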

OEFilter Type (gprep_OEFilterType):

  • Type: string

  • Default: None

Delete Gigadock Collection (delete_gigadock_coll): If your only goal is to build a molecule search database, and you do not plan to use the intermediate Gigadock collection, leave this on to delete it.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

2D Database Parameters

2D Database CPUs (cpu_2d):

  • Type: integer

2D Database GPUs (gpu_2d):

  • Type: integer

2D Database Memory (2d_memory):

  • Type: decimal

2D Database Disk Space (diskspace_2d):

  • Type: decimal

2D Database Instance (2d_instance):

  • Type: string

3D Database Parameters

3D Database CPUs (cpu_3d):

  • Type: integer

3D Database GPUs (gpu_3d):

  • Type: integer

3D Database Memory (3d_memory):

  • Type: decimal

3D Database Disk Space (diskspace_3d):

  • Type: decimal

3D Database Instance (3d_instance):

  • Type: string

Logging Datasets

Failed Dataset (data_out1): Output dataset of failed calculations.

  • Required

  • Type: dataset_out

  • Default: Failed Output for Automated Preparation for Molecule Search Databases

Output Dataset (data_out2): Output dataset of successful calculations.

  • Required

  • Type: dataset_out

  • Default: Output for Automated Preparation for Molecule Search Databases
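
Four parameters in this listing are marked Required, and only the two dataset outputs come with defaults. As a rough illustration, the plain-Python sketch below merges user settings with those defaults and reports any required parameter that is still empty. The dict layout and the helper name fill_and_check are hypothetical, and how the values are handed to Orion is outside the scope of this sketch.

```python
# Required parameters and defaults copied from the listing above.
REQUIRED = ["vendor_library", "cust_version", "data_out1", "data_out2"]
DEFAULTS = {
    "data_out1": "Failed Output for Automated Preparation for Molecule Search Databases",
    "data_out2": "Output for Automated Preparation for Molecule Search Databases",
}


def fill_and_check(params):
    """Merge params over the defaults and list required names that are still empty."""
    merged = {**DEFAULTS, **params}
    missing = []
    for name in REQUIRED:
        value = merged.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return merged, missing


# Hypothetical settings with the library version left out.
merged, missing = fill_and_check({"vendor_library": "Example Library"})
print(missing)  # prints ['cust_version']
```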