CHOMP - Generate BROOD Fragment Database

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/BROOD

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search

  • Task-based/Scaffold-Hopping

Description

CHOMP - Generate BROOD Fragment Database is a utility for the lead generation tool BROOD. CHOMP fragments molecules, filters the fragments, generates 3D conformations, organizes and indexes the fragments for rapid searching, and writes a BROOD database.

The minimal input to CHOMP is a dataset, file, or collection of 2D molecules.

The output from the CHOMP floe is a BROOD Database collection that can be used as input for the BROOD floe. The CHOMP floe also optionally produces a tarball database file that can be used to run the BROOD application on a local machine. Please ensure that the Output database name does not contain spaces; otherwise, the floe will fail.

The CHOMP floe requires machines with high memory and high disk capacity at different stages, depending on the input. The default memory and disk values are set to handle up to ~1 million drug-like molecules as input. Larger jobs require these cube parameters to be scaled up. Before starting a job, adjust the control parameters according to the following guidelines.

  • Chunk Size: Set the chunk size so the number of chunks is ~250. For example, for ~1 million drug-like molecules, the suggested chunk size is the default value of 4000.

  • Memory (MiB) (Chomp builder): Multiply the default memory value by the ratio of the new chunk size to the default chunk size. For example, if the default chunk size is doubled, multiply the default memory by 2.

  • Memory (MiB) (Chomp Merge Fragments): Set this value to ~0.01 times the number of fragments.

  • Memory (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.

  • Temporary Disk Space (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.

Note that these guidelines are approximate, and the appropriate values may differ for each input. Increasing the memory and disk space values by a further 10% is advised to provide a margin of safety.
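The arithmetic in the guidelines above can be sketched as a small helper. This is an illustrative calculation only, not part of the floe: the function name `suggest_control_parameters` is hypothetical, the fragment count must be estimated by the user because it is not known before the job runs, and the mapping of cube names to the promoted names `memory_builder`, `memory_merger`, `memory_generator`, and `disk_space` is inferred from the names in this document. Since the defaults are stated to handle up to ~1 million molecules, the helper is mainly of interest for larger inputs.

```python
import math

# Values taken from this document's guidelines and default parameters.
DEFAULT_CHUNK_SIZE = 4000             # default Chunk Size
DEFAULT_BUILDER_MEMORY_MIB = 29491.2  # default memory_builder
TARGET_CHUNKS = 250                   # aim for ~250 chunks
MIB_PER_FRAGMENT = 0.01               # ~0.01 MiB per fragment
SAFETY_MARGIN = 0.10                  # advised 10% margin


def suggest_control_parameters(n_molecules, n_fragments_estimate):
    """Return approximate control-parameter values (hypothetical helper).

    n_fragments_estimate is a user-supplied guess (an assumption of this
    sketch); the document does not say how to predict it.
    """
    # Chunk size chosen so that the input splits into ~250 chunks.
    chunk_size = max(1, math.ceil(n_molecules / TARGET_CHUNKS))

    # Builder memory scales with the ratio of new to default chunk size.
    builder = DEFAULT_BUILDER_MEMORY_MIB * chunk_size / DEFAULT_CHUNK_SIZE

    # Merger memory, generator memory, and temporary disk space are each
    # ~0.01 times the number of fragments.
    per_fragment = MIB_PER_FRAGMENT * n_fragments_estimate

    margin = 1.0 + SAFETY_MARGIN
    return {
        "chunk_size": chunk_size,
        "memory_builder": round(builder * margin, 1),
        "memory_merger": round(per_fragment * margin, 1),
        "memory_generator": round(per_fragment * margin, 1),
        "disk_space": round(per_fragment * margin, 1),
    }


if __name__ == "__main__":
    # Example: 2 million input molecules, with a guessed 10 million fragments.
    for name, value in suggest_control_parameters(2_000_000, 10_000_000).items():
        print(f"{name}: {value}")
```

For 2 million molecules the chunk size doubles to 8000, so the builder memory doubles before the 10% margin is applied. Treat the results as starting points, since the guidelines themselves are approximate.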

Promoted Parameters

Title in user interface (promoted name)

Input parameters

Input dataset (in_dataset): Input dataset containing molecules or user fragments.

  • Type: data_source

Input file (in_file): Input file containing molecules or user fragments.

  • Type: file_in

Input collection (in_collection): Input collection containing molecules or user fragments.

  • Type: collection_source

Save BROOD Database Tarfile (save_db_file): Boolean flag indicating whether to save the BROOD database tarfile.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Output parameters

Brood Fragments DB Collection (out_collection): Output collection containing the fragment database.

  • Required

  • Type: collection_sink

  • Default: BROOD Fragments DB collection

Output database name (out_db): Output BROOD database name.

  • Required

  • Type: file_out

  • Default: brood_database

Failed Dataset (failed): Output dataset of failed calculations.

  • Required

  • Type: dataset_out

  • Default: Failed Output for CHOMP - Generate BROOD Fragment Database

Fragment filtering parameters

Max Heavy Atoms (maxHvy): Maximum number of heavy atoms allowed in fragments.

  • Type: integer

  • Default: 15

Heavy Frags Min Frequency (minFrequency): Minimum frequency required for a heavy fragment to be retained. A value of 0 accepts all fragments.

  • Type: integer

  • Default: 3

Use Percentile (usePercentile): Boolean flag indicating whether to treat the minimum frequency input as a percentile.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Heavy Fragment Size (minFreqHeavy): Minimum number of heavy atoms for a fragment to count as a heavy fragment, beyond which the minimum frequency is applied.

  • Type: integer

  • Default: 9
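One possible reading of how the filtering parameters above interact is sketched below. This is not CHOMP's implementation. The `Fragment` class and `keep_fragment` function are hypothetical, the sketch assumes that a fragment with exactly `minFreqHeavy` heavy atoms already counts as heavy (the boundary is not specified in this document), and the `usePercentile` option is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record for illustration; not a CHOMP data type."""
    smiles: str
    heavy_atoms: int   # number of heavy atoms in the fragment
    frequency: int     # how often the fragment occurs in the input


def keep_fragment(frag, max_hvy=15, min_frequency=3, min_freq_heavy=9):
    """Apply the documented filters under the assumptions stated above.

    Defaults mirror maxHvy, minFrequency, and minFreqHeavy.
    """
    # Fragments larger than Max Heavy Atoms are always rejected.
    if frag.heavy_atoms > max_hvy:
        return False
    # A minimum frequency of 0 accepts everything that passed the size cap.
    if min_frequency == 0:
        return True
    # Assumption: fragments below the heavy-fragment size skip the
    # frequency test entirely.
    if frag.heavy_atoms < min_freq_heavy:
        return True
    # Heavy fragments must occur at least min_frequency times.
    return frag.frequency >= min_frequency


if __name__ == "__main__":
    candidates = [
        Fragment("c1ccccc1", heavy_atoms=6, frequency=1),
        Fragment("c1ccc2ccccc2c1", heavy_atoms=10, frequency=2),
        Fragment("c1ccc2ccccc2c1", heavy_atoms=10, frequency=5),
    ]
    for frag in candidates:
        print(frag.smiles, frag.heavy_atoms, frag.frequency, keep_fragment(frag))
```

With the default values, a small fragment is kept regardless of how rarely it occurs, while a 10-atom fragment is kept only if it occurs at least 3 times.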

Control parameters

Chunk Size (chunk_size): The chunk size for splitting records.

  • Type: integer

  • Default: 4000

Memory (MiB) (memory_builder): The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 29491.2

Memory (MiB) (memory_generator): The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 58982.4

Memory (MiB) (memory_merger): The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 58982.4

Temporary Disk Space (MiB) (disk_space): The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 58982.4