CHOMP - Generate BROOD Fragment Database

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/BROOD

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search

  • Task-based/Scaffold-Hopping

Description

CHOMP - Generate BROOD Fragment Database is a utility for the lead generation tool BROOD. CHOMP allows users to fragment molecules, filter the fragments, generate 3D conformations, organize and index the fragments for rapid searching, and write a BROOD database.

The minimal input into CHOMP is a dataset, file, or collection of 2D molecules.

The output from the CHOMP floe is a BROOD Database collection that can be used as input for the BROOD floe. The CHOMP floe also optionally produces a tarball database file that can be used to run the BROOD application on a local machine. Please ensure that the Output database name does not contain spaces; otherwise, the floe will fail.
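When the floe is launched from a script, the naming constraint above can be checked before submission. The following is a minimal sketch, not part of the floe itself: the helper name `is_valid_db_name` is hypothetical, and it is slightly stricter than the text, rejecting any whitespace character (tabs, newlines) rather than only spaces.

```python
def is_valid_db_name(name: str) -> bool:
    """Return True if `name` is non-empty and contains no whitespace.

    Hypothetical pre-submission check for the Output database name
    (out_db). The documentation only states that spaces cause the
    floe to fail; rejecting all whitespace is an assumption made here
    to stay on the safe side.
    """
    return bool(name) and not any(ch.isspace() for ch in name)


if __name__ == "__main__":
    for candidate in ("brood_database", "my brood database"):
        status = "ok" if is_valid_db_name(candidate) else "rejected"
        print(f"{candidate!r}: {status}")
```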

The CHOMP floe requires machines with high memory and disk capacity at different stages, depending on the input. The default values for these parameters are set to handle up to ~1 million drug-like molecules as input. For larger jobs, these cube parameters need to be scaled up. Before starting a job, adjust the control parameters according to the guidelines below.

  • Chunk Size: Set the chunk size so the number of chunks is ~250. For example, for ~1 million drug-like molecules, the suggested chunk size is the default value of 4000.

  • Memory (MiB) (Chomp Fragments): Multiply the default memory value by the ratio of the new chunk size to the default chunk size. For example, if the default chunk size is doubled, multiply the default memory by 2.

  • Memory (MiB) (Chomp Builder): Set this value to ~0.002 times the number of fragments.

  • Memory (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.

  • Temporary Disk Space (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.

Please note that these guidelines are approximate, and the specific values may differ for each input. Increasing the memory and disk space values by 10% is recommended to provide a margin of safety.
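The arithmetic in the guidelines above can be sketched in a few lines of Python. This is an illustrative helper under stated assumptions, not an official tool. The function and field names are hypothetical. The fragment count is not known before the job runs, so it must be supplied as an estimate. The default memory for the Chomp Fragments cube is passed in by the caller, because this page does not state which listed default belongs to that cube.

```python
import math
from dataclasses import dataclass

DEFAULT_CHUNK_SIZE = 4000  # default Chunk Size from the parameter list
TARGET_CHUNKS = 250        # guideline: aim for ~250 chunks
SAFETY_MARGIN = 1.10       # recommended 10% margin on memory and disk


@dataclass
class ChompSizing:
    chunk_size: int
    fragments_memory_mib: float
    builder_memory_mib: float
    db_generator_memory_mib: float
    db_generator_disk_mib: float


def suggest_chomp_sizing(n_molecules: int,
                         est_fragments: int,
                         default_fragments_memory_mib: float) -> ChompSizing:
    """Apply the approximate sizing guidelines, including the 10% margin.

    est_fragments is the caller's estimate of the fragment count
    (an assumption; the floe does not report it before running).
    """
    # Chunk size chosen so that the number of chunks is ~250.
    chunk_size = max(1, math.ceil(n_molecules / TARGET_CHUNKS))
    # Chomp Fragments memory scales with the change in chunk size.
    ratio = chunk_size / DEFAULT_CHUNK_SIZE
    return ChompSizing(
        chunk_size=chunk_size,
        fragments_memory_mib=default_fragments_memory_mib * ratio * SAFETY_MARGIN,
        builder_memory_mib=0.002 * est_fragments * SAFETY_MARGIN,
        db_generator_memory_mib=0.01 * est_fragments * SAFETY_MARGIN,
        db_generator_disk_mib=0.01 * est_fragments * SAFETY_MARGIN,
    )


if __name__ == "__main__":
    # Example inputs are made up for illustration only.
    print(suggest_chomp_sizing(2_000_000, 10_000_000, 4096.0))
```

For inputs of up to ~1 million drug-like molecules, the defaults are already intended to be sufficient, so a helper like this is mainly useful for larger jobs.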

Promoted Parameters

Title in user interface (promoted name)

Input parameters

Input dataset (in_dataset): Input dataset containing molecules or user fragments.

  • Type: data_source

Input file (in_file): Input file containing molecules or user fragments.

  • Type: file_in

Input collection (in_collection): Input collection containing molecules or user fragments.

  • Type: collection_source

Output parameters

Brood Fragments DB Collection (out_collection): Output collection containing fragments database.

  • Required

  • Type: collection_sink

  • Default: BROOD Fragments DB collection

Save BROOD Database Tarfile (save_db_file): Boolean flag indicating whether or not to save the BROOD database tarfile

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Output database name (out_db): Output BROOD database name.

  • Required

  • Type: file_out

  • Default: brood_database

Write 2D Fragments output dataset (write_2d_frags): Whether or not to write 2D Fragments output dataset

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Output 2D Dataset (out_2d): Output dataset of 2D Fragments

  • Required

  • Type: dataset_out

  • Default: Output of CHOMP - 2D Fragments

Failed Dataset (failed): Output dataset of failed calculations.

  • Required

  • Type: dataset_out

  • Default: Failed Output for CHOMP - Generate BROOD Fragment Database

Fragment generation and filtering parameters

SMARTS (smarts): SMARTS definition for bonds to break

  • Type: string

  • Default: all

  • Choices: [‘recap’, ‘rlf’, ‘both’, ‘all’]

Custom SMARTS File (smarts_file): Custom SMARTS file with definitions for bonds to break

  • Type: file_in

Filter (filter): Flag indicating whether the fragment filter is applied

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Custom Filter File (filter_file): Custom Filter file for fragments filtering

  • Type: file_in

Maximum Heavy (max_heavy): Maximum number of heavy atoms per fragment

  • Type: integer

  • Default: 15

Heavy Frags Min Frequency (minFrequency): Minimum number of source molecules in which a fragment must occur

  • Type: integer

  • Default: 0

Heavy Fragment Size (minFreqHeavy): Minimum number of heavy atoms per fragment, beyond which the minimum frequency is applicable

  • Type: integer

  • Default: 9
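The three size and frequency parameters above interact, and one possible reading of that interaction is sketched below. This is an interpretation of the parameter descriptions, not CHOMP's actual implementation. The function name `keep_fragment` is hypothetical, and the choice to apply the frequency requirement at or above the Heavy Fragment Size threshold (rather than strictly above it) is an assumption.

```python
def keep_fragment(heavy_atoms: int,
                  source_count: int,
                  max_heavy: int = 15,
                  min_frequency: int = 0,
                  min_freq_heavy: int = 9) -> bool:
    """Illustrative size/frequency check for a single fragment.

    heavy_atoms  -- number of heavy atoms in the fragment
    source_count -- number of source molecules the fragment occurs in
    Defaults mirror max_heavy, minFrequency, and minFreqHeavy above.
    """
    # Fragments larger than the maximum heavy-atom count are dropped.
    if heavy_atoms > max_heavy:
        return False
    # Small fragments are kept regardless of how often they occur.
    # ASSUMPTION: the threshold is inclusive (>= min_freq_heavy).
    if heavy_atoms < min_freq_heavy:
        return True
    # Larger fragments must occur in enough source molecules.
    return source_count >= min_frequency


if __name__ == "__main__":
    # With min_frequency=2, a 10-atom fragment seen once is dropped,
    # while an 8-atom fragment seen once is kept.
    print(keep_fragment(10, 1, min_frequency=2))
    print(keep_fragment(8, 1, min_frequency=2))
```

With the default minimum frequency of 0, the frequency condition never removes a fragment under this reading, so only the Maximum Heavy limit has an effect.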

Control parameters

Chunk Size (chunk_size): The chunk size for splitting records.

  • Type: integer

  • Default: 4000

Memory (MiB) (memory_builder): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 3686.4

Memory (MiB) (memory_generator): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 14745.6

Memory (MiB) (memory_merger): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 58982.4

Temporary Disk Space (MiB) (disk_space): The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 58982.4