CHOMP - Generate BROOD Fragment Database

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/BROOD

Role-based/Computational Chemist

Solution-based/Virtual-screening/DB Search

Task-based/Scaffold-Hopping

Description

CHOMP - Generate BROOD Fragment Database is a utility for the lead generation tool BROOD. CHOMP lets users fragment molecules, filter the fragments, generate 3D conformations, organize and index the fragments for rapid searching, and write a BROOD database.

The minimum input to CHOMP is a dataset, file, or collection of 2D molecules.

The output from the CHOMP floe is a BROOD Database collection that can be used as input for the BROOD floe. The CHOMP floe also optionally produces a tarball database file that can be used to run the BROOD application on a local machine. Please ensure that the Output database name does not contain spaces; otherwise, the floe will fail.
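Because a space in the Output database name makes the floe fail, it can be worth checking the name before submitting a job. The snippet below is a minimal sketch of such a check; the function name `is_valid_db_name` is hypothetical and not part of CHOMP or Orion, and it rejects any whitespace character, which is slightly stricter than the stated rule.

```python
def is_valid_db_name(name: str) -> bool:
    """Return True if the name is non-empty and contains no whitespace.

    Hypothetical pre-flight helper; the source only states that spaces
    cause the floe to fail, so rejecting all whitespace is an assumption.
    """
    return bool(name) and not any(ch.isspace() for ch in name)


if __name__ == "__main__":
    for candidate in ("brood_database", "my brood db"):
        print(candidate, is_valid_db_name(candidate))
```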

The CHOMP floe requires machines with high memory and large disk capacity at different stages, depending on the input. The default values for these parameters are set to handle up to ~1 million drug-like molecules as input. Larger jobs require scaling up these cube parameters. Before starting a job, adjust the control parameters according to the guidelines below.

Chunk Size: Set the chunk size so the number of chunks is ~250. For example, for ~1 million drug-like molecules, the suggested chunk size is the default value of 4000.
Memory (MiB) (Chomp Fragments): Multiply the default memory value by the ratio of the new chunk size to the default chunk size. For example, if the default chunk size is doubled, multiply the default memory by 2.
Memory (MiB) (Chomp Builder): Set this value to ~0.002 times the number of fragments.
Memory (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.
Temporary Disk Space (MiB) (Chomp DB Generator): Set this value to ~0.01 times the number of fragments.

Please note that these guidelines are approximate, and the specific values may differ for each input. Increasing the memory and disk space values by 10% is recommended as a safety margin.
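The guidelines above amount to a few lines of arithmetic, sketched below. This is an illustrative helper, not part of CHOMP: the function name and dictionary keys are hypothetical, the number of fragments is an estimate you must supply yourself, and `default_fragments_memory_mib` stands for whatever default the Orion interface shows for the Chomp Fragments cube. The 10% safety margin is applied to every memory and disk value.

```python
import math

DEFAULT_CHUNK_SIZE = 4000  # default chunk size stated in the guidelines


def suggest_chomp_settings(num_molecules, num_fragments,
                           default_fragments_memory_mib,
                           target_chunks=250, margin=0.10):
    """Apply the sizing guidelines and return suggested values in MiB.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    # Chunk size chosen so the input splits into roughly target_chunks chunks.
    chunk_size = max(1, math.ceil(num_molecules / target_chunks))
    # Chomp Fragments memory scales with the change in chunk size.
    scale = chunk_size / DEFAULT_CHUNK_SIZE
    pad = 1.0 + margin  # recommended safety margin
    return {
        "chunk_size": chunk_size,
        "memory_fragments_mib": default_fragments_memory_mib * scale * pad,
        "memory_builder_mib": 0.002 * num_fragments * pad,
        "memory_db_generator_mib": 0.01 * num_fragments * pad,
        "disk_db_generator_mib": 0.01 * num_fragments * pad,
    }


if __name__ == "__main__":
    # Example inputs are made up for illustration only.
    settings = suggest_chomp_settings(
        num_molecules=2_000_000,
        num_fragments=5_000_000,
        default_fragments_memory_mib=1000.0,
    )
    for key, value in settings.items():
        print(f"{key}: {value}")
```

Treat the results as starting points: the guidelines are approximate, and the fragment count is usually not known precisely before the floe runs.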

Promoted Parameters

Title in user interface (promoted name)

Input parameters

Input dataset (in_dataset): Input dataset containing molecules or user fragments.

Type: data_source

Input file (in_file): Input file containing molecules or user fragments.

Type: file_in

Input collection (in_collection): Input collection containing molecules or user fragments.

Type: collection_source

Output parameters

Brood Fragments DB Collection (out_collection): Output collection containing fragments database.

Required

Type: collection_sink

Default: BROOD Fragments DB collection

Save BROOD Database Tarfile (save_db_file): Boolean flag indicating whether to save the BROOD database tarfile.

Type: boolean

Default: False

Choices: [True, False]

Output database name (out_db): Output BROOD database name.

Required

Type: file_out

Default: brood_database

Write 2D Fragments output dataset (write_2d_frags): Whether to write the 2D fragments output dataset.

Required

Type: boolean

Default: False

Choices: [True, False]

Output 2D Dataset (out_2d): Output dataset of 2D Fragments

Required

Type: dataset_out

Default: Output of CHOMP - 2D Fragments

Failed Dataset (failed): Output dataset of failed calculations.

Required

Type: dataset_out

Default: Failed Output for CHOMP - Generate BROOD Fragment Database

Fragment generation and filtering parameters

SMARTS (smarts): SMARTS definition for bonds to break

Type: string

Default: all

Choices: ['recap', 'rlf', 'both', 'all']

Custom SMARTS File (smarts_file): Custom SMARTS file defining the bonds to break

Type: file_in

Filter (filter): Flag indicating whether the fragment filter is applied

Type: boolean

Default: True

Choices: [True, False]

Custom Filter File (filter_file): Custom filter file for fragment filtering

Type: file_in

Maximum Heavy (max_heavy): Maximum number of heavy atoms per fragment

Type: integer

Default: 15

Heavy Frags Min Frequency (minFrequency): Minimum number of source molecules in which a fragment must occur

Type: integer

Default: 0

Heavy Fragment Size (minFreqHeavy): Minimum number of heavy atoms per fragment, beyond which the minimum frequency is applicable

Type: integer

Default: 9
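The three size and frequency parameters above interact, and a small sketch may make one plausible reading concrete. It assumes that fragments with more than max_heavy heavy atoms are dropped, and that the minimum frequency is enforced only for fragments with at least minFreqHeavy heavy atoms. Whether that threshold is inclusive is an assumption, and `keep_fragment` is a hypothetical function for illustration, not CHOMP's actual implementation.

```python
def keep_fragment(heavy_atoms: int, source_count: int,
                  max_heavy: int = 15, min_frequency: int = 0,
                  min_freq_heavy: int = 9) -> bool:
    """Decide whether a fragment passes the size and frequency limits.

    heavy_atoms:  number of heavy atoms in the fragment
    source_count: number of source molecules the fragment occurs in
    Defaults mirror the documented parameter defaults; the logic itself
    is an assumed interpretation of the parameter descriptions.
    """
    # Fragments larger than the maximum size are always rejected.
    if heavy_atoms > max_heavy:
        return False
    # Assumed: the frequency requirement applies only to "heavy" fragments.
    if heavy_atoms >= min_freq_heavy and source_count < min_frequency:
        return False
    return True


if __name__ == "__main__":
    # A 10-atom fragment seen in one molecule, with a frequency floor of 2.
    print(keep_fragment(10, 1, min_frequency=2))
    # An 8-atom fragment is below the heavy threshold, so frequency is ignored.
    print(keep_fragment(8, 1, min_frequency=2))
```

With the default minFrequency of 0, the frequency test never rejects a fragment, so only the max_heavy limit has an effect under this reading.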

Control parameters

Chunk Size (chunk_size): The chunk size for splitting records.

Type: integer

Default: 4000

Memory (MiB) (memory_builder): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Type: decimal

Default: 3686.4

Memory (MiB) (memory_generator): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Type: decimal

Default: 14745.6

Memory (MiB) (memory_merger): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Type: decimal

Default: 58982.4

Temporary Disk Space (MiB) (disk_space): The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Type: decimal

Default: 58982.4