Input Files

Typically, a user will find it most convenient to generate the BROOD query using the graphical interface (GUI). Even when BROOD is to be run on a cluster or another machine where the GUI is inconvenient to use, the GUI can still be used to generate the necessary input. BROOD can then be run on the other machine once the input files have been copied to it. BROOD requires one input file, the query file. With a default installation, BROOD can search either one of the default fragment databases or a user-generated database.

Query Files

The entire query is encapsulated in a single query file, which is typically generated in the GUI. The GUI names this file brood_1_query.oeb.gz, where 1 is replaced by the next available integer so that file names remain unique. The query file contains the original molecule, the fragment the user has chosen to replace, and the user-edited description of the shape, chemistry, and electrostatics of the query. In addition, if the query includes protein structures for either bump checking or selectivity checking, these are also encapsulated in the query file.
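If you script around BROOD and want to predict the name the GUI will use for the next query, the following is a minimal sketch of the naming scheme. It assumes that "next available integer" means one more than the highest number already present; the function and regular expression are hypothetical illustrations, not part of BROOD.

```python
import re
from typing import Iterable

# Hypothetical illustration of the brood_<N>_query.oeb.gz naming scheme.
# Assumption: "next" means one more than the highest N already present.
QUERY_NAME = re.compile(r"^brood_(\d+)_query\.oeb\.gz$")


def next_query_filename(existing_names: Iterable[str]) -> str:
    """Return a query file name that does not clash with existing_names."""
    numbers = []
    for name in existing_names:
        match = QUERY_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at 1 when no query files exist yet.
    return f"brood_{max(numbers, default=0) + 1}_query.oeb.gz"
```

For a directory listing, pass something like `(p.name for p in pathlib.Path(".").iterdir())`. Taking plain names keeps the function free of file-system access and easy to test.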

BROOD carries out a 3D search, so it requires a 3D query structure; this is usually included in the query file. The BROOD GUI can take either a 2D or a 3D query file. If the original input is 2D, the GUI generates a 3D structure, which is then used during the search.
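When the input molecule comes from an SDF file, it can be useful to know in advance whether it is 2D or 3D. The sketch below is a rough heuristic, assuming a V2000 molfile record: it reads the dimensional code in columns 21-22 of header line 2 (the `2D` at the end of `-OEChem-01040614232D` in the examples later in this section) and, if that is absent, checks for any non-zero z coordinate. It is not how the BROOD GUI decides.

```python
def molfile_dimension(molblock: str) -> str:
    """Classify a V2000 molfile record as "2D" or "3D" (a rough heuristic)."""
    lines = molblock.splitlines()
    # Header line 2 carries an optional dimensional code in columns 21-22.
    code = lines[1][20:22] if len(lines) > 1 else ""
    if code in ("2D", "3D"):
        return code
    # No code: fall back to the atom block, where z occupies columns 21-30.
    atom_count = int(lines[3][0:3])  # counts line, columns 1-3
    for atom_line in lines[4:4 + atom_count]:
        if abs(float(atom_line[20:30])) > 1e-4:
            return "3D"
    # All z values are zero; a genuinely planar 3D structure also lands here.
    return "2D"
```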

Fragment Database

As mentioned in the theory section, BROOD comes with a pregenerated, multiconformer fragment database. This database is made from fragments of known molecules; each fragment contains 1-15 heavy atoms and 1-3 attachment sites.
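The size limits above can be expressed as a simple filter. This is a minimal sketch, assuming a fragment is given as a list of atom symbols, that hydrogens are not counted as heavy atoms, and that attachment points are written as `*` or `R#` (as in the SDF examples later in this section). It only restates the quoted ranges; it does not reproduce how the default database was actually built.

```python
from typing import Iterable

# Dummy-atom symbols used for attachment points in this section's SDF examples.
ATTACHMENT_SYMBOLS = {"*", "R#"}


def meets_default_db_limits(atom_symbols: Iterable[str]) -> bool:
    """Check the 1-15 heavy atom / 1-3 attachment site ranges."""
    heavy = attachments = 0
    for symbol in atom_symbols:
        if symbol in ATTACHMENT_SYMBOLS:
            attachments += 1
        elif symbol != "H":
            heavy += 1  # every non-hydrogen, non-dummy atom counts as heavy
    return 1 <= heavy <= 15 and 1 <= attachments <= 3
```

For the amide fragment (`*`, `C`, `O`, `N`, `*`) this counts 3 heavy atoms and 2 attachment points, so it falls within both ranges.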

BROOD can also search user-generated fragment files once they have been converted into a custom database with CHOMP.

Example Fragment Files

The database files used for BROOD searches are in a highly optimized format, with many precalculated properties and a complex structure intended to minimize disk access during the time-critical search. These database files can be generated by the CHOMP program included with your BROOD distribution. By default, CHOMP generates fragments from complete molecules. Alternatively, users who have their own fragment files to search can load them directly into CHOMP to generate a searchable database.

In this case, each input fragment for bioisostere searching must be a molecular fragment with one or more “attachment points”. An attachment point represents a bond from the fragment to the rest of the molecule. In OEChem TK, this is represented as a bond to an atom with atomic number 0 (termed a “dummy atom”). If these dummy atoms need to be distinguished from one another, map indices can be set and read with the OEAtomBase::SetMapIdx() and OEAtomBase::GetMapIdx() API.

The input fragments do not need 3D coordinates (these can optionally be generated on the fly), which makes SMILES and 2D SDF the most convenient formats for query fragment input. The fragment database, on the other hand, should contain 3D coordinates with multiple conformers per fragment. The examples below show a simple amide fragment with generic dummy atoms (amide) and with uniquely labeled dummy atoms (amideR).

SMILES format:

*C(=O)N* amide
[*:1]C(=O)N[*:2] amideR
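To see how the two SMILES lines above differ, the following sketch lists the attachment points in a SMILES line, returning the map index for labeled dummy atoms such as `[*:1]` and `None` for a bare `*`. It is a text-level approximation that assumes a SMILES string optionally followed by a title; it does not handle every bracket-atom form, and a cheminformatics toolkit parser (such as OEChem TK) is the reliable route.

```python
import re
from typing import List, Optional

# A bracketed dummy atom with an optional map index ([*:1] or [*]),
# or a bare * outside brackets.
DUMMY = re.compile(r"\[\*(?::(\d+))?\]|\*")


def attachment_points(smiles_line: str) -> List[Optional[int]]:
    """Return one entry per dummy atom: its map index, or None if unlabeled."""
    smiles = smiles_line.split()[0]  # drop the title, e.g. "amideR"
    return [
        int(m.group(1)) if m.group(1) else None
        for m in DUMMY.finditer(smiles)
    ]
```

Applied to the examples, `*C(=O)N* amide` gives two unlabeled attachment points, while `[*:1]C(=O)N[*:2] amideR` gives map indices 1 and 2.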

SDF format:

amide
  -OEChem-01040614232D

  5  4  0     0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 *   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  4  5  1  0  0  0  0
M  END
$$$$
amideR
  -OEChem-01040614232D

  5  4  0     0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
  4  5  1  0  0  0  0
M  RGP  2   1   1   5   2
M  END
$$$$
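In the amideR record, the attachment atoms are written as `R#`, and the `M  RGP` property line assigns each one an R-group number: a count (2) followed by (atom index, R-group) pairs, so atom 1 is R1 and atom 5 is R2. In these examples, those numbers match the map indices in the amideR SMILES. The sketch below reads that line; it is a hypothetical helper written against the V2000 layout shown above, not a BROOD or CHOMP function.

```python
from typing import Dict


def rgroup_labels(molblock: str) -> Dict[int, int]:
    """Map 1-based atom index -> R-group number from 'M  RGP' lines."""
    labels: Dict[int, int] = {}
    for line in molblock.splitlines():
        if not line.startswith("M  RGP"):
            continue
        fields = line.split()[2:]   # drop "M" and "RGP"
        count = int(fields[0])      # number of (atom, R-group) pairs
        values = [int(v) for v in fields[1:1 + 2 * count]]
        for atom, rgroup in zip(values[0::2], values[1::2]):
            labels[atom] = rgroup
    return labels
```

For the amide record, which has no `M  RGP` line, the result is an empty mapping.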

Molecular File Formats

BROOD can read and write a variety of molecular file formats. The file format is determined automatically from the filename suffix.

File Type      Extensions
SMILES         .smi .ism .can .smi.gz .ism.gz .can.gz
SDF            .sdf .mol .sdf.gz .mol.gz
SKC            .skc .skc.gz
CDK            .cdk .cdk.gz
MOL2           .mol2 .mol2.gz
PDB            .pdb .ent .pdb.gz .ent.gz
MacroModel     .mmod .mmod.gz
OEBinary v2    .oeb .oeb.gz

Gzipped OEBinary version 2 (.oeb.gz) is the recommended output format.
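The suffix rules in the table can be summarized in a few lines. This sketch transcribes the table into a lookup and treats a trailing `.gz` as a separate compression flag. It illustrates the mapping only; the function name and the choice to raise `ValueError` for unknown suffixes are assumptions, not BROOD behavior.

```python
from typing import Tuple

# Suffix -> file type, transcribed from the table above.
FORMATS = {
    ".smi": "SMILES", ".ism": "SMILES", ".can": "SMILES",
    ".sdf": "SDF", ".mol": "SDF",
    ".skc": "SKC",
    ".cdk": "CDK",
    ".mol2": "MOL2",
    ".pdb": "PDB", ".ent": "PDB",
    ".mmod": "MacroModel",
    ".oeb": "OEBinary v2",
}


def detect_format(filename: str) -> Tuple[str, bool]:
    """Return (file type, gzipped?) for a filename or a bare suffix."""
    gzipped = filename.endswith(".gz")
    stem = filename[:-3] if gzipped else filename
    dot = stem.rfind(".")
    suffix = stem[dot:] if dot != -1 else ""
    if suffix not in FORMATS:
        raise ValueError(f"unrecognized molecular file suffix: {filename!r}")
    return FORMATS[suffix], gzipped
```

Using `rfind` rather than `os.path.splitext` lets a bare suffix such as `.oeb.gz` resolve too, since `splitext` treats a leading dot as part of the name.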

BROOD can pipe formatted input and output. A bare “-” in place of a filename indicates std::cin or std::cout in the default SMILES format. To pipe another format, give just that format's suffix (for example, .oeb.gz) in place of the filename.

prompt> brood -in .oeb.gz -db myDB < brood_1_query.oeb.gz

This command runs BROOD with std::cin as the input, read in .oeb.gz format. The format is controlled by the suffix given to -in.
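The naming convention for piped input and output can be sketched as a small resolver: `-` selects a standard stream in the default SMILES format, a bare suffix selects a standard stream in that format, and anything else is an ordinary file. The function name and the use of `.smi` to stand for "default SMILES" are assumptions for illustration; this is not BROOD's argument parser.

```python
from typing import Tuple

DEFAULT_STREAM_SUFFIX = ".smi"  # assumption: stands for the default SMILES format


def resolve_io_name(name: str) -> Tuple[str, str]:
    """Return ("stream" or "file", suffix that selects the format)."""
    if name == "-":
        return "stream", DEFAULT_STREAM_SUFFIX
    # A bare suffix such as ".oeb.gz" means a standard stream in that format.
    # Paths like "./query.oeb" contain a separator, so they stay files.
    # (Simplification: a hidden file such as ".query.oeb" would be misread.)
    if name.startswith(".") and "/" not in name and "\\" not in name:
        return "stream", name
    base = name[:-3] if name.endswith(".gz") else name
    dot = base.rfind(".")
    suffix = name[dot:] if dot != -1 else ""  # keeps ".gz" when present
    return "file", suffix
```

For the command above, `-in .oeb.gz` resolves to a standard stream read as gzipped OEBinary, which is why the query file can be supplied with the shell's `<` redirection.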