Output Files

BROOD generates seven output files in addition to the hitlist. All of these files begin with the prefix specified by the -prefix flag, which defaults to “brood”. Each file name also includes an integer that keeps file names unique within the working directory (the examples here use the integer “1”). These files include:

  1. brood_1_info.txt, the info file (This file is only created if the run is initiated from the GUI.)

  2. brood_1_log.txt, the log file

  3. brood_1_parm.txt, the param file

  4. brood_1_hitlist_rpt.csv, the hitlist report file

  5. brood_1_removed_rpt.csv, the removed hits report file

By default, there is one hitlist, and its contents are determined by the command-line flags.

  • brood_1_hitlist.oeb.gz

During postprocessing of the search, some fragments are removed from the hitlist. These are preserved in the “removed” file.

  • brood_1_removed.oeb.gz

If the -scoreType et flag is used, the hitlist will include molecules electrostatically similar to the query.

Before executing the search, the GUI also writes the encapsulated query into the query file.

  • brood_1_query.oeb.gz

Info File

By default, an information file titled brood_1_info.txt is written or updated after every 500 fragments processed. Reading this info file lets a user monitor the progress of the database search while it is running, and the file serves as a record of the search's performance after execution is complete. During execution, the GUI uses this file, as well as direct communication with the application, to keep the user apprised of all information available in the info file.

An example info file is shown below.

*********** Progress Update ***********
Percent of Database Processed    = 100%
Number of Packets Processed      = 578
Number of Fragments Processed    = 1642376
Number of Fragments Overlayed    = 1094381
Number of Hits                   = 909
Number of Warnings               = 0
Number of Errors                 = 0
Processed fragments/sec          = 896
Elapsed Time (sec)               = 1832
***************************************

If any errors are noted in the info file, it is strongly suggested that the user check the log file to determine the nature of the problem.
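As a rough illustration of monitoring a run from a script, the sketch below parses the "name = value" lines of the example info file shown above into a dictionary and flags whether the log file deserves a look. The function name parse_info and the variable needs_log_check are hypothetical, and the parser assumes that every info file follows the layout of the example.

```python
def parse_info(text):
    """Parse 'name = value' lines of an info file into a dict (banner lines are skipped)."""
    stats = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # banner lines such as '*** Progress Update ***'
        key, _, value = line.partition("=")
        value = value.strip().rstrip("%")
        try:
            stats[key.strip()] = int(value)
        except ValueError:
            stats[key.strip()] = value  # keep non-integer values as text
    return stats


# Contents copied from the example info file above.
example = """\
*********** Progress Update ***********
Percent of Database Processed    = 100%
Number of Packets Processed      = 578
Number of Fragments Processed    = 1642376
Number of Fragments Overlayed    = 1094381
Number of Hits                   = 909
Number of Warnings               = 0
Number of Errors                 = 0
Processed fragments/sec          = 896
Elapsed Time (sec)               = 1832
***************************************
"""

stats = parse_info(example)
# Hypothetical policy: inspect the log whenever any error or warning was counted.
needs_log_check = stats["Number of Errors"] > 0 or stats["Number of Warnings"] > 0
print(stats["Number of Hits"], needs_log_check)
```

In practice the text would be read from brood_1_info.txt (or whatever name the -prefix flag and run integer produce) rather than from an embedded string.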

Param File

BROOD’s command line interface can be run efficiently using the -param command line parameter followed by the name of a parameter file. A param file contains one command line parameter per line. Every execution of BROOD, including executions started within the GUI, generates a param file named brood_1_parm.txt. This file contains all of the parameters used by BROOD, and it can be passed to subsequent runs with the -param flag, either as-is or with user modifications.

The _parm.txt file is particularly useful if you want to use the graphical interface to set up a job, but execute the job on another machine (such as a cluster). The _parm.txt file can be moved to a different machine along with the -in -queryMol file it specifies, and it can be used to execute exactly the same run as would have been executed from within the graphical interface.

The following is an example of a _parm.txt file generated by the GUI (brood_1_parm.txt). It can be used to regenerate the run using the command line below.

This file could be edited or used as-is with subsequent runs. In addition, any explicit command line parameter takes precedence over parameters in the .param file, so this mechanism can be used to execute a series of similar jobs.

prompt> brood -param brood_1_parm.txt

prompt> brood -mpi_np 16 -param brood_1_parm.txt -prefix brood_mpi1
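Because explicit command line parameters take precedence over the param file, a series of similar jobs could be generated by a small script. The sketch below only builds the command strings and does not launch BROOD. The helper name override_commands, the prefix stem brood_shape, and the particular -shapeCutoff values are illustrative assumptions; the flags themselves appear in the param file listing in this section.

```python
import shlex


def override_commands(param_file, flag, values, prefix_stem):
    """Build one brood command per value, overriding a single flag from the param file."""
    commands = []
    for i, value in enumerate(values, start=1):
        argv = [
            "brood",
            "-param", param_file,
            flag, str(value),                 # explicit flag overrides the param file
            "-prefix", f"{prefix_stem}{i}",   # distinct prefix per job
        ]
        commands.append(shlex.join(argv))
    return commands


# Illustrative values only; choose cutoffs appropriate to your own search.
cmds = override_commands("brood_1_parm.txt", "-shapeCutoff", [0.5, 0.6, 0.7], "brood_shape")
for cmd in cmds:
    print(cmd)
```

Each printed line can then be submitted by whatever job mechanism is in use on the target machine.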

Listing of brood_parm.txt file contents:

#Interface settings

#Mode brood
#-failed (Not set, no default)
-in  /Users/Chemist/Desktop/brood.query.oeb.gz
#-log
#-molNames
-out  brood_hitlist.oeb
-prefix  brood
#-report
#-verbose  false

#Execute Options :
    -param  /Users/Chemist/Desktop/brood_parm.txt
    #-mpi_np (Not set, no default)
    #-mpi_hostfile (Not set, no default)

#Brood App Options :
    #-cpddb
    -db  /Users/Chemist/WorkingDirectory/chembl31
    #-quickLook  0
    -status  true

    #BROOD Hitlist Builder Options :
        -idea  true
        -maxHits  1000

        #Molecule Builder options :
            #-buildType  0
            -checkBond  true
            -deltaLocalStrain  false
            -maxLocalStrain  6.5
            -neutralPH  true
            -tautomers  false

    #BROOD Match Options :
        -attachCutoff  0.78
        -bondOrder  true
        -chargeType  5
        -checkGeometry  true
        -property  false
        -ringOnly  -2
        -shapeCutoff  0.6

        #Molecule Property Options :
            #-eganEgg  true
            #-veber  false

            #Maximum allowed extrinsic property values :
                #-maxAcceptor  11
                #-maxComplexity  1.25
                #-maxDonor  8
                #-maxFreq  100
                #-maxLipinski  1
                #-maxLogp  5.0
                #-maxMartin  1.0

            #Maximum allowed intrinsic property values :
                #-maxAromFJCt  5
                #-maxFsp3C  1.0
                #-maxHvyAtom  35
                #-maxMolWt  500.0
                #-maxRotBond  13
                #-maxTpsa  150

            #Minimum required extrinsic property values :
                #-minAcceptor  2
                #-minComplexity  0.0
                #-minDonor  1
                #-minFreq  1
                #-minLipinski  0
                #-minLogp  -1.0
                #-minMartin  0.2

            #Minimum required intrinsic property values :
                #-minAromFJCt  0
                #-minFsp3C  0.3
                #-minHvyAtom  7
                #-minMolWt  100.0
                #-minRotBond  0
                #-minTpsa  60

    #BROOD Score Options :
        -attachScale  1.5
        -bumpRadius  2.25
        #-ignoreProtein  false
        #-ignoreProteinSelect  false
        -rangeOffset  0
        -rangeSize  6
        -scoreType  0

prompt> brood -param brood_parm.txt

This would result in exactly the same execution as the one which generated the file above.
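To check which parameters a param file actually sets, it can help to separate active lines from commented ones. The following is a minimal sketch, assuming the layout of the listing above: one parameter per line, with "#" marking comments and parameters that are not set. The function name read_active_params is hypothetical, and the embedded text is an excerpt of that listing.

```python
def read_active_params(text):
    """Return {flag: value} for lines that are neither blank nor commented out with '#'."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # flag, then the rest of the line as its value
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


# Excerpt of the param file listing above.
excerpt = """\
#Brood App Options :
    #-cpddb
    -db  /Users/Chemist/WorkingDirectory/chembl31
    #-quickLook  0
    -status  true

    #BROOD Hitlist Builder Options :
        -idea  true
        -maxHits  1000
"""

params = read_active_params(excerpt)
for flag, value in params.items():
    print(flag, value)
```

Values are kept as strings here, since the param file does not declare types.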

Log File

The log file contains all of the critical information about the execution in one place. It begins with a copy of the param file and finishes with the final info file output. In between, it contains all the warnings and errors that might have occurred during execution. The log file gives a user a single place to check what job was run, whether it executed properly, and how long it took.

Report Files

Report files contain a detailed listing of the similarity scores for every molecule in the database that was sufficiently similar to warrant being in the hitlist. By default, the file titled brood_1_hitlist_rpt.csv contains information about every molecule that remained in the hitlist after postprocessing, and brood_1_removed_rpt.csv contains information about every molecule that was removed from the hitlist during the postprocessing.

Report files contain columns for

  • database fragment SMILES

  • database fragment title

  • query SMILES

  • number of attachment bonds

  • structural rms value

  • attachment score

  • shape Tanimoto

  • color Tanimoto

  • combo score (shape + color)

  • et attachment score

  • et shape Tanimoto

  • electrostatic Tanimoto

  • et combo score (shape + et)

  • a comment regarding the disposition of the fragment (if and why it failed to be built)

If a particular score was not calculated for any given fragment, “-” will be found in the report file under the corresponding column. Files are in comma-separated value format and can easily be imported into a spreadsheet program for further analysis of the results.
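Outside a spreadsheet, a report file can also be read with a few lines of Python. The sketch below is written under assumptions: the header names in the sample (fragment_smiles, shape, color, et) are hypothetical stand-ins, the sample row is made-up illustrative data, and the actual column headers should be taken from a real report file. The only behavior drawn from the description above is that "-" marks a score that was not calculated.

```python
import csv
import io


def read_report(handle):
    """Read a report CSV into a list of dicts, mapping '-' (score not calculated) to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({name: (None if value == "-" else value) for name, value in row.items()})
    return rows


# Hypothetical header names and made-up values, for illustration only.
sample = io.StringIO(
    "fragment_smiles,shape,color,et\n"
    "c1ccccc1,0.81,0.42,-\n"
)

rows = read_report(sample)
print(rows[0])
```

For a real run, the handle would come from something like open("brood_1_hitlist_rpt.csv", newline="").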

Hitlist File

As discussed above, a hitlist is generated for each execution of BROOD. The hitlist is written in the file format implied by the output file name given with -out; if -out is not specified, the format defaults to gzipped OEB (.oeb.gz). For instance, if -out Output.oeb is specified, the hitlist will be written in uncompressed OEB format instead. The first entry in the hitlist is the query molecule. Each subsequent analog molecule in the hitlist is oriented in the optimum overlay on the query molecule. For the -scoreType linkOnly hitlist, this overlay is the optimum overlay of the attachment point atoms. In addition, by default, the similarity scores and physical property data for each molecule are attached as SD tags (OEB and SD format only).

By default, the hitlist files are written periodically while the search is being carried out. This allows a user to examine results at intermediate stages without waiting for the entire search to complete.

If the -in flag specified a 2D input molecule, then all of the analog molecules in the hitlist will likewise be in 2D format, though the fragment similarity search will have been carried out in 3D. When using the graphical interface with a 2D input molecule, a 3D molecule is generated prior to processing and the output format will always be 3D.

Below is a description of some of the SD tags attached to the hitlist molecules:

AroRingCt: Number of aromatic rings in the molecule

ClusterID/IdeaGroup: ClusterID of the molecule

color: Color Tanimoto score of the replacement fragment against the query fragment

combo: Shape + color Tanimoto combo score of the replacement fragment against the query fragment

Egan: Boolean specifying whether the molecule passes the Egan bioavailability model

Fragment: SMILES string of the replacement fragment

freq: Frequency of the replacement fragment

fsp3C: Fraction of sp3 hybridized carbon atoms in the molecule

HvyAtoms: Number of heavy atoms in the molecule

LipinskiDon: Number of Lipinski donors in the molecule

LipinskiAcc: Number of Lipinski acceptors in the molecule

LipinskiFail: Boolean specifying whether the molecule fails Lipinski’s Rule of Five

Local strain: Calculated local strain of the molecule

Delta Local strain: Calculated delta local strain of the hit with respect to the query molecule

Molecular TanimotoCombo: Shape + color Tanimoto combo score of the molecule against the query molecule

MolWt: Molecular weight of the molecule

p(active): Belief score of the molecule [Muchmore-2008]

RingCt: Number of ring atoms

RingRatio: Ratio of the number of ring atoms to the total number of heavy atoms

Rotors: Number of rotatable bonds in the molecule

shape: Shape Tanimoto score of the replacement fragment against the query fragment

Source Mols: SMILES strings of the molecules the replacement fragment is part of

Source Mol Labels: Labels of the molecules the replacement fragment is part of

tPSA: Calculated topological polar surface area of the molecule

Veber: Boolean specifying whether the molecule passes the Veber bioavailability model

XlogP: Calculated LogP of the molecule
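If the hitlist is written in SD format, the tags above can be pulled out with plain text processing. The sketch below is a minimal reader written under assumptions: it relies on the usual SD layout (a "> <tag>" line, the value, a blank line, and "$$$$" closing each record), the sample text is an abbreviated stand-in rather than a valid hitlist, and its values are made up. The function name read_sd_tags and the combo threshold of 1.0 are hypothetical.

```python
import re

# Matches SD data header lines such as '> <combo>' and captures the tag name.
TAG_LINE = re.compile(r"^>.*?<([^>]+)>")


def read_sd_tags(text):
    """Return one {tag: value} dict per SD record; records are closed by '$$$$'."""
    records, tags, current = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(tags)
            tags, current = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            tags[current] = ""
        elif current is not None:
            if not line.strip():
                current = None  # a blank line ends the value
            else:
                tags[current] = f"{tags[current]}\n{line}" if tags[current] else line
    return records


# Abbreviated, made-up stand-in: the first record plays the role of the query.
sd_text = """\
query
M  END
$$$$
analog_1
M  END
> <shape>
0.81

> <combo>
1.23

$$$$
"""

records = read_sd_tags(sd_text)
# Keep records that carry a combo tag at or above a hypothetical threshold.
good = [r for r in records if "combo" in r and float(r["combo"]) >= 1.0]
print(len(records), len(good))
```

Because the first hitlist entry is the query molecule, the filter tolerates records that lack a given tag instead of assuming every record carries it.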

Removed File

At the end of the primary fragment search, a complete hitlist is generated. In postprocessing, the fragments are built into whole molecules and tested for strain, the newly formed bonds are examined for chemical stability, potential duplicates in the hitlist are removed, and the new molecules are checked for bumps with the protein environment and against the optional selectivity protein. At any of these stages, fragments that are similar according to the primary search criteria may be removed from the hitlist on the basis of these secondary criteria. In rare cases, few fragments may remain in the final hitlist, and the user may want to examine the removed fragments without rerunning the entire search. For this reason, any fragment removed at the postprocessing stage is written to the removed hitlist brood_1_removed.oeb.gz.