Output Files¶
BROOD generates seven output files in addition to the hitlist. All of these files begin with the prefix specified by the -prefix flag, which defaults to “brood”. They also include an integer that keeps file names unique in the working directory (in this example, we will use the integer “1”). These files include:
brood_1.info, the info file
brood_1.log, the log file
brood_1.param, the param file
brood_1.rpt, the report file
brood_1.csv, the spreadsheet file
By default, there is one hitlist and the contents of the hitlist will be determined by the command flags.
brood_1.hitlist.oeb.gz
In post-processing of the search, some fragments are removed from the hitlist. These are preserved in the “removed” file.
brood_1.removed.oeb.gz
If the -ET flag is used, the hitlist will include molecules electrostatically similar to the query.
Before executing the search, the GUI also writes the encapsulated query into the query file.
brood_1_query.oeb.gz
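The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the file names listed in this section: the function name brood_output_files and its arguments are hypothetical, and the sketch assumes the default .oeb.gz extension for the molecule files.

```python
def brood_output_files(prefix="brood", run_id=1):
    """Return the output file names described in this section for one run.

    Hypothetical helper: it only mirrors the names shown in the text and
    assumes the default .oeb.gz extension for the molecule files.
    """
    stem = f"{prefix}_{run_id}"
    return {
        "info": f"{stem}.info",
        "log": f"{stem}.log",
        "param": f"{stem}.param",
        "report": f"{stem}.rpt",
        "spreadsheet": f"{stem}.csv",
        "hitlist": f"{stem}.hitlist.oeb.gz",
        "removed": f"{stem}.removed.oeb.gz",
        "query": f"{stem}_query.oeb.gz",
    }


if __name__ == "__main__":
    for kind, name in brood_output_files().items():
        print(f"{kind:12s} {name}")
```

Such a helper can be handy in a script that collects the results of several runs from one working directory.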
Info File¶
By default, an information file titled brood_1.info is written or updated after every 500 fragments that are processed. Reading this info file allows a user to monitor the progress of the database search while it is running, and the file serves as a record of the performance of that search after execution is completed. During execution, the GUI uses this file, as well as direct communication with the application, to keep the user apprised of all information available in the info file.
An example info file is shown below.
*********** Progress Update ***********
Percent of Database Processed = 100%
Total packets read = 329
Packets suitable for Processing = 99
Number of Fragments Overlaid = 92134
Number of Fragments Eliminated = 74560
Number of Fragments Processed = 166694
Number in color hitlist = 300
Remove for quick search = 291
Remove protein clashes = 0
Remove select protein = 0
Remove duplicates = 0
Remove unstable bonds = 0
Remove strained molecules = 0
Number in final hitlist = 9
Number of Warnings = 2
Number of Errors = 0
Processed fragments/sec = 1742
Elapsed Time (sec) = 95
***************************************
If any warnings or errors are noted in the info file, it is strongly suggested that the user check the log file to determine the nature of the problem.
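For scripted monitoring, the info file can be read with a few lines of Python. The sketch below assumes the file keeps the “Name = value” layout of the example shown in this section; the helper names parse_info and needs_log_check are hypothetical and are not part of BROOD.

```python
SAMPLE_INFO = """\
*********** Progress Update ***********
Percent of Database Processed = 100%
Number in final hitlist = 9
Number of Warnings = 2
Number of Errors = 0
Elapsed Time (sec) = 95
***************************************
"""


def parse_info(text):
    """Turn 'Name = value' lines into a dict; banner lines are skipped."""
    stats = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # banner lines of asterisks carry no data
        name, _, value = line.partition("=")
        value = value.strip().rstrip("%")
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            stats[name.strip()] = value  # keep non-integer values as text
    return stats


def needs_log_check(stats):
    """True when the info file reports at least one warning or error."""
    warnings = stats.get("Number of Warnings", 0)
    errors = stats.get("Number of Errors", 0)
    return (warnings, errors) != (0, 0)


if __name__ == "__main__":
    stats = parse_info(SAMPLE_INFO)
    if needs_log_check(stats):
        print("Warnings or errors reported; check the log file.")
```

To use it on a real run, the text would come from reading the info file (for example brood_1.info) instead of the embedded sample.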
Param File¶
BROOD’s command-line interface can be run efficiently using the -param command-line parameter followed by the name of a parameter file. Param files contain one command-line parameter on each line. Every execution of BROOD, including executions started within the GUI, generates a .param file called brood.param. This file contains all of the parameters used by BROOD. Further, this file can be used in subsequent runs with the -param flag, either with or without user modifications.
The .param file is particularly useful if you want to use the graphical interface to set up a job but execute the job on another machine (such as a cluster). The .param file can be moved to a different machine along with the -queryMolecule file it specifies, and it can be used to execute exactly the same run as would have been executed from within the graphical interface.
The following is an example of a .param file generated by the GUI (brood_1.param). It can be used to regenerate the run using the command lines below.
This file can be edited or used “as-is” in subsequent runs. In addition, any explicit command-line parameter takes precedence over the parameters in the .param file, so this mechanism can be used to execute a series of similar jobs.
prompt> brood -param brood_1.param
prompt> brood -mpi_np 16 -param brood_1.param -prefix brood_mpi1
Listing of brood.param file contents:
#Interface settings
#Execute Options :
-param /Users/chemist/data/brood_49.param
-input_chunksize 10
-success_chunksize 10
-failure_chunksize 10
#Brood :
#Input :
#-queryFrag (Not set, no default)
-queryMol /Users/chemist/data/brood_49_query.oeb.gz
-db /Users/chemist/data/db/chembl17
#-prot (Not set, no default)
#-select (Not set, no default)
#-noqueryprot false
#-cpddb none
#-param (Not set, no default)
#Output :
-prefix brood_49
-dots true
#-log (Not set, no default)
#-info (Not set, no default)
#-report (Not set, no default)
-format oeb.gz
-csv true
-idea true
-neutralpH true
-tautomer true
-hitlistProt true
#Control parameters :
-quickLook true
-ringOnly true
-ET false
-linkOnly false
-sdTag verbose
-checkBond true
#-maxHit 1000
#-title (Not set, no default)
-attachColor false
#-attachFrag true
#Advanced parameters :
#-bondOrder true
-attachmentCutoff 0.78
-shapeCutoff 0.6
-attachmentScale 1.5
-checkGeometry true
-fromCT false
-fileChrg false
-interval 5000
-hitinterval 1000
#-maxFrag 0
-rangeSize 6
-rangeOffset 0
-bumpRadius 2.25
#Property Selection :
-property false
#-maxMolWt 500.0
#-minMolWt 100.0
#-maxlogp 5.0
#-minlogp -1.0
#-maxpsa 150
#-minpsa 60
#-maxRotBond 13
#-minRotBond 0
#-maxHvyAtom 35
#-minHvyAtom 7
#-maxLipinskiDon 8
#-minLipinskiDon 1
#-maxLipinskiAcc 11
#-minLipinskiAcc 2
#Synthetic Properties :
#-maxComplexity 1.0
#-minComplexity 0.0
#-maxFreq 100
#-minFreq 1
#Derived Property Selection :
#-maxLipinski 1
#-minLipinski 0
#-maxMartin 1.0
#-minMartin 0.2
#-eganEgg true
#-veber false
#-maxFsp3C 1.0
#-minFsp3C 0.3
#-maxAromFJCt 5
#-minAromFJCt 0
prompt> brood -param brood.param
This would result in exactly the same execution as the one which generated the file above.
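The param file layout (one parameter per line, with “#” lines for comments and unset options) is simple enough to inspect from a script. The Python sketch below is a model of that layout and of the rule that explicit command-line parameters take precedence; it is not BROOD’s own parser, and the names read_param_text and apply_overrides are hypothetical.

```python
SAMPLE_PARAM = """\
#Input :
#-queryFrag (Not set, no default)
-queryMol /Users/chemist/data/brood_49_query.oeb.gz
#Output :
-prefix brood_49
-ET false
#-maxHit 1000
"""


def read_param_text(text):
    """Collect '-flag value' lines into a dict, ignoring '#' lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and unset parameters
        flag, _, value = line.partition(" ")
        params[flag] = value.strip()
    return params


def apply_overrides(params, overrides):
    """Model the precedence rule: explicit flags win over the param file."""
    merged = dict(params)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    params = read_param_text(SAMPLE_PARAM)
    # Corresponds to: brood -param brood_1.param -prefix brood_mpi1
    effective = apply_overrides(params, {"-prefix": "brood_mpi1"})
    for flag, value in effective.items():
        print(flag, value)
```

A listing like this makes it easy to check which settings a series of similar jobs actually share and which were overridden on the command line.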
Log File¶
The log file contains all of the critical information about the execution in one place. It begins with a copy of the param file and finishes with the final info file output. In between, it contains all the warnings and errors that occurred during execution. The log file gives a user a single place to check what job was run, whether it executed properly, and how long it took.
Report File¶
The report file contains a detailed listing of the similarity scores for every molecule in the database that was sufficiently similar to warrant a 3D overlay. By default, the file is titled brood_1.rpt and contains one line for the column titles and one line for each molecule in the database (tab-separated columns). The report file contains columns for:
database fragment SMILES
database fragment title
query SMILES
number of attachment bonds
structural rms value
attachment score
shape Tanimoto
color Tanimoto
combo score (shape + color)
et attachment score
et shape Tanimoto
electrostatic Tanimoto
et combo score (shape + et)
a comment regarding the disposition of the fragment (if and why it failed to be scored)
If a particular score was not calculated for a given fragment, “-” will be found in the report file under the corresponding column. Since all the data for a fragment is contained in a single line, there is only one report file regardless of the number or type of hitlists generated. This file is in tab-separated format and can easily be imported into a spreadsheet program for further analysis of the results.
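Besides a spreadsheet program, the report can be read with Python’s standard csv module. The sketch below takes the column titles from the first line of the file and maps the “-” placeholder to None. The column titles and values in the embedded sample are invented for illustration and are not the titles BROOD writes; read_report is a hypothetical helper name.

```python
import csv
import io

# Hypothetical excerpt: real column titles come from the report's first line.
SAMPLE_REPORT = "\n".join([
    "\t".join(["fragment_smiles", "title", "shape", "color", "comment"]),
    "\t".join(["c1ccccc1", "frag_1", "0.82", "0.41", ""]),
    "\t".join(["C1CCCCC1", "frag_2", "0.55", "-", "not scored"]),
]) + "\n"


def read_report(handle):
    """Read a tab-separated report; '-' (score not calculated) becomes None."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        rows.append({key: (None if value == "-" else value)
                     for key, value in row.items()})
    return rows


if __name__ == "__main__":
    for row in read_report(io.StringIO(SAMPLE_REPORT)):
        print(row)
```

For a real run, the handle would be an open file such as open("brood_1.rpt", newline="") rather than the in-memory sample.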
Hitlist File¶
As discussed above, a hitlist is generated for each execution of BROOD. All of the hitlists are written in the file format specified by -format, which defaults to gzipped OEB format (.oeb.gz). If the -queryMolecule flag is specified, the hitlist will contain complete molecules; otherwise it will contain fragments. The first entry in the hitlist is the query molecule or query fragment. Each subsequent analog molecule (fragment) in the hitlist is oriented in the optimum overlay on the query molecule (fragment). For the “struc” hitlist, this overlay is the optimum overlay of the attachment point atoms. In addition, by default, the similarity scores and physical property data for each molecule (fragment) are attached as SDTags (OEB and SD format only). When -queryMolecule is not specified but the -ET flag is set, the attachment vectors of the fragments in the hitlist are replaced by methyl groups. This facilitates easy calculation of electrostatic potentials in data visualization programs such as VIDA.
By default, the hitlist files are written periodically while the search is being carried out (see -hitinterval). This allows a user to examine results at intermediate stages without waiting for the entire search to complete.
If the -queryMolecule flag specifies a 2D input molecule, then all of the analog molecules in the hitlist will likewise be in 2D format, though the fragment similarity search will have been carried out in 3D. When using the graphical interface with a 2D input molecule, a 3D molecule is generated prior to processing and the output format will always be 3D.
Below we provide a description of some of the SDTags attached to the hitlist molecules:
AroRingCt: Number of aromatic rings in the molecule
ClusterID/IdeaGroup: ClusterID of the molecule
color: Color Tanimoto score of the replacement fragment against the query fragment
combo: Shape + color Tanimoto combo score of the replacement fragment against the query fragment
Egan: Boolean specifying whether the molecule passes the Egan bioavailability model
Fragment: SMILES string of the replacement fragment
freq: Frequency of the replacement fragment
fsp3C: Fraction of sp3 hybridized carbon atoms in the molecule
HvyAtoms: Number of heavy atoms in the molecule
LipinskiDon: Number of Lipinski donors in the molecule
LipinskiAcc: Number of Lipinski acceptors in the molecule
LipinskiFail: Boolean specifying whether the molecule fails Lipinski’s rule of five
Local strain: Calculated local strain of the molecule
Molecular TanimotoCombo: Shape + color Tanimoto combo score of the molecule against the query molecule
MolWt: Molecular weight of the molecule
p(active): Belief score of the molecule [Muchmore-2008]
RingCt: Number of ring atoms
RingRatio: Ratio of the number of ring atoms to the total number of heavy atoms
Rotors: Number of rotatable bonds in the molecule
shape: Shape Tanimoto score of the replacement fragment against the query fragment
Source Mols: SMILES strings of the molecules the replacement fragment is part of
Source Mol Labels: Labels of the molecules the replacement fragment is part of
tPSA: Calculated topological polar surface area of the molecule
Veber: Boolean specifying whether the molecule passes the Veber bioavailability model
XlogP: Calculated LogP of the molecule
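When the hitlist is written in SD format rather than the default .oeb.gz, the tags listed above appear as standard SD data items (“> <name>” followed by the value and a blank line), so they can be pulled out with plain Python. The sketch below makes that assumption; the record it parses is a hand-written stand-in with invented values, and sd_tags is a hypothetical helper. For OEB files, a toolkit that reads that format would be needed instead.

```python
# Hand-written stand-in for the data section of one SD record (values invented).
SAMPLE_RECORD = "\n".join([
    "M  END",
    "> <shape>",
    "0.71",
    "",
    "> <color>",
    "0.33",
    "",
    "> <Fragment>",
    "c1ccncc1",
    "",
    "$$$$",
])


def sd_tags(record):
    """Return the SD data items of one record as a {name: value} dict."""
    lines = record.splitlines()
    # Data items follow the connection table, which ends with 'M  END'.
    start = lines.index("M  END") + 1 if "M  END" in lines else 0
    tags = {}
    i = start
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line:
            name = line[line.index("<") + 1:line.rindex(">")]
            i += 1
            values = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            tags[name] = "\n".join(values)
        else:
            i += 1
    return tags


if __name__ == "__main__":
    tags = sd_tags(SAMPLE_RECORD)
    print(float(tags["shape"]) + float(tags["color"]))
```

Tag values are returned as text; numeric tags such as shape or color need an explicit float conversion before they are sorted or filtered.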
Removed File¶
At the end of the primary fragment search, a complete hitlist is generated. In post-processing, the fragments are built into whole molecules and tested for strain, the newly formed bonds are examined for chemical stability, potential duplicates in the hitlist are removed, and the new molecules are checked for bumps with the protein environment and against the optional selectivity protein. At any of these stages, fragments that are similar according to the primary search criteria may be removed from the hitlist because of these secondary criteria. In rare cases, only a few fragments may remain in the final hitlist, and the user may want to examine the removed fragments without re-running the entire search. For this reason, any fragment removed at the post-processing stage is written to the removed hitlist, brood_1.removed.oeb.gz.