Example Commands
The example commands in this section can be run with files found under the appropriate version directory in examples/rocs under the top level installation directory.
ROCS always requires, at the very least, a file containing the query molecule(s) and a file containing the database molecule(s). The query file follows the -query command line flag and the database file follows the -dbase flag. When ROCS is given no arguments other than a query file and a database file, it will attempt to read the first query molecule, fit all database molecules to it, and write out the top 500 structures that have a TanimotoCombo score above a given cutoff (default cutoff = -1.0).
It is important to note that a matching structure, or hit, is the best fitting conformer of a database molecule. Only the best fitting conformer of any molecule will be written out: even if multiple conformers of a molecule pass the cutoff, by default only the conformer that fits best will be written out.
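The hit-selection rule described above can be sketched in Python. This is a minimal illustration, not ROCS internals: ConformerScore and select_hits are hypothetical names, and treating "above the cutoff" as a strict comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConformerScore:
    molecule_id: str       # hypothetical: name of the parent database molecule
    conformer_idx: int     # which conformer of that molecule was fit
    tanimoto_combo: float  # score of this conformer against the query


def select_hits(scores, cutoff=-1.0, max_out=500):
    """Keep the best-fitting conformer of each molecule, drop molecules whose
    best score is not above the cutoff (strict '>' is an assumption), and
    return at most max_out hits ranked by TanimotoCombo, highest first."""
    best = {}
    for s in scores:
        current = best.get(s.molecule_id)
        if current is None or s.tanimoto_combo > current.tanimoto_combo:
            best[s.molecule_id] = s
    hits = [s for s in best.values() if s.tanimoto_combo > cutoff]
    hits.sort(key=lambda s: s.tanimoto_combo, reverse=True)
    return hits[:max_out]


scores = [
    ConformerScore("mol_a", 0, 1.10),
    ConformerScore("mol_a", 1, 1.35),  # both mol_a conformers pass 1.0; only this one is kept
    ConformerScore("mol_b", 0, 0.80),  # the best (only) conformer of mol_b is below 1.0
]
for hit in select_hits(scores, cutoff=1.0):
    print(hit.molecule_id, hit.conformer_idx, hit.tanimoto_combo)
# prints: mol_a 1 1.35
```

Because only the maximum per molecule matters, applying the cutoff before or after picking the best conformer gives the same hitlist.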
ROCS writes a structure file and a report file for each query molecule. The -prefix command line switch is used to name these files; the default prefix is rocs. The output structure file is SDF by default so that Shape Tanimoto and other calculated values can be included as tagged data, but the format can be changed with the -oformat flag or by giving a specific filename with -hitsfile.
Note that as of ROCS 2.4, the defaults include using a color force field (ImplicitMillsDean), optimization against chemistry (-optchem true) and ranking the hitlist via TanimotoCombo (-rankby TanimotoCombo).
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf
will cause structures in the file database.oeb.gz that match the query molecule in 4cox.sdf to be written to a file called rocs_hits_1.sdf. A tab-delimited report file containing the scores will be written to rocs_1.rpt. If rocs_hits_1.sdf is viewed in VIDA, hits can be visually compared with the query and the numerical scores will appear in the spreadsheet. Molecules in the hits file and the report file will be ranked by their TanimotoCombo score.
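TanimotoCombo is the sum of a shape Tanimoto and a color Tanimoto, each between 0 and 1, so it ranges from 0 to 2. The sketch below shows only that arithmetic, starting from overlap values; the function names and the numbers are hypothetical, and it does not show how ROCS computes the overlaps themselves.

```python
def tanimoto(overlap_ab, self_aa, self_bb):
    """Tanimoto similarity from overlap values: O_AB / (O_AA + O_BB - O_AB)."""
    return overlap_ab / (self_aa + self_bb - overlap_ab)


def tanimoto_combo(shape, color):
    """Sum of shape and color Tanimoto; each argument is (O_AB, O_AA, O_BB)."""
    return tanimoto(*shape) + tanimoto(*color)


# Hypothetical overlap values for one query/database pair.
shape = (80.0, 100.0, 100.0)  # shape Tanimoto = 80 / 120
color = (3.0, 5.0, 5.0)       # color Tanimoto = 3 / 7
print(round(tanimoto_combo(shape, color), 3))  # prints 1.095
```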
To avoid continually overwriting output files, use the -prefix flag to give the files unique names.
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -prefix FOO
will write the hit structures into a file named FOO_hits_1.sdf and the overlay values into a file called FOO_1.rpt. As you follow the rest of the examples in this section, you may wish to use a different prefix each time so that you can compare how the output files differ.
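When scripting several runs, it can help to predict the output names in advance. The helper below is a hypothetical sketch of the naming pattern seen in the examples (rocs_hits_1.sdf, rocs_1.rpt, FOO_hits_1.sdf, FOO_1.rpt); it assumes the numeric suffix counts query molecules and that -oformat becomes the structure file extension, neither of which is stated explicitly here. It does not account for -hitsfile.

```python
def rocs_output_names(prefix="rocs", query_index=1, oformat="sdf"):
    """Predict the hits and report file names for one query molecule.

    Inferred from the examples in this section (an assumption, not an
    official specification): the number counts query molecules and the
    output format is used as the structure file extension.
    """
    hits_file = f"{prefix}_hits_{query_index}.{oformat}"
    report_file = f"{prefix}_{query_index}.rpt"
    return hits_file, report_file


print(rocs_output_names())              # ('rocs_hits_1.sdf', 'rocs_1.rpt')
print(rocs_output_names(prefix="FOO"))  # ('FOO_hits_1.sdf', 'FOO_1.rpt')
```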
The -cutoff flag controls which database molecules are considered hits. By default it is set to -1.0. The following changes the cutoff from the default value:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0
The difficulty in choosing a cutoff value is that the number of hits at a given value is not usually known a priori, so setting the cutoff too high could result in no hits. The -besthits and -maxhits flags can be used together with a cutoff value to coax ROCS into producing output of a manageable size. Quick searches can be done to assess an appropriate cutoff value for a particular query molecule. The following search will give a quick answer:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0 -maxhits 20
After 20 hits with a TanimotoCombo score above 1.0 are found in database.oeb.gz for the query molecule(s) in 4cox.sdf, the search terminates and the results are written. This option prevents the entire database file from being searched if a sufficient number of hits is found before the end of the file. Finding the best N hits above a threshold tends to be a more common exercise. If up to 100 of the top-scoring hits with a score above 1.0 are desired, the following search can be done:
If you just want the best N hits regardless of score, keep the default cutoff of -1.0 and set only -besthits:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -besthits 100
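The practical difference between -maxhits and -besthits can be sketched in Python. Assumptions: the database is modeled as a stream of pre-computed (name, TanimotoCombo) pairs, whereas ROCS scores each molecule as it reads it; the cutoff comparison is strict; and the -maxhits hits are ranked before being written. The function names are hypothetical.

```python
import heapq
from itertools import islice


def max_hits(database, cutoff, n):
    """-maxhits style: stop reading as soon as n molecules score above cutoff,
    then rank only those n. Molecules later in the file are never scored."""
    passing = ((name, score) for name, score in database if score > cutoff)
    found = list(islice(passing, n))  # islice stops pulling from the stream after n hits
    return sorted(found, key=lambda hit: hit[1], reverse=True)


def best_hits(database, cutoff, n):
    """-besthits style: read the whole database and keep the n best scores above cutoff."""
    passing = ((name, score) for name, score in database if score > cutoff)
    return heapq.nlargest(n, passing, key=lambda hit: hit[1])


# Hypothetical (molecule, TanimotoCombo) pairs in database order.
database = [("m1", 1.2), ("m2", 0.5), ("m3", 1.6), ("m4", 1.9)]
print(max_hits(database, 1.0, 2))   # [('m3', 1.6), ('m1', 1.2)]; m4 is never reached
print(best_hits(database, 1.0, 2))  # [('m4', 1.9), ('m3', 1.6)]
print(best_hits(database, -1.0, 2))  # default cutoff: simply the 2 best
```

The early stop is what makes -maxhits fast, and also why it can miss better-scoring molecules that appear later in the file.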
If only a report file is desired, the output of matching structures can be suppressed with the -nostructs option. For example:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0 -nostructs
will only generate a report file for matching structures; the matches will not be written to a structure file.
By default, ROCS uses an inertial frame alignment to generate 4 separate starting positions, optimizes all 4 overlays, and selects the best match of the 4. This inertial frame alignment superimposes the centers of mass of the two structures being aligned. If one molecule is substantially smaller than the other, this may not be the best starting position, so ROCS also offers random starting positions. The command:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -randomstarts 20
will use 20 random starting positions and keep the best score. Runtime is proportional to the number of starting positions, so a large -randomstarts value can significantly slow down a ROCS job.
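The multi-start idea behind both the inertial and random starts can be sketched in Python. This is a hedged illustration, not ROCS's optimizer: it assumes the 4 inertial starts come from the sign ambiguity of the principal axes (only the 4 sign patterns that avoid a mirror image are kept), and it models random starts as 1-D numbers with a toy scorer for brevity. All function names are hypothetical.

```python
import random
from itertools import product


def center_of_mass(coords, masses):
    """Mass-weighted centroid; the inertial alignment superimposes these points."""
    total = sum(masses)
    return tuple(
        sum(m * xyz[axis] for xyz, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def inertial_start_signs():
    """Sign patterns for the principal axes that keep a right-handed frame.

    Each principal axis is only defined up to sign; of the 8 sign patterns,
    the 4 whose product is +1 are proper rotations (no reflection)."""
    return [s for s in product((1, -1), repeat=3) if s[0] * s[1] * s[2] == 1]


def best_of_starts(starts, optimize):
    """Run optimize(start) -> (score, pose) once per start and keep the best score.
    Total cost grows linearly with the number of starts."""
    return max((optimize(start) for start in starts), key=lambda result: result[0])


print(center_of_mass([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]))  # (1.0, 0.0, 0.0)
print(inertial_start_signs())  # [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

rng = random.Random(0)
random_starts = [rng.uniform(-1.0, 1.0) for _ in range(20)]  # cf. -randomstarts 20
# Toy stand-in for an optimizer: it only scores each start (peak at 0.3), without refining it.
score, pose = best_of_starts(random_starts, lambda x: (-(x - 0.3) ** 2, x))
```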
ROCS also calculates the Tversky coefficient, weighted either toward the fit (database) molecule (FitTversky) or toward the reference (query) molecule (RefTversky). These scores appear in the report file, and in SD tags if the structures are written to an SD or OEB file. ROCS can use these other scores to rank the hitlist via the -rankby switch.
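To see why Tversky scores rank differently from the Tanimoto, the sketch below compares them on hypothetical overlap values for a small query largely contained in a larger database molecule. The weighting alpha = 0.95 is an assumption (this section does not state it), as are the function names.

```python
TVERSKY_ALPHA = 0.95  # assumed weighting; not stated in this section


def tversky(overlap, self_weighted, self_other, alpha=TVERSKY_ALPHA):
    """Tversky index O_AB / (alpha * O_AA + (1 - alpha) * O_BB),
    weighted toward molecule A (the one passed as self_weighted)."""
    return overlap / (alpha * self_weighted + (1 - alpha) * self_other)


def ref_tversky(overlap, self_ref, self_fit):
    return tversky(overlap, self_ref, self_fit)  # weighted toward the query


def fit_tversky(overlap, self_ref, self_fit):
    return tversky(overlap, self_fit, self_ref)  # weighted toward the database molecule


def tanimoto(overlap, self_ref, self_fit):
    return overlap / (self_ref + self_fit - overlap)


# Hypothetical values: a small query largely contained in a much larger database molecule.
overlap, self_ref, self_fit = 50.0, 60.0, 200.0
print(round(tanimoto(overlap, self_ref, self_fit), 3))     # 0.238
print(round(ref_tversky(overlap, self_ref, self_fit), 3))  # 0.746
print(round(fit_tversky(overlap, self_ref, self_fit), 3))  # 0.259
```

With this weighting, RefTversky approximates the fraction of the query covered by the database molecule, so it stays high for larger molecules that contain the query, while the Tanimoto penalizes the size mismatch.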
To search a database and keep the best 300 hits, ranked by the FitTverskyCombo score (weighted toward each database molecule):
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -rankby FitTverskyCombo -besthits 300
A chemical force field (ImplicitMillsDean) is used by default, but a different one can be specified. Please refer to the chemical force field (CFF) section for a description of how to define a chemical force field. To use a different force field, name it with the -chemff option. For example:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -chemff ExplicitMillsDean
Because -optchem defaults to true, this alignment is optimized against both shape and the ExplicitMillsDean force field. To simply calculate the CFF score after finding the best alignment based on shape, also set -optchem false.
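The difference between scoring the color force field after a shape-only alignment and optimizing against chemistry as well (the -optchem true default noted earlier) can be illustrated with a one-dimensional toy model. Everything here is hypothetical: the two score curves, the single "pose" coordinate, and the grid search standing in for the real optimizer.

```python
def shape_score(pose):
    """Hypothetical shape-overlap score along one alignment coordinate; best at 0.0."""
    return 1.0 - pose ** 2


def color_score(pose):
    """Hypothetical color (CFF) score; its best alignment is at 0.4, not 0.0."""
    return 0.5 - (pose - 0.4) ** 2


POSES = [i / 100 for i in range(-100, 101)]  # grid search stands in for the optimizer


def align(optchem):
    """optchem=False: optimize shape only, then score color at that pose.
    optchem=True: optimize shape + color together."""
    if optchem:
        pose = max(POSES, key=lambda p: shape_score(p) + color_score(p))
    else:
        pose = max(POSES, key=shape_score)
    return pose, shape_score(pose), color_score(pose)


for optchem in (False, True):
    pose, shape, color = align(optchem)
    print(f"optchem={optchem}: pose={pose:.2f} combo={shape + color:.2f}")
# optchem=False: pose=0.00 combo=1.34
# optchem=True: pose=0.20 combo=1.42
```

Optimizing the combined score trades a little shape overlap for a better color match, which is why the two settings can rank the same database differently.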
To turn off color entirely and run ROCS on shape overlap only, use the -shapeonly flag:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -shapeonly