Input Files
The Database File
The most common use of ROCS is overlaying a large collection of
molecules onto a query (reference) molecule. For the purposes of this
document, we’ll call this large file the dbase (fit) file. The most common
format for the dbase file is a multi-conformer OEBinary file created
by OpenEye’s OMEGA program; however, this file can also be in one of several
other 3D formats, including SDF, MOL2, and PDB. ROCS determines the input
file format from the file extension: .oeb for OEBinary, .sdf or .mol for
SDF, .mol2 for MOL2, and .pdb or .ent for PDB. Gzip-compressed files in
these same formats are also accepted; for example, ROCS will interpret
infile.sdf.gz as a gzipped SDF file.
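The extension rules above can be illustrated with a short Python sketch. It is not ROCS's internal implementation: the function name detect_format and the EXTENSION_FORMATS table are hypothetical, and the .oeb entry follows the .oeb.gz names used in the example list file. A trailing .gz is stripped first, and the remaining final extension selects the format.

```python
from pathlib import PurePath

# Hypothetical extension table mirroring the rules described in the text;
# ROCS itself may recognize additional extensions.
EXTENSION_FORMATS = {
    ".oeb": "OEBinary",
    ".sdf": "SDF",
    ".mol": "SDF",
    ".mol2": "MOL2",
    ".pdb": "PDB",
    ".ent": "PDB",
}


def detect_format(filename: str) -> tuple[str, bool]:
    """Return (format name, is_gzipped) for a molecule file name."""
    suffixes = PurePath(filename).suffixes  # "infile.sdf.gz" -> [".sdf", ".gz"]
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        suffixes = suffixes[:-1]  # drop ".gz" and inspect the real extension
    if not suffixes or suffixes[-1] not in EXTENSION_FORMATS:
        raise ValueError(f"unrecognized molecule file extension: {filename}")
    return EXTENSION_FORMATS[suffixes[-1]], compressed
```

For example, detect_format("infile.sdf.gz") returns ("SDF", True), and detect_format("hits.mol2") returns ("MOL2", False).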
Note
Although all of these formats are supported, SDF and MOL2 input can be considerably slower to process because of the large I/O overhead of these formats.
ROCS cannot convert 1D or 2D molecules to 3D; the input file must already contain 3D coordinates. More importantly, ROCS will treat consecutive molecules in the input file as conformers of a single multi-conformer molecule as long as they:
Are contiguous in the input file.
Have the same number of atoms and bonds, in the same order.
Have identical atom and bond properties, in the same order in the connection table.
Have the same atom and bond stereochemistry.
While this may appear to be a restrictive list, many programs write
multi-conformer molecules into SDF or MOL2 files in a way that satisfies
the above rules. Even if the conformers are named differently (e.g., they
have a conformer number appended to the base name, such as acetsali_1 and
acetsali_2), ROCS will still consider them part of a single multi-conformer
molecule as long as the criteria above are met. For file formats that are
not inherently multi-conformer, this behavior can be turned off with the
-scdbase command-line switch. With -scdbase on, ROCS will not attempt to
combine multiple conformers into a single multi-conformer molecule.
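The grouping rules above can be sketched in Python under simplifying assumptions. MolRecord, connection_key, and group_conformers are hypothetical names, and a record's "connection table" is reduced to ordered tuples of atom properties, bonds, and stereo descriptors. itertools.groupby merges only adjacent records with equal keys, which matches the contiguity rule, and combine=False stands in for the effect of -scdbase.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class MolRecord:
    """Hypothetical, simplified stand-in for one molecule record in a file."""
    name: str
    atoms: tuple   # ordered atom properties, e.g. element symbols
    bonds: tuple   # ordered bonds as (begin_atom, end_atom, bond_order)
    stereo: tuple  # ordered atom/bond stereo descriptors
    coords: tuple  # conformer-specific coordinates (toy values below)


def connection_key(rec):
    # The name and coordinates are ignored, so "ethanol_1" and "ethanol_2"
    # can still merge if everything else matches in order.
    return (rec.atoms, rec.bonds, rec.stereo)


def group_conformers(records, combine=True):
    """Yield lists of records; each list is one (multi-)conformer molecule.

    combine=False mimics -scdbase: every record stays a separate molecule.
    """
    if not combine:
        for rec in records:
            yield [rec]
        return
    # groupby only merges *adjacent* records with equal keys, which
    # implements the "contiguous in the input file" rule.
    for _, confs in groupby(records, key=connection_key):
        yield list(confs)


ethanol = {"atoms": ("C", "C", "O"), "bonds": ((0, 1, 1), (1, 2, 1)), "stereo": ()}
methanol = {"atoms": ("C", "O"), "bonds": ((0, 1, 1),), "stereo": ()}
records = [
    MolRecord("ethanol_1", coords=(0.0,), **ethanol),
    MolRecord("ethanol_2", coords=(0.1,), **ethanol),
    MolRecord("methanol", coords=(0.0,), **methanol),
    MolRecord("ethanol_3", coords=(0.2,), **ethanol),
]
print([len(m) for m in group_conformers(records)])                 # [2, 1, 1]
print([len(m) for m in group_conformers(records, combine=False)])  # [1, 1, 1, 1]
```

Note that ethanol_3 becomes its own molecule: it has the same connection table as ethanol_1 and ethanol_2, but it is not contiguous with them.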
A newer molecule file format, designed specifically for running ROCS on
large clusters, is the .rocsdb format. See the MakeRocsDB section for when
to use this format and how to create it.
One other file type is accepted as the dbase file. A file whose name ends
in .list or .lst is assumed to be a list of molecule files, one per line.
ROCS opens each listed file in turn and treats the entire collection as a
single dbase file. Note that the conformer detection described above does
not span file boundaries: conformers at the end of one listed file and the
start of the next are never combined into a single molecule.
Here is an example list file:
part1.oeb.gz
part2.oeb.gz
part3.oeb.gz
hits.mol2
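The list-file behavior can be sketched in Python as follows. read_list_file and iter_dbase_molecules are hypothetical names, skipping blank lines is an assumption of the sketch, and the per-file reader and conformer-grouping function are passed in as callbacks. The key point is that grouping restarts for every listed file, so conformers never merge across file boundaries.

```python
import tempfile
from itertools import groupby
from pathlib import Path


def read_list_file(list_path):
    """Return the molecule file names in a .list/.lst file, one per line.

    Skipping blank lines is an assumption of this sketch.
    """
    text = Path(list_path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def iter_dbase_molecules(list_path, read_records, group):
    """Treat all listed files as one dbase stream.

    read_records(path) and group(records) are hypothetical callbacks.
    Grouping restarts for each file, so conformers never merge across files.
    """
    for mol_file in read_list_file(list_path):
        yield from group(read_records(mol_file))


# Toy stand-ins: each "file" holds molecule identities, and adjacent
# equal identities count as conformers of the same molecule.
fake_files = {"part1.oeb.gz": ["A", "B", "B"], "part2.oeb.gz": ["B", "C"]}


def read_records(path):
    return fake_files[path]


def group(records):
    return [list(g) for _, g in groupby(records)]


with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "dbase.list"
    list_path.write_text("part1.oeb.gz\npart2.oeb.gz\n\n")
    molecules = list(iter_dbase_molecules(list_path, read_records, group))

print(molecules)  # [['A'], ['B', 'B'], ['B'], ['C']]
```

The "B" at the end of part1.oeb.gz and the "B" at the start of part2.oeb.gz stay separate molecules.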
The Query File
The second required input for a ROCS run is a file containing one or more molecules to be used as the query. ROCS loops over the molecules read from the dbase file and attempts to overlay each of them onto the query. For consistency with other OpenEye software, the query molecule is also referred to as the reference molecule.
By default, ROCS treats each molecule in the query file as a single-conformer molecule. For each query molecule, ROCS runs a complete loop over the dbase molecules and, depending on the values of other command-line switches described below, writes out a hits structure file and a report file.
Alternatively, ROCS can read queries as multi-conformer molecules by
adding the -mcquery command-line switch. In this mode, ROCS uses the same
rules as described in The Database File section to determine whether two
consecutive molecules are actually conformers of the same molecule. For
each multi-conformer molecule in the query file, ROCS loops over the
conformers of each dbase molecule, comparing them to all query conformers.
By default, ROCS returns only the single best overlay from this N×M set of
comparisons (N query conformers by M dbase conformers). More than one
overlay can be returned by using the -maxconfs command-line switch.
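The N×M comparison can be sketched in Python. best_overlays and toy_score are hypothetical: the score function stands in for a real shape/color overlay score (higher is better), the toy "conformers" are plain numbers, and max_confs plays the role described for -maxconfs. heapq.nlargest keeps only the top-scoring pairs without sorting every result.

```python
import heapq
from itertools import product


def best_overlays(query_confs, dbase_confs, score, max_confs=1):
    """Score every query/dbase conformer pair; return the top max_confs.

    Each result is a (score, query_conf, dbase_conf) tuple, best first.
    """
    pairs = product(query_confs, dbase_confs)  # all N x M combinations
    scored = ((score(q, d), q, d) for q, d in pairs)
    return heapq.nlargest(max_confs, scored, key=lambda t: t[0])


def toy_score(q, d):
    # Hypothetical similarity: closer numbers overlay "better".
    return -abs(q - d)


query_confs = [1.0, 2.0, 4.0]  # N = 3 toy query conformers
dbase_confs = [2.5, 0.0]       # M = 2 toy dbase conformers
print(best_overlays(query_confs, dbase_confs, toy_score))               # [(-0.5, 2.0, 2.5)]
print(best_overlays(query_confs, dbase_confs, toy_score, max_confs=2))  # [(-0.5, 2.0, 2.5), (-1.0, 1.0, 0.0)]
```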
Shape Queries
Version 3.0 of ROCS introduced a new type of query called a shape query.
A shape query can combine multiple elements, including molecules, color
features, and grids. It can be generated in vROCS and saved as a shape
query file with the extension .sq.
Grid Queries
ROCS can also use a grid instead of a molecule as a query ([Virtanen-2010]). These grids must be in GRASP, OpenEye, OpenEye ASCII Grid (.agd), CCP4, or XPLOR grid format, and can be created with the OpenEye Grid toolkit or with a graphical application such as GRASP. Certain ROCS features are not available with a grid query; for example, the color force field features cannot be used.