Input Files

The Database File

The most common use of ROCS is overlaying a large collection of molecules onto a query (reference) molecule. For the purposes of this document, we’ll call the file holding this collection the dbase (fit) file. The most common format for the dbase file is a multi-conformer OEBinary file created by OpenEye’s OMEGA program; however, the file can also be in one of several other 3D formats, including SDF, MOL2 and PDB. ROCS determines the input file format from the file extension: .sdf or .mol for SDF, .mol2 for MOL2, and .pdb or .ent for PDB. Gzip-compressed files in these same formats are allowed as well; for example, ROCS will interpret infile.sdf.gz as a gzipped SDF file.

Note

Even though all of these formats are supported, using SDF or MOL2 can slow a run considerably because of the large I/O penalty of these formats.
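
The extension rules described above can be sketched in a few lines of Python. This is only an illustration of the stated mapping, not the code ROCS itself uses; the function name guess_format is hypothetical, and the .oeb extension for OEBinary is an assumption based on the file names used elsewhere in this section.

```python
from pathlib import Path

# Extension-to-format table taken from the text above.
# ".oeb" for OEBinary is an assumption based on this section's example names.
EXTENSION_FORMATS = {
    ".oeb": "OEBinary",
    ".sdf": "SDF",
    ".mol": "SDF",
    ".mol2": "MOL2",
    ".pdb": "PDB",
    ".ent": "PDB",
}


def guess_format(filename):
    """Return (format_name, is_gzipped) for a molecule file name."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # A trailing ".gz" marks a gzip-compressed file of the underlying format.
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in EXTENSION_FORMATS:
        raise ValueError(f"unrecognized molecule file extension: {filename}")
    return EXTENSION_FORMATS[suffixes[-1]], gzipped
```

For example, guess_format("infile.sdf.gz") reports a gzipped SDF file, while a name with no recognized extension raises an error rather than guessing.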

ROCS has no provision for conversion of 1D/2D molecules to 3D. The input file must already be 3D. More importantly, ROCS will interpret conformers in the input file as part of a single multi-conformer molecule as long as they:

  • Are contiguous in the input file.
  • Have the same number of atoms and bonds, in the same order.
  • Have identical atom and bond properties, listed in the same order in each subsequent connection table.
  • Have the same atom and bond stereochemistry.

While this may appear to be a restrictive list, many programs write multi-conformer molecules into SDF or MOL2 files such that the above rules will be satisfied. If the conformers are named differently (for example, they have a conformer number appended to the base name, as in acetsali_1, acetsali_2), ROCS will still consider them part of a single multi-conformer molecule if the criteria above are met. For file formats that are not inherently multi-conformer, this behavior can be turned off with the -scdbase command-line switch. With the -scdbase switch on, ROCS will not attempt to combine multiple conformers into a single multi-conformer molecule.
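
The grouping rules above can be illustrated with a minimal Python sketch. It is not the ROCS implementation: the Record class and its fields are hypothetical stand-ins for whatever a real reader extracts from each record, and the single_conformer argument merely mimics the effect described for -scdbase.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one 3D record read from an SDF or MOL2 file."""
    name: str
    atoms: tuple   # per-atom properties in file order, e.g. ("C", "C", "O")
    bonds: tuple   # per-bond (begin, end, order) triples in file order
    stereo: tuple  # atom and bond stereo labels in file order


def topology_key(rec):
    # The name is deliberately left out: differently named conformers still merge.
    return (rec.atoms, rec.bonds, rec.stereo)


def group_conformers(records, single_conformer=False):
    """Group contiguous records with identical topology into molecules."""
    if single_conformer:
        # Mimics the described -scdbase behavior: every record stands alone.
        return [[rec] for rec in records]
    # groupby only merges neighbors, which matches the contiguity requirement.
    return [list(group) for _, group in groupby(records, key=topology_key)]
```

Because groupby only merges adjacent records, a conformer that reappears later in the file, after a different molecule, starts a new molecule in this sketch.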

A new molecule file format, designed specifically for running ROCS on large clusters, is the .rocsdb format. See the MakeRocsDB section for when to use this format and how to create it.

One other file type is allowed as the dbase file. A file name ending in .list or .lst is assumed to be a list of actual molecule files, one per line. ROCS will then open each in turn and treat the entire collection as a single dbase file. Note that the conformer detection/concatenation code above will not span the gaps between these separate files.

Here is an example list file:

part1.oeb.gz
part2.oeb.gz
part3.oeb.gz
hits.mol2
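
As a rough illustration of how such a list file expands into individual molecule files, consider the following Python sketch. The function name expand_dbase is hypothetical, and skipping blank lines is an assumption; the text above only says that the file holds one molecule file name per line.

```python
from pathlib import Path


def expand_dbase(path):
    """Return the molecule files named by a dbase argument, in order."""
    p = Path(path)
    if p.suffix.lower() not in (".list", ".lst"):
        # An ordinary molecule file stands for itself.
        return [str(p)]
    # One file name per line; blank lines are skipped (an assumption).
    lines = p.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Each returned file would then be read in turn, with conformer grouping applied within each file separately, since conformers are not merged across file boundaries.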

The Query File

The second required input for a ROCS run is a file containing one or more molecules to be used as the query. ROCS will loop over molecules read in from the dbase file and attempt to overlay each of them against the query. In order to be consistent with other OpenEye software, this query molecule can also be referred to as the reference molecule.

Normally, ROCS treats each molecule in the query file as a single-conformer molecule. For each molecule in the query file, ROCS will run a complete loop over the dbase molecules and write out a hits structure file and a report file, depending on the values of other command-line switches described below.

Alternatively, ROCS can read queries as multi-conformer molecules when the -mcquery command-line switch is added. In this mode, ROCS uses the same rules as described in the section The Database File to determine whether two consecutive molecules are actually conformers of the same molecule. For each multi-conformer molecule in the query file, ROCS will loop over the dbase molecules’ conformers, comparing them to all query conformers. For a query with N conformers and a dbase molecule with M conformers, this gives an NxM set of comparisons. By default, ROCS will only return the single best overlay from this set. More than one can be returned by using the -maxconfs command-line switch.
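
The all-against-all comparison can be pictured with the short Python sketch below. It is a sketch under stated assumptions, not ROCS's algorithm: the score callable is a hypothetical placeholder for the overlay score, and the text does not specify how additional overlays are chosen when -maxconfs is raised, so the example simply keeps the highest-scoring conformer pairs.

```python
import heapq
from itertools import product


def best_overlays(query_confs, dbase_confs, score, max_confs=1):
    """Score every query/dbase conformer pair and keep the top max_confs.

    score is a hypothetical callable returning a number where larger is better.
    Each result is a (score, query_index, dbase_index) tuple, best first.
    """
    pairs = product(range(len(query_confs)), range(len(dbase_confs)))
    scored = (
        (score(query_confs[i], dbase_confs[j]), i, j) for i, j in pairs
    )
    return heapq.nlargest(max_confs, scored)
```

With max_confs left at 1, the function returns only the single best pair, mirroring the default behavior described above.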

Note

On Windows it is advisable to use the 64-bit ROCS executable during memory-intensive tasks (such as when -mcquery is combined with -subrocs) to avoid a crash due to insufficient memory.

Shape Queries

Version 3.0 of ROCS introduced a new type of query called a shape query. It is a format that encompasses multiple elements of shape, including molecules, color features and grids. It can be generated from vROCS and saved in a shape query file with the extension .sq.

Grid Queries

ROCS can also use a grid instead of a molecule as a query ([Virtanen-2010]). These grids must be in GRASP, OpenEye, OpenEye ASCII Grid (.agd), CCP4, or XPLOR grid format and can be created with the OpenEye Grid toolkit or with a graphical application like GRASP. Certain ROCS features are not available when using a grid query. For example, the color force field features are not available with a grid query.