The example commands in this section can be run with files found under the appropriate version directory in examples/rocs under the top level installation directory.
ROCS requires, at a minimum, a file containing the query molecule(s) and a file containing the database molecule(s). The query file follows the -query command line flag and the database file follows the -dbase flag. When ROCS is given no arguments other than a query file and a database file, it reads the first query molecule, fits all database molecules to it, and writes out the top 500 structures that have a TanimotoCombo score above a given cutoff (default cutoff = -1.0). It is important to note that a matching structure, or hit, is the best-fitting conformer of a database molecule. Even if multiple conformers of a molecule pass the cutoff, only the best-fitting conformer is written out by default.
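The best-conformer rule can be illustrated with a short Python sketch. This is not ROCS code: the function name, the tuple layout, and the scores are hypothetical, and the sketch only shows how one hit per molecule is selected from many scored conformers.

```python
def best_conformer_per_molecule(scored_conformers):
    """Keep only the highest-scoring conformer of each molecule.

    scored_conformers: iterable of (molecule_name, conformer_index, score)
    tuples. This layout is a hypothetical stand-in, not a ROCS data format.
    """
    best = {}
    for name, conf_idx, score in scored_conformers:
        # Replace the stored conformer only when this one scores higher.
        if name not in best or score > best[name][1]:
            best[name] = (conf_idx, score)
    return best


# Made-up scores: molA has two conformers, molB has one.
example = [("molA", 0, 1.2), ("molA", 1, 1.5), ("molB", 0, 0.9)]
hits = best_conformer_per_molecule(example)
```

Here molA contributes a single hit (its second conformer), even though both of its conformers could pass a low cutoff.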
ROCS writes a structure file and a report file for each query molecule. The -prefix command line switch is used to name these files. The default prefix is rocs. The output structure file is by default sdf so that Shape Tanimoto and other calculated values can be included as tagged data, but the format can be changed by using the -oformat flag or by giving a specific filename using -hitsfile.
Note that as of ROCS 2.4, the defaults include using a color force field (ImplicitMillsDean), optimization against chemistry (-optchem true) and ranking the hitlist via TanimotoCombo (-rankby TanimotoCombo).
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf
will cause structures in the file database.oeb.gz that match the query molecule in 4cox.sdf to be written to a file called rocs_hits_1.sdf. A tab-delimited report file containing the scores will be written to rocs_1.rpt. If rocs_hits_1.sdf is viewed in VIDA, hits can be visually compared with the query and the numerical scores will appear in the spreadsheet. Molecules in the hits file and the report file will be ranked by their TanimotoCombo score.
To prevent continually over-writing output files, the -prefix flag allows you to give unique names to the files.
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -prefix FOO
will write the hit structures into a file named FOO_hits_1.sdf and the overlay values will be in a file called FOO_1.rpt. As you follow the rest of the examples in this section, you may wish to use different prefixes each time so that you can compare how the output files differ.
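When ROCS runs are scripted, it can help to predict the output file names for a given prefix. The sketch below assumes the naming pattern seen in the examples above (PREFIX_hits_N.sdf and PREFIX_N.rpt); that N increases by one for each further query molecule is an assumption extrapolated from the "_1" suffix. The helper names are hypothetical, and the command builder uses only the -dbase, -query and -prefix flags shown in this section.

```python
def rocs_output_names(prefix="rocs", query_index=1, ext="sdf"):
    """Return (hits_file, report_file) for one query molecule.

    Follows the pattern from the examples; the meaning of query_index
    beyond 1 is an assumption.
    """
    return (f"{prefix}_hits_{query_index}.{ext}", f"{prefix}_{query_index}.rpt")


def rocs_command(dbase, query, prefix=None):
    """Build an argument list suitable for subprocess.run()."""
    cmd = ["rocs", "-dbase", dbase, "-query", query]
    if prefix is not None:
        cmd += ["-prefix", prefix]
    return cmd


default_names = rocs_output_names()
foo_names = rocs_output_names("FOO")
foo_cmd = rocs_command("database.oeb.gz", "4cox.sdf", prefix="FOO")
```

Passing foo_cmd to subprocess.run(foo_cmd, check=True) would launch the same job as the FOO example above, provided rocs is on the PATH.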
The -cutoff flag is used to control which database molecules are considered hits. By default this is set at -1.0. The following demonstrates changing the cutoff from the default value:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0
The difficulty in choosing a cutoff value is that the number of hits at a given value is not usually known a priori, so setting too high a cutoff could result in no hits. The -besthits and -maxhits flags can be used in conjunction with a cutoff value to coax ROCS into giving output of a manageable size. Quick searches can be done to assess an appropriate cutoff value for a particular query molecule. The following demonstrates a search that will give a quick answer:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0 -maxhits 20
After 20 hits are found above a combo score of 1.0 in database.oeb.gz for the query molecule(s) in 4cox.sdf, the search terminates and the results are written. This option prevents the entire database file from being searched if a sufficient number of hits are found before the end of the database file. Finding the best N hits above a threshold tends to be a more common exercise. If the best hits in the database, up to a maximum of 100 and scoring above 1.0, are desired, the following search can be done:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0 -besthits 100
If you just want the best N hits regardless of the cutoff, then using the default cutoff of -1.0 along with -besthits generates the N best:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -besthits 100
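The difference between -maxhits and -besthits can be sketched in Python. This is a toy model of the behavior described above, not the ROCS implementation: the function names and scores are made up, and treating "above the cutoff" as a strict comparison is an assumption.

```python
import heapq


def max_hits(scores, cutoff, n):
    """Stop scanning as soon as n scores above the cutoff have been seen."""
    hits = []
    for s in scores:
        if s > cutoff:
            hits.append(s)
            if len(hits) == n:
                break  # the rest of the database is never examined
    return sorted(hits, reverse=True)


def best_hits(scores, cutoff, n):
    """Scan the whole database and keep the n highest scores above the cutoff."""
    return heapq.nlargest(n, (s for s in scores if s > cutoff))


# Made-up scores listed in database order.
database_scores = [1.2, 0.8, 1.1, 1.9, 1.05, 1.6]
quick = max_hits(database_scores, cutoff=1.0, n=2)
best = best_hits(database_scores, cutoff=1.0, n=2)
```

In this toy run, quick holds the first two passing scores (1.2 and 1.1), while best holds the two highest scores in the whole list (1.9 and 1.6). The early-exit search is faster but depends on database order.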
If a report file alone is desired, the output of matching structures can be suppressed with the -nostructs option. For example:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -cutoff 1.0 -nostructs
will only generate a report file for matching structures but the matches will not be written to a structure file.
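Because the report file is tab-delimited, it can be post-processed with standard tools. The following Python sketch assumes that the first line of the report is a header row; the column names in a real report should be read from that header, and none are assumed here. The function names are hypothetical.

```python
import csv


def read_report(stream):
    """Parse a tab-delimited report into a list of dicts keyed by column header.

    stream: an open text file (or any iterable of lines). The first line
    is assumed to be the header row.
    """
    return list(csv.DictReader(stream, delimiter="\t"))


def column_as_floats(rows, column):
    """Extract one numeric column, named exactly as in the header."""
    return [float(row[column]) for row in rows]
```

A report could then be loaded with open("rocs_1.rpt", newline="") as the stream, and a score column converted with column_as_floats once its name has been checked in the header.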
By default, ROCS uses an inertial frame alignment to generate 4 separate starting positions, optimizes all 4 overlays and selects the best match of the 4. By default, this inertial frame alignment aligns the centers-of-mass of the two structures being aligned. If either molecule is substantially smaller than the other, this may not be the best starting position, so the choice to use random starting positions is offered. The command:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -randomstarts 20
will use 20 random starting positions and keep the best score. Runtime is proportional to the number of starting positions, so using a large number for -randomstarts can significantly slow down a ROCS job.
ROCS also calculates the Tversky coefficient based either on the fit (database) molecule (FitTversky) or on the reference (query) molecule (RefTversky). These scores will appear in the report file and in SD tags if the structures are written to an SD or OEB file. ROCS can use these other scores as the ranking score for the hitlist by using the -rankby switch.
To search a database and find the best 300 hits, scored by the FitTverskyCombo coefficient weighted to each database molecule:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -rankby FitTverskyCombo -besthits 300
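To make the difference between the Tanimoto and Tversky scores concrete, here is a minimal Python sketch using the usual overlap-based definitions. It is not taken from ROCS itself: the function names are hypothetical, the overlap values are made up, and the default weight alpha = 0.95 (with beta = 1 - alpha) is an assumption that should be checked against the ROCS theory documentation.

```python
def tanimoto(overlap, self_query, self_fit):
    """Tanimoto: overlap divided by the union-like term."""
    return overlap / (self_query + self_fit - overlap)


def tversky(overlap, self_query, self_fit, alpha, beta):
    """Tversky: overlap divided by a weighted sum of the self-overlaps."""
    return overlap / (alpha * self_query + beta * self_fit)


def ref_tversky(overlap, self_query, self_fit, alpha=0.95):
    """Weight the reference (query) molecule; alpha default is an assumption."""
    return tversky(overlap, self_query, self_fit, alpha, 1.0 - alpha)


def fit_tversky(overlap, self_query, self_fit, alpha=0.95):
    """Weight the fit (database) molecule; alpha default is an assumption."""
    return tversky(overlap, self_query, self_fit, 1.0 - alpha, alpha)


# Made-up case: a small database molecule lying entirely inside a larger query.
o, q, f = 50.0, 100.0, 50.0
t = tanimoto(o, q, f)
rt = ref_tversky(o, q, f, alpha=1.0)
ft = fit_tversky(o, q, f, alpha=1.0)
```

With alpha set to 1.0 for clarity, the Tanimoto and RefTversky values are both 0.5, while FitTversky is 1.0 because the database molecule is fully covered by the query. This is why a fit-weighted score can rank small molecules that match part of a large query more highly than a Tanimoto score does.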
A chemical force field is used by default (ImplicitMillsDean), but a different one can be specified. Please refer to the chemical force field (CFF) section for a description of how to define a chemical force field. To simply calculate the CFF score after finding the best alignment based on shape, use the -chemff option. For example:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -chemff ExplicitMillsDean
To turn off all color and run ROCS as shape overlap only, you can use the -shapeonly flag:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -shapeonly
To write out a file for input into EON, containing the top 1000 ROCS hits with 3 conformers per output molecule:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -eon_input
To write all ROCS hits to the EON input file:
prompt> rocs -dbase database.oeb.gz -query 4cox.sdf -eon_input -eon_input_size 0