# vROCS Usage

## Overview

vROCS provides a single user interface from which the user can build/edit ROCS queries, set up ROCS runs and analyze/visualize the results. It also includes rigorous statistics tools for validating a query, facilitating the comparison of different queries and selection of the most appropriate query for the project.

There are four primary workflows (tasks) in vROCS available from an initial Welcome page with a button for accessing each task. Each workflow provides the tools required to guide the user through the task. These workflows are:

1. Perform a simple ROCS run
2. Create a query with a wizard
3. Create or edit a query manually
4. Perform a ROCS validation

vROCS Welcome page

## Setup a simple run and a validation run

vROCS guides the user through all the steps of setting up and performing a ROCS run and visualizing and analyzing the results. There are two main run types for which a user would wish to employ ROCS.

1. Simple run
2. Validation run

From the Welcome page click on the button to Perform a simple ROCS run or Perform a ROCS validation to bring up the Run set-up dialog.

 Simple run setup Validation run setup

• Run Name: Editable name that will be used for the ROCS run and for displaying the results in vROCS.
• A dropdown menu allows selection of the current color force field. Options are: Implicit Mills Dean (default unless changed in User Preferences) and Explicit Mills Dean. If a custom color force field was selected using Edit > Preferences then this will also be available here. The current force field cannot be changed for a specific active query. Changing the current force field in the dropdown will filter the active query list to show only queries which use that color force field. Opening a saved query file (*.sq, *.sq.gz) will use the color force field previously associated with that file and the active force field in the Color F.F. dropdown will change to reflect this.
• List of queries that can be selected for the ROCS run. Click on Open queries... or the folder icon to browse to saved ROCS query files (molecules/grids/queries). Click on the black down arrow icon to select from a list of recently used queries. The source file path is shown below the opened query name. Queries built in the vROCS query editor are automatically added to this list for the current vROCS session. The query highlighted in blue is the selected (active) query. The query name can be edited (pencil) or the query can be deleted from the list (red X). Corresponds to the -query command line flag.
• The source of ligands which ROCS is to align to the query file during a simple ROCS run. Click on Open database... or the folder icon to browse to database files. Click on the black down arrow icon to select from a list of recently used databases. Corresponds to the -dbase command line flag.
• The source of ‘active’ ligands which ROCS is to align to the query file during a ROCS validation run. Click on Open database... or the folder icon to browse to database files. Click on the black down arrow icon to select from a list of recently used databases.
• The source of ‘decoy’ ligands which ROCS is to align to the query file during a ROCS validation run. Click on Open database... or the folder icon to browse to database files. Click on the black down arrow icon to select from a list of recently used databases.
• Return to the Welcome screen or the vROCS query editor.
• Proceed to the Run set-up details Inputs tab. This button only becomes active once the query and database (or actives and decoys) files are selected.
• Run ROCS using the selected run name, query and database (or actives and decoys) and default parameters. Only becomes active once query and database (or actives and decoys) are selected.

The input for ROCS is a shape-based query with optional color atoms and one (or more) databases of molecules to search. The query shape is most frequently derived from a ligand of interest, although other sources are possible, such as a variety of grids built in AFITT, Spicoli, OEDocking, OEChem and third party tools (see ROCS Shape Query Sources). vROCS allows the user to load a pre-saved query for ROCS, having 3D coordinates, or to build or modify one in situ. The list of available queries for a specific run is filtered based upon the type of color force field shown in the Color F.F. dropdown. The database(s) must be prepared externally, with 3D coordinates generated and conformers enumerated, usually by OMEGA (see Simple run setup and Validation run setup).

### Simple Run

A simple run aligns a database of pre-computed molecular conformers against a query. For each molecule in the database it overlays every conformer based on molecular shape with the option to employ color force fields. For a full description of the shape and Gaussian theory employed by ROCS, see Shape Theory. The conformers are scored based upon the Gaussian overlap to the query and the best scoring conformer is reported. The most common scores are ShapeTanimoto (shape alone) or, default in ROCS and vROCS, TanimotoCombo (shape + color). The molecules in the database are finally ranked by the scores for their best aligned conformers. This type of simple ROCS run is commonly used when lead-hopping i.e. looking for structurally dissimilar molecules which have a higher probability of biological activity at the same target as the query while also overcoming issues such as ADME/Tox or patent coverage. Numerous literature examples of this application exist and some representative examples are given here.

### Validation Run

Before running a simple ROCS run on a large database, e.g. corporate database of thousands or potentially millions of compounds (and even more conformers!), one should have confidence that the query is indeed able to distinguish true actives from inactives. For this purpose the validation ROCS run is employed. The major difference when setting up a validation run is that the validation run searches two sets of compounds, whereas the simple run searches only a single database. These two datasets are:

1. A set of molecules known to possess the desired biological activity. These are the actives.
2. A set of molecules known (or presumed) not to possess the desired biological activity. These are the decoys. The decoys can be a random set of molecules from a database or could be property matched (e.g. DUD [Huang-2006]) for a more stringent validation.

The method of alignment of compounds is the same for both run types (simple and validation). In the case of the validation run the desired result is that molecules from the set of actives are generally scored more highly than the set of decoys i.e. they have a greater shape similarity. Measurement of the degree of selectivity between these two datasets provides the user with confidence that the query is, indeed, selective and suitable for use in a simple ROCS run on a larger dataset.

A good validation experiment is vital to the success of future research. It needs to be carefully planned and set up e.g. selection of active and decoy datasets as well as query design (see Editing ROCS Queries in vROCS). For example, is a modification to a query really beneficial to the selectivity of that query? The rigorous use of validated statistical methods and parameters to analyze and, more importantly, compare runs is essential and frequently overlooked. For that reason statistical analysis tools are included in vROCS when visualizing the results of a ROCS validation run. These are described below in Statistics Metrics.

The run set-up options pages (see Simple run options and Validation run options) in vROCS are pre-populated with the default ROCS options, e.g. how compounds are initially oriented and aligned and how alignments are scored and ranked. These default values are chosen to give a good starting point in the majority of cases. However, they are also some of the most common options that a ROCS user might want to modify. For example, changing the start type from inertial to random can be particularly useful for a grid-based query (as opposed to a shape-based query) because it is more difficult to identify and set the 4 true inertial points for a grid. The disadvantage of using a high number of random starts is that it will significantly increase the run time. Unchecking the 3D View option will speed up runs, particularly on computers with limited compute resources. By default an OpenGL 3D alignment for each compound is shown as the run progresses and, since this can be somewhat CPU intensive, switching the display off can be beneficial.

 Simple run options Validation run options

Working Directory:

Directory in which the files for the ROCS run are to be saved. Default location is the vROCS installation directory, if it is user writable, otherwise, a temporary directory is used. Click on the folder icon to browse and select alternative directories. Corresponds to the -outputdir command line flag.

Best Hits:

Number of top ranking hits to be saved after searching the entire database. Use the arrows to increase/decrease or type the desired number in the field. Corresponds to the -besthits command line flag. Only available for simple ROCS run.

Prefix:

Naming prefix for the current ROCS run. All the output ROCS files will contain this name. If no name is specified the default is “rocs”. Corresponds to the -prefix command line flag. The output files using the prefix are:

Parameter file (prefix.parm), Log file (prefix.log), Report file (prefix_1.rpt), Status file (prefix_1.status), Structure file (prefix_hits_1.oeb.gz)
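As a small illustration of the naming scheme above, the following Python sketch lists the output file names expected for a given prefix. The function name `rocs_output_files` and the example prefix are hypothetical; only the file name patterns come from the list above.

```python
def rocs_output_files(prefix="rocs"):
    """Return the output file names listed above for a given run prefix.

    The default prefix "rocs" matches the documented default.
    """
    return [
        f"{prefix}.parm",            # parameter file
        f"{prefix}.log",             # log file
        f"{prefix}_1.rpt",           # report file
        f"{prefix}_1.status",        # status file
        f"{prefix}_hits_1.oeb.gz",   # structure file
    ]


# "cdk2_run" is a made-up prefix used only for illustration.
for name in rocs_output_files("cdk2_run"):
    print(name)
```

This can be handy for checking that a finished run wrote everything it was expected to write into the working directory.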

Rank by:

Dropdown allows selection of one of 12 score types available in vROCS. The results will be ranked by the selected score for selection of Best N hits (above). Default is TanimotoCombo. Corresponds to the -rankby command line flag. Available scores are:

TanimotoCombo, ShapeTanimoto, ColorTanimoto, Scaled Color, Combo Score, Combo Reference Tversky, Shape Reference Tversky, Color Reference Tversky, Combo Fit Tversky, Shape Fit Tversky, Color Fit Tversky, Overlap

Score Cutoff:

Check the check box to exclude any hit with a score less than the specified value from the hitlist. The score used is the one specified by the Rank by field. Change the cutoff value by using the arrows to increase/decrease or type the desired number in the field. Allowed score range varies according to score selected in the Rank by field. Corresponds to the -cutoff command line flag. Only available for simple ROCS run.

Tanimoto Cutoff:

Check the check box to exclude any hit with a ShapeTanimoto score less than the specified value from the hitlist. Change the cutoff value by using the arrows to increase/decrease or type the desired number in the field. Allowed score range is 0-1 (min-max). Corresponds to the -tanimoto_cutoff command line flag. Only available for simple ROCS run.

Shape Only:

Check the check box to perform a shape only overlay, turning off the color force field. Corresponds to the -shape_only command line flag.

Score Only:

Check the check box to score the incoming poses against the query in their current 3D coordinate frame, turning off alignment and hitlist. This is useful for scoring a pre-aligned dataset. Corresponds to the -score_only command line flag. Only available for simple ROCS run.

Start Type:

Use the radio buttons to specify how ROCS places the initial alignment. Inertial is the default option and uses 4 initial starts. Random specifies using random starts for the initial overlay and corresponds to the -randomstarts command line flag. Specify the number of random starting configurations by using the arrows to increase/decrease or type the desired number in the field.

Color Optimize:

Check the check box to use the color force field in the optimization of the alignments. Default is checked on. Corresponds to the -optchem command line flag.

Full Optimization:

Check the check box to perform full best overlay optimization. Default is checked on. If off (false) then score only. Corresponds to the -opt command line flag.

3D View:

Check the check box to display a 3D view of the query and database molecules aligning as the run progresses. Default is on. If unchecked, a text-based progress screen is displayed instead. This will increase ROCS’ run speed on low powered computers.

The final page of set-up is the Run Summary on the Run Rocs page. The summary gives a quick rundown of the query file and database used, as well as the ROCS version. It also contains a collapsible panel to display the full set of command line options that will be fed to ROCS and will be saved as the ROCS parameter file (.parm). This can be useful when setting up and validating runs in vROCS that will later be run from the command line on a remote cluster. The Additional Options prompt allows entry of a parameter not listed in the command line, such as a new parameter not yet available in the released version of ROCS.

 Simple run summary Validation run summary
• Query: Query file as specified on the Inputs tab.
• Database: Database file as specified on the Inputs tab in a simple run.
• Actives database file as specified on the Inputs tab in a validation run.
• Decoys database file as specified on the Inputs tab in a validation run.
• Working directory where all files will be written.
• Naming prefix for the output files that will be written to the Working Directory. This is defined by the Prefix field on the Options tab.
• Click to display/hide the full command line that will be sent to the ROCS executable. This can be copied to export and use in command line ROCS installations.
• A field is available for typing additional ROCS parameters, not exposed via the vROCS interface, that will be included in the command line.
• Note that the command line may use temporary files in some instances.
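The correspondence between the set-up fields and the command line flags named in the option descriptions above can be sketched in Python as follows. This is only an illustration under assumptions: the executable name `rocs`, the file names, and the way values are passed are all hypothetical, and the authoritative command line is the one shown in the collapsible panel of the Run Summary. Only the flag names (-query, -dbase, -prefix, -rankby, -besthits, -randomstarts) are taken from the text.

```python
def build_rocs_args(query, dbase, prefix="rocs", rankby="TanimotoCombo",
                    besthits=None, randomstarts=None):
    """Assemble an argument list from the flags described in this section.

    Assumption: the executable is called "rocs" and each flag takes a
    single value. Compare the result with the command line that vROCS
    displays before relying on it.
    """
    args = ["rocs", "-query", query, "-dbase", dbase,
            "-prefix", prefix, "-rankby", rankby]
    if besthits is not None:        # simple runs only, per the text above
        args += ["-besthits", str(besthits)]
    if randomstarts is not None:    # omit to keep the default inertial starts
        args += ["-randomstarts", str(randomstarts)]
    return args


# Hypothetical file names, for illustration only.
print(" ".join(build_rocs_args("query.sq", "database.oeb.gz",
                               prefix="demo", besthits=500)))
```

Building the command as a list rather than a single string keeps each flag and value separate, which avoids quoting problems if the command is later passed to a job scheduler.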

## Results visualization and analysis

The vROCS interface provides multiple tools for results visualization and analysis. The 3D visualization window shows the query where the molecule structure is displayed as green sticks with associated shape and color atoms. All three portions (molecule, shape and color) can be made visible or hidden using controls in the window. The aligned hit molecules are shown as sticks colored by atom type. Buttons at the bottom of the 3D window allow the shape grid, shape atoms, color atoms and color atom labels to be toggled on or off. The color of the shape contour can be changed and the contour level displayed for the shape grid can also be modified using a slider. This is particularly useful when adding color atoms to a grid-based query, for example.

3D visualization window

Icon Description
Fit scene to screen
Fit query to screen
Take screenshot of 3D Window (excludes the gray query information panel)
Show/hide the 3D parameters control window
Edit query: Open the Edit Query panel and add the query editing icons (See Editing ROCS Queries in vROCS). This icon is replaced by a Done Editing icon while in editing mode.
Change color of the contour
Toggle display of the shape contour on/off
Toggle display of shape atoms on/off
Toggle display of color atoms on/off
Toggle display of color atom labels on/off
Slider to adjust display level of the shape contour from 0-3 (default 1). This only changes the contour display and not the query itself

The 3D parameters control window provides user control for graphics rendering of the image in the 3D window.

The font size for text labels in the 3D display can be altered, as can the stereo visualization type and settings. Not all stereo settings are available on all machines and therefore some stereo options may be grayed out. See the 3D parameters table below for details.

3D Parameters control window

• Text Scale: Slider to adjust the size of the font for the color atom labels.
• Disable stereo graphics.
• Display the image in the 3D window in splitscreen stereo mode for unassisted 3D viewing.
• Display the image in the 3D window in a format suitable for viewing with a Zalman Trimon LCD 3D monitor (or similar hardware).
• Enabled only on machines which are capable of performing 3D hardware stereo-in-a-window. Hardware stereo requires a graphics card that supports “stereo in a window” display as well as the appropriate stereo glasses.
• Slider to adjust the angle between the images for splitscreen, stencil or hardware stereo modes.
• Slider to adjust the separation between the images for splitscreen, stencil or hardware stereo modes.

A results spreadsheet below the main 3D window lists results for each molecule (with best-fitting conformer number) and its associated scores. The data is displayed for the run associated with the highlighted Run Name tab. Individual or multiple molecules can be observed overlaid with the query in the 3D window. Only the top 20 scoring molecules are displayed in the spreadsheet, based on the Rank by score selected in the Run Set-up Options tab. The spreadsheet can be resorted by clicking on other column headers, and the top (or bottom) results for that column will be displayed. Note: this can be a DIFFERENT set of 20 molecules than were displayed originally. To see ALL results users are encouraged to use the spreadsheet tools in VIDA. This can be done by right-clicking on the Run Name tab in the results panel and following the option to “Open ‘Run Name’ in VIDA”.

Icon Description
Display/hide the results panel.
Show the ROCS output. This is the information that would be displayed in the terminal window during a command line ROCS run.
Show the statistics panel. Only available for ROCS validation run.
Make this compound visible in the 3D window and keep it visible while scrolling through other results.
Delete the results for the highlighted Run Name tab

The spreadsheet columns include the name of the database compound, the name of the query, 14 different scores (see section Report File for full definitions) and a rank column (based on the score type chosen when setting up the run). The available scores are:

1. TanimotoCombo
2. ShapeTanimoto
3. ColorTanimoto
4. RefTversky
5. RefColorTversky
6. RefTverskyCombo
7. FitTversky
8. FitColorTversky
9. FitTverskyCombo
10. ScaledColor
11. ComboScore
12. ColorScore - score type from older ROCS versions not available as a Rank by... choice but can be used to sort the spreadsheet.
13. SubTan - score type from older ROCS versions not available as a Rank by... choice but can be used to sort the spreadsheet.
14. Overlap

The spreadsheet also includes the following columns:

1. Active - indicates whether the compound was in the set of actives (1) or decoys (0)
2. Rocs_db_index - identifies the placement of each compound in the database ROCS formed by combining the active and decoy sets prior to search. This is required in case a compound in the actives and decoys happens to have the same name.
3. Lingos similarity - the 2D fingerprint similarity to the query, if the query is a molecule

The spreadsheet can be sorted by any field. If an alternative score is chosen for sorting then the best 20 molecules by that score will be displayed. This may be a different set of molecules from the original 20 displayed because vROCS sorts and retrieves data from the saved structure hitlist and report files. Additionally, the spreadsheet includes controls to show/hide or mark each molecule. This allows the user to compare overlays between compounds and against the query in the 3D visualization window as well as control the data that is saved out.

The most common scores used are ShapeTanimoto (shape only) or the default score, TanimotoCombo (shape + color). Tanimoto scores should be used when the query and database molecules are a similar size. Tversky scores include a weighting factor to deal with size differences and are therefore useful when the query is small and the database molecules are large, or vice versa. The RefTversky score is weighted for a small query e.g. to find all instances of a known active scaffold fragment in a database. The FitTversky score has the opposite weighting.
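The effect of the size weighting can be illustrated with a short Python sketch. It uses the standard Tanimoto and Tversky definitions applied to overlap volumes; the overlap values and the weights (0.95 and 0.05) are illustrative assumptions rather than values taken from this section, and the Report File section remains the reference for the exact score definitions.

```python
def tanimoto(o_qq, o_dd, o_qd):
    """Tanimoto from query self-overlap, database self-overlap and mutual overlap."""
    return o_qd / (o_qq + o_dd - o_qd)


def tversky(o_qq, o_dd, o_qd, alpha, beta):
    """Tversky: alpha weights the query self-overlap, beta the database molecule's."""
    return o_qd / (alpha * o_qq + beta * o_dd)


# Made-up volumes: a small query (100) fully contained in a large molecule (400).
o_qq, o_dd, o_qd = 100.0, 400.0, 100.0

print("Tanimoto   :", round(tanimoto(o_qq, o_dd, o_qd), 3))
# Reference-weighted: the query dominates the denominator (assumed weights).
print("Ref Tversky:", round(tversky(o_qq, o_dd, o_qd, 0.95, 0.05), 3))
# Fit-weighted: the database molecule dominates the denominator.
print("Fit Tversky:", round(tversky(o_qq, o_dd, o_qd, 0.05, 0.95), 3))
```

With these numbers the Tanimoto score is low because the large molecule has a lot of unmatched volume, while the reference-weighted Tversky score stays high because nearly all of the small query is matched. This is the situation described above for finding a scaffold fragment in a database.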

Additionally the validation runs have a statistics panel available. It provides several statistical metrics for analysis of the quality of the results. The metrics reported in vROCS consist of the following and are described below:

1. ROC (receiver operating characteristic) curve together with its AUC (area under the curve) ± 95% confidence limits
2. Score histogram to examine the distribution of scores obtained for the active and decoy datasets
3. Early enrichment at 0.5%, 1% and 2% of decoys retrieved ± 95% confidence limits
4. When comparing multiple runs p-values are calculated for each enrichment level & AUC

These metrics and the rationale behind their inclusion are fully described in the section Statistics Metrics.

Statistics Panel

• Compare to: Add the results from another run to the statistics spreadsheet and calculate the p-values between the two runs. The additional run will also be plotted in the ROC curve and score histogram. The dropdown lists None, Lingos and all other validation runs available from that vROCS session. Lingos is the 2D similarity and is always available as a comparison choice with molecular queries. Default is None.
• Choose score, plot or spreadsheet data to save in .csv format for the active run. If another run(s) is selected in the Compare to field that data will also be saved. Select Plot data to save an image file of the ROC plot or score histogram.
• Select from a dropdown whether to display the ROC curve or the score histogram.
• Select one of the scores (metrics) to be used for the ROC plot or score histogram. These correspond to the score columns in the results spreadsheet.

The statistics panel includes a spreadsheet listing the values for the statistics metrics, together with score histograms and an ROC plot from which the AUC is calculated (see section Statistics metrics). The ROC plot graphs the fraction of actives found against the fraction of decoys found, and a higher AUC represents greater selectivity in favor of the actives. The ROC curve can be plotted for any of the 14 scores available (see ROC plot). Note that changes to the score used for the ROC plot will probably cause changes to the AUC and enrichment values.

ROC Curve

Instead of the ROC plot a score histogram can be plotted. The score histogram compares the distribution of scores for the actives and the decoys (See Score Histogram). The better the AUC (closer to 1.0), the greater the separation will be, in general, between the two histograms, with the actives scoring higher and further to the right than the decoys.

Score Histogram

To better visualize the plots the plot area can be resized by dragging the divider between the plot and spreadsheet. The statistics panel can also be resized by moving the divider between the panel and the 3D window.

Multiple runs can be compared in the spreadsheet. The statistics panel enables the comparison of multiple validation runs using the Compare to dropdown and the data and plots can be exported to a CSV (.csv) file for import into other applications or statistics packages. The statistics for multiple runs will be displayed side by side in a spreadsheet and these runs will be plotted together on the ROC plot and score histogram for a direct comparison. This could help to answer the following questions:

• Is one query more selective than another on the same database?
• Is the query selectivity the same for multiple training databases? Was a representative validation database selected?

When comparing two runs it is useful to gauge whether one is giving statistically better results than another. For this reason p-values are displayed in the comparison (see Statistics for comparison of ROCS runs). A low p-value suggests that the base run is statistically better than the run selected in the Compare to dropdown. For a description of p-values see section Statistics Metrics. If comparing particularly large data sets it is wise to pay attention to the memory footprint: save and close any unneeded runs.

Statistics for comparison of ROCS runs

While it is more open to individual interpretation, inspection of the overlays in the 3D window should not be overlooked as a valuable tool for results interpretation. Can additional knowledge of the receptor be applied that validates the ROCS alignments?

## Statistics metrics

To facilitate accurate understanding, interpretation and comparison of virtual screening results from multiple (independent) experiments when publishing or presenting research, it is important to use consistent and industry standard metrics. To date (May 2011) no official industry standard has been set. However, steps and recommendations were made in this direction at the “Evaluation of Computational Methods” symposium at the 234th American Chemical Society National Meeting in August 2007 and in the follow-up issue of the Journal of Computer-Aided Molecular Design (volume 22, March 2008). Measures that have become standard in other fields tend to possess the following short list of characteristics:

1. Independence from extensive variables
2. Robustness
3. Straightforward assessment of error bounds
4. No free parameters
5. Easily understood and interpretable

The widespread and habitual use of good reporting practice is something that OpenEye is keen to encourage and therefore vROCS implements statistics metrics discussed in these recommendations ([Jain-2008], [Nicholls-2008]).

The metrics reported in vROCS consist of the following and are described below:

1. ROC (receiver operating characteristic) curve together with its AUC (area under the curve) ± 95% confidence limits
2. Early enrichment at 0.5%, 1% and 2% of decoys retrieved ± 95% confidence limits
3. When comparing multiple runs p-values are calculated for each enrichment level and AUC

### ROC Curve

A ROC curve ([ROC]) in vROCS plots % (or fraction) of actives found on the Y-axis vs % decoys on the X-axis as the scores decrease. The top scoring compounds are plotted closest to the origin. It gives an indication of how the actives and inactives are ranked as a result of the ROCS run. An ideal ROC plot for a perfectly selective query would show all of the actives being identified first because they score most highly. The plot would shoot up the Y axis at X=0. Then the lower scoring decoys would be plotted and the curve would follow the X axis at Y=100 %. ROC Plot 2 illustrates an ROC plot for an almost perfectly selective query where most of the actives rank more highly than most of the decoy molecules.

ROC Plot 2

An almost perfectly selective ROC curve with AUC = 0.979 where most of the actives rank more highly than most of the decoy molecules. The dashed diagonal line represents random.
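The construction of the curve described above can be sketched in a few lines of Python. This is a minimal illustration, not the vROCS implementation: the function name `roc_points` and the scores are made up, and tied scores are handled naively (actives are placed before decoys with the same score).

```python
def roc_points(active_scores, decoy_scores):
    """Walk down the ranked list and return (decoy fraction, active fraction) pairs."""
    labeled = [(s, True) for s in active_scores] + [(s, False) for s in decoy_scores]
    # Highest score first, so the top scoring compounds sit closest to the origin.
    labeled.sort(key=lambda pair: pair[0], reverse=True)
    n_act, n_dec = len(active_scores), len(decoy_scores)
    found_act = found_dec = 0
    points = [(0.0, 0.0)]
    for _score, is_active in labeled:
        if is_active:
            found_act += 1      # step up the Y axis
        else:
            found_dec += 1      # step along the X axis
        points.append((found_dec / n_dec, found_act / n_act))
    return points


# Illustrative scores only.
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.3, 0.2, 0.1]
for x, y in roc_points(actives, decoys):
    print(f"decoys {x:.2f}  actives {y:.2f}")
```

For a perfectly selective query every active step happens before any decoy step, which gives the shape described above: straight up the Y axis at X=0, then along the top of the plot.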

### AUC

The AUC (area under the curve of an ROC plot) is simply the probability that a randomly chosen active has a higher score than a randomly chosen inactive. A useless query, one that does no better than random chance at distinguishing an active from an inactive, would give an AUC of 0.5, as shown by the dashed line in ROC Plot 2. A perfect query is one which ranks all the actives above all the inactives. In this case the AUC would be 1.0. In most cases the observed AUC will be somewhere between these two extremes, and for a highly selective query it will often be in the 0.8-1.0 range. Sometimes an AUC of < 0.5 is observed. This occurs when the query is scoring the decoys more highly than the actives, i.e. it is selective for the inactives.
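The probabilistic reading of the AUC given above can be checked directly by comparing every active with every decoy. The sketch below is an illustration with made-up scores; counting a tie as half a win is a common convention and an assumption here, not something stated in this section.

```python
def auc_pairwise(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active scores higher."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5   # assumed tie convention
    return wins / (len(active_scores) * len(decoy_scores))


# Illustrative scores: 11 of the 12 pairs favor the active.
print(auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))
# A query that ranks every decoy above every active gives 0.0.
print(auc_pairwise([0.1, 0.2], [0.8, 0.9]))
```

The pairwise loop is quadratic in the number of molecules, so it is meant for understanding the definition rather than for large validation sets.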

Note

AUC has, for a long time, been a standard metric in other fields. The main complaint against the AUC is that it does not directly answer the question some want answered, i.e. the performance of a method in the top few percent. It is a global measure and therefore it reflects the performance throughout a ranked list. Thus, the notion of “early enrichment” may not be well characterized by just AUC, particularly when virtual screening methods yield AUC values short of the 0.8-1.0 range. For this reason we include early enrichment values in the ROCS output for a validation run. Early enrichment, while certainly more reflective of the common usage of virtual screening methods, is a property of the experiment conducted, not the methods being studied in that experiment, and thus should be used with care.

AUC is quoted in vROCS as a mean value ± 95% confidence limits. Bootstrapping the data produces a set of samples from which the mean and confidence levels are obtained.
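A bootstrap estimate of this kind can be sketched as follows. The details are assumptions made for illustration: actives and decoys are resampled separately with replacement, and the limits are taken as the 2.5th and 97.5th percentiles of the resampled AUCs. The exact resampling scheme used by vROCS is not described in this section.

```python
import random


def auc(actives, decoys):
    """Probability that a random active outscores a random decoy (ties count half)."""
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def bootstrap_auc(actives, decoys, n_boot=1000, seed=0):
    """Return (mean, lower, upper) of the AUC over bootstrap resamples."""
    rng = random.Random(seed)   # fixed seed makes the result repeatable
    samples = []
    for _ in range(n_boot):
        a = rng.choices(actives, k=len(actives))   # resample with replacement
        d = rng.choices(decoys, k=len(decoys))
        samples.append(auc(a, d))
    samples.sort()
    mean = sum(samples) / n_boot
    lower = samples[int(0.025 * n_boot)]
    upper = samples[int(0.975 * n_boot) - 1]
    return mean, lower, upper


# Illustrative scores only.
actives = [0.9, 0.8, 0.75, 0.6, 0.4]
decoys = [0.7, 0.5, 0.45, 0.3, 0.3, 0.2, 0.1, 0.05]
print(bootstrap_auc(actives, decoys))
```

Because the resampling is driven by a random number generator, the same seed reproduces the same limits, while a different seed gives slightly different ones.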

### Enrichment

Consider the example in Early Enrichment Comparison below taken from the Nicholls paper ([Nicholls-2008]) which illustrates how AUC provides no information on early enrichment. Both the Early (pink) and Late (blue) curves have an AUC of exactly 0.5. Clearly both examples are equally likely to score an active higher than an inactive (or vice versa) overall. However, the solid (pink) plot also shows that some fraction of the actives is scoring significantly higher than the inactives, while another fraction of the actives scores worse. In a virtual screen it is desirable not to screen the entire database but to select only the top scoring fraction of the compounds. Only the average behavior across the whole database, not the early enrichment of actives in the solid pink plot, is reflected in the AUC. Thus, it is beneficial to report early enrichment in addition to AUC.

Early Enrichment Comparison

Use of early enrichment values overcomes this deficit in AUC. vROCS reports enrichment percentages at the following values: 0.5%, 1% and 2%. The formulation of enrichment that is used in vROCS reports the ratio of true positive rates (the Y axis in an ROC plot) to the false positive rates of 0.5%, 1% and 2% (found on the X axis in an ROC plot). Thus “enrichment at 1%” is the fraction of actives seen along with the top 1% of known decoys (multiplied by 100). This removes the dependence on the ratio of actives and inactives and directly quantifies early enrichment. It also makes standard statistical analysis of error bars much simpler.
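The formulation above can be illustrated with a short Python sketch: find the score at which the allowed fraction of decoys has been retrieved, then report the percentage of actives scoring above it. The function name, the handling of ties and the rounding of the decoy count are assumptions made for this example, and the scores are made up.

```python
def enrichment_at(active_scores, decoy_scores, decoy_fraction):
    """Percentage of actives retrieved along with the top `decoy_fraction` of decoys."""
    ranked_decoys = sorted(decoy_scores, reverse=True)
    # Number of decoys allowed in the selection (small epsilon guards float error).
    n_allowed = int(decoy_fraction * len(ranked_decoys) + 1e-9)
    if n_allowed >= len(ranked_decoys):
        return 100.0
    # Score of the first decoy that is NOT allowed; actives must beat it.
    threshold = ranked_decoys[n_allowed]
    n_found = sum(1 for s in active_scores if s > threshold)
    return 100.0 * n_found / len(active_scores)


# 100 decoys with scores 0.000 to 0.099, and four actives.
decoys = [i / 1000 for i in range(100)]
actives = [0.5, 0.0985, 0.05, 0.01]
for fraction in (0.005, 0.01, 0.02):
    print(fraction, enrichment_at(actives, decoys, fraction))
```

Note that the result depends only on where the actives fall relative to the ranked decoys, not on how many decoys there are per active, which is the independence property described above.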

Enrichment values are quoted as a mean value ± 95% confidence limits. Bootstrapping the data produces a set of samples from which the mean and confidence levels are obtained. Repeating a run within a single ROCS session will always result in identical enrichments. However, enrichments may vary slightly between ROCS sessions because a new random number is supplied to the bootstrapping algorithm for each ROCS session.

### p-Value

In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The fact that p-values are based on this assumption is crucial to their correct interpretation ([Wikipedia-pValue], [Dallal-2001]).

In the vROCS analysis there are two runs being compared, a Base run (A) and a ‘Compare to’ run (B). These two runs use two different queries or methods (e.g. color force fields) to search the same active and decoy databases. We have a statistic, AUC (or % enrichment), one for each distribution (A & B). We would like to know whether AUC-A is statistically better than AUC-B otherwise we cannot say anything about the comparison of the methods. AUC-A and AUC-B alone are not enough to generate anything of statistical significance. To circumvent this we use a bootstrapping method which randomly selects a statistical sampling of the input molecules to repeatedly generate many AUCs.

Traditionally, the null hypothesis is that while the perceived results may be different (e.g. between AUC or % enrichment), the underlying processes are indistinguishable. However, since null-hypothesis testing predicts the likelihood of obtaining a given result if the null hypothesis is true, use of this null hypothesis wouldn’t give any indication of whether method A or method B is better. To avoid this confusion OpenEye has used a modified null hypothesis. The null hypothesis, as implemented in vROCS, is that making a change to the query/method results in a better result (AUC or % enrichment) for run B than run A. Therefore we utilize a one-sided statistical test, not the usual two-sided test, based on the prior assumption that method B is superior to method A. The p-value is the probability that AUC-B > AUC-A and that this difference is due to differences between the methods/queries and not due to random chance alone.

If the null hypothesis holds true then we observe that AUC-B > AUC-A and the p-value tends towards 1.0. If the null hypothesis is incorrect then the p-value tends towards 0.0 and the query/method used in run A (Base run) is statistically better than that used in run B (the ‘Compare to’ run). If the results for the two runs are indistinguishable and the result could be due to random chance then the p-value = 0.5.

The plot below illustrates these three p-value extremes. Each curve in the plot represents an example of comparing two ROCS runs. For each run bootstrapping produced a statistical sampling of the data from which the mean and ±95% confidence limit values were calculated for AUC and % enrichments. The distribution of differences in AUC (or % enrichment) between the bootstrapped samples for the two runs can also be calculated and is plotted below.

Plot to illustrate calculation of p-values

In the case of p-value = 0.5 half of the distribution is positive and half of the distribution is negative. The p-value is calculated from the integral of the area under the curve from 0 to infinity (the part of the curve that falls within the shaded area). In the case of p-value = 1.0 the difference between run B and run A is always positive and the entire curve is above 0 on the X-axis. The entire curve falls within the shaded area and so the integral is 1.0. In the case where the p-value = 0.0 the difference between run B and run A is always negative. None of the curve falls within the shaded area and so the integral is 0.0.
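For a finite set of bootstrap samples, the integral from 0 to infinity can be approximated by the fraction of differences (run B minus run A) that are positive. The sketch below assumes the samples of the two runs are paired one-to-one and counts an exact tie as half; both choices, and the function name, are assumptions rather than documented vROCS behavior.

```python
def one_sided_p_value(metric_a, metric_b):
    """Fraction of paired bootstrap samples in which run B beats run A."""
    if len(metric_a) != len(metric_b) or not metric_a:
        raise ValueError("need two non-empty sample lists of equal length")
    positive = 0.0
    for a, b in zip(metric_a, metric_b):
        difference = b - a
        if difference > 0:
            positive += 1.0   # falls in the shaded (positive) region
        elif difference == 0:
            positive += 0.5   # assumed convention: split exact ties
    return positive / len(metric_a)

# The three limiting cases described in the text.
always_better = one_sided_p_value([0.80, 0.82], [0.90, 0.95])
always_worse = one_sided_p_value([0.90, 0.95], [0.80, 0.82])
indistinguishable = one_sided_p_value([0.80, 0.90], [0.90, 0.80])
```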

When considering the results from two ROCS runs the p-values should be interpreted as follows. If the p-value tends towards 0.0 then the results for the Base run are better than the ‘Compare to...’ run (run A > run B). If the p-value = 0.5 then the results for the two runs are statistically indistinguishable. If the p-value tends towards 1.0 then the Base run is not better than the ‘Compare to...’ run or, in other words, the ‘Compare to...’ run is giving results better than the Base run (run B > run A).

Consider the example below for three different trypsin queries run against the same active and decoy databases. From the ROC plot for the three trypsin queries, we observe that run Trypsin1 has an AUC intermediate between those of Trypsin2 and Trypsin3.

ROC plot for three trypsin queries

Looking at Table 1, where Trypsin1 is the Base run (run A) and is compared to Trypsin2 and Trypsin3 (run B), we see that there is a p-value of 0.979 for Trypsin2. The Trypsin2 AUC of 0.940 mean value with 95% confidence limits of 0.888 and 0.979 has very little overlap with Trypsin1 at 0.868 with 95% confidence limits of 0.805 and 0.915. The p-value = 0.979 suggests that Trypsin2 is producing superior results and these are due to differences between the queries, not to chance alone. The null hypothesis (run B > run A) holds true in this case. Note that in Table 2, where Trypsin2 is now the Base run, the p-value is reversed. In this case p-value = 0.021 suggests that Trypsin1 is producing inferior results and these are due to differences between the queries, not to chance alone. The null hypothesis (that the ‘Compare to’ run produces superior results to the Base run) can be rejected. Similarly, when comparing Trypsin3 to Trypsin1 in Table 3, the p-value of 0.006 suggests that Trypsin3 is producing inferior results and these are due to differences between the queries, not to chance alone. This is supported by our observations in the ROC plot (see ROC plot for three trypsin queries) where Trypsin3 clearly has the lowest AUC.

Table 1: Trypsin1 (Base) compared to Trypsin2 and Trypsin3

Table 2: Trypsin2 (Base) compared to Trypsin1 and Trypsin3

Table 3: Trypsin3 (Base) compared to Trypsin1 and Trypsin2

Now consider the p-values for the 0.5%, 1% and 2% enrichments. Trypsin1 and Trypsin3 have similar enrichment levels. For example, at 0.5% enrichment Trypsin1 is 35.987 with 95% confidence limits of 11.321 and 62.857 while Trypsin3 is 28.625 with 95% confidence limits of 9.524 and 51.724. Each has an average enrichment that is well within the 95% confidence limits of the other. This is supported by p-values tending towards 0.5, i.e. 0.340 when comparing Trypsin3 to Trypsin1 (in Table 1) and 0.660 when comparing Trypsin1 to Trypsin3 (in Table 3); the two values mirror each other around 0.5. From this we can conclude that the query Trypsin1 gives a slightly better 0.5% enrichment than does Trypsin3 (p-value 0.660, from Table 3) but that the difference is not statistically significant at the 5% level.

In the case of Trypsin2 the enrichments at all levels are much higher than Trypsin1 or Trypsin3 (e.g. 126.127 with 95% confidence limits of 96.000 and 152.381 for the 0.5% enrichment) with 95% confidence limits that do not overlap at all with those for Trypsin1 or Trypsin3. This results in p-values of 1.000 when either Trypsin1 or Trypsin3 is the Base run (Tables 1 or 3) (i.e. Trypsin2 is the superior query and the null hypothesis holds true) or 0.000 when Trypsin2 is the Base run (in Table 2) (i.e. Trypsin1 and Trypsin3 are clearly inferior to Trypsin2 and the null hypothesis is rejected). These conclusions are also clearly visible in the ROC plot.

Repeating a run within a single ROCS session will always result in identical p-values for enrichments. However, since enrichments may vary slightly between ROCS sessions, when a new random number is supplied to the bootstrapping algorithm, there may be small differences in p-value for the same combination of runs if repeated in different ROCS sessions.
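The effect of the random seed on bootstrapped statistics can be illustrated with a toy example. The function below is hypothetical and simply bootstraps a mean; the point is only that a fixed seed reproduces the result exactly, while a different seed generally gives a slightly different value.

```python
import random

def bootstrap_mean(values, n_samples, seed):
    """Average of the means of n_samples resamples drawn with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        resample = rng.choices(values, k=len(values))
        means.append(sum(resample) / len(resample))
    return sum(means) / len(means)

scores = [1.0, 2.0, 3.0, 4.0]  # hypothetical data

same_session_first = bootstrap_mean(scores, 200, seed=42)
same_session_second = bootstrap_mean(scores, 200, seed=42)   # identical to the first
other_session = bootstrap_mean(scores, 200, seed=7)          # usually close, not identical
```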

A typical cutoff for statistical significance of p-values is applied at the 5% (or 0.05) level. Thus, a p-value of 0.05 corresponds to a 5% chance of obtaining a result that extreme, given that the null hypothesis holds. A p-value of less than 0.05 (or greater than 0.95) would give good confidence that the selectivity you observe in your ROC plot is derived exclusively from differences between the two queries or methods and not a result of chance alone.
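As a quick aid to reading the tables, the interpretation rules above can be written as a small helper. The function name and the wording of the returned messages are illustrative only; the 0.05 cutoff is the conventional level mentioned in the text.

```python
def interpret_p_value(p_value, cutoff=0.05):
    """Translate a vROCS-style one-sided p-value into a plain-language verdict."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < cutoff:
        return "Base run is significantly better"
    if p_value > 1.0 - cutoff:
        return "'Compare to' run is significantly better"
    return "no significant difference"

# Values quoted in the trypsin example.
auc_trypsin1_base_vs_trypsin2 = interpret_p_value(0.979)
auc_trypsin2_base_vs_trypsin1 = interpret_p_value(0.021)
enrichment_trypsin1_base_vs_trypsin3 = interpret_p_value(0.340)
```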

## Saving ROCS data¶

The vROCS interface provides multiple tools for saving data. Data that can be saved includes:

1. The query file
2. The entire set of results obtained from a simple or validation ROCS run
3. Data and statistics from a validation run in .csv delimited file format
4. Screenshots of the 3D window illustrating the query and/or aligned hit molecules
5. Screenshots of the ROC or Score Histogram plots

Query file: A query that is built or modified in vROCS can be saved for future use. The default file type for saving a query is a ROCS Saved Query file with extension .sq or .sq.gz. This file type is not compatible with older versions of ROCS. Additionally, it contains information about the color force field used and therefore cannot be used with an alternative color force field.

There are multiple ways to save a query file from the vROCS interface.

• Firstly, using the File menu click on File > Save Query... or use the Ctrl+S shortcut keys. This is a ‘Save As...’ action and will always prompt for a filename and a directory in which to save it. This action is performed on the query currently selected in the Query list of the Run Set-up Inputs dialog.
• There is also a right-click option. When viewing the results panel right-clicking on the run name tab opens a right-click menu in which the second option is Save Query from ‘Run Name’. This is a ‘Save As...’ action and operates specifically on the query associated with that run. Therefore, it allows the user to save an older version of a query that may have been subsequently modified by going back to activate the Results tab for the earlier run.

| Option | Description |
|---|---|
| Save results from ‘Run name’ | Save the results for the active results set in a 3D structure and data file. The default file type is OE Binary (*.oeb, *.oeb.gz). |
| Save query from ‘Run name’ | Save the query for the active results set in a 3D shape query file (*.sq, *.sq.gz). The file contains information about both shape and color, as well as the color force field used to apply the color atoms. This is a Save As... action and will always prompt for a filename. |
| Rename ‘Run name’ | Rename the Results Name tab. |
| Open ‘Run name’ in VIDA | Exports the structures and data for the active Run Name tab into VIDA. If VIDA is not already open a new session is opened. If VIDA is already in use the dataset is appended to the list of molecules already in the VIDA List Window. All the data is available to view in the VIDA spreadsheet. |

Results: From the results spreadsheet for either the simple or validation run the user can right-click on the run name tab to open a right-click menu in which the first option is Save Results from ‘Run Name’. This is a Save As... action and operates specifically on the data for the run associated with that spreadsheet. Having multiple spreadsheets available allows the user to save results from either the current run or an older run by selecting the appropriate run name tab. All the data points for the run are saved, not just the top 20 results visible in the spreadsheet. The sort order from the spreadsheet is not retained. The compounds in the saved file are sorted by the Rank By score selected during run set-up. The results can be saved in a variety of molecule file types suitable for opening in the VIDA spreadsheet or other third party applications. The default is the OpenEye OE Binary file type with .oeb or .oeb.gz file extension. Since only the top 20 results are visible in the vROCS spreadsheet, users are encouraged to save the results and use the VIDA spreadsheet, not the vROCS interface, as the primary tool for analyzing results.

Statistics data: From the statistics panel in a validation run three different types of data - score data, plot data and spreadsheet - can be exported and saved in a comma delimited format (.csv), suitable for loading in text-based applications and other statistics packages. These options are selected from the Choose stats to save... drop down menu in the statistics panel.

Choose stats to save drop-down in the statistics panel

| Option | Description |
|---|---|
| Score Data | Save a file containing the raw scores for the top scoring aligned conformer of each compound searched using each of the scoring functions of a validation run displayed. Data for all database compounds are exported. |
| Plot Data | Export the data points from either the ROC Curve or the Score Histogram. These are (x,y) datapoints from the curve or histogram created in vROCS, not raw data. Note that this export is context sensitive upon the plot that is currently displayed (i.e. when the ROC curve is displayed the plot data output will have (x,y) values to recreate the ROC curve and when the Score Histogram is displayed the data output will have (x,y) data for both actives and decoys to recreate a histogram). |
| Spreadsheet | Save the data displayed in the vROCS statistics panel spreadsheet (AUC and enrichment values with error bars). If a second run has been selected to compare against the currently active run then data for both runs and the associated p-values are exported in a single file. |
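Exported Plot Data for a ROC curve can be post-processed outside vROCS, for example to recompute the area under the curve. The sketch below assumes a simple two-column layout (x, then y, with an optional header row); the actual column layout of the exported .csv file is not specified here, so the parsing should be adapted to the real file.

```python
import csv
import io

def read_xy(csv_text):
    """Parse (x, y) pairs from CSV text, skipping header or blank rows."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            points.append((float(row[0]), float(row[1])))
        except (ValueError, IndexError):
            continue  # not a numeric data row
    return sorted(points)

def trapezoid_area(points):
    """Area under a piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical exported ROC data: false positive rate, true positive rate.
example_csv = "x,y\n0,0\n0.5,1\n1,1\n"
example_auc = trapezoid_area(read_xy(example_csv))
```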

ROC plot/Score Histogram: Exporting the data to recreate the ROC plot (or Score Histogram) to a .csv delimited file (described above) provides the opportunity to rebuild the plots in a third party graphing application and combine plots from different ROCS sessions. However, this can prove somewhat cumbersome and it is frequently useful to take a screenshot of the current plot (either of a single run or a comparison of multiple runs) for inclusion in a report, presentation or publication. To do this, right click on the plot and choose “Save Image...”.

Save an image file of the ROC plot or score histogram

3D window screenshot: A screenshot can be useful for insertion into presentations and publications. A camera icon at the top of the 3D window allows for taking a single click screenshot of the view in the 3D window. It is a WYSIWYG (what you see is what you get) screenshot of the 3D window with the exception that the surrounding buttons are not included. See the figure below.

3D Window screenshot option

## ROCS shape query sources¶

ROCS is most commonly used to compare alignments of molecular shapes. However, a range of other shapes, e.g. molecular grids, form equally valid and useful alignment target queries, with the following provisos.

Grids are built without color atoms. The absence of color atoms in a query usually causes ROCS performance to be lower. For ligand shape queries adding color atoms has been shown to enhance ROCS performance with twice as much signal over random when color atoms are used, compared to shape alone. Without color the ROCS TanimotoCombo scores will also generally be lower (TanimotoCombo 0-1 instead of 0-2). Therefore, one should either add color atoms manually to a grid-based query (see section Editing ROCS queries in vROCS) or compare with the ShapeTanimoto score obtained from a ligand-shape query.

Note

Using DUD 1.0 with ROCS shape only the average AUC across the 38 cases is approximately 0.6. With shape + color the average AUC is around 0.73. Therefore, the delta over random for shape is 0.1 and for shape + color is 0.23. Hence, by this rather odd way of looking at it there is twice as much signal. - P. Hawkins, OpenEye
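Written out, and taking 0.5 as the AUC expected from a random ranking, the arithmetic behind the "twice as much signal" estimate is:

$$\frac{\mathrm{AUC}_{\text{shape+color}} - 0.5}{\mathrm{AUC}_{\text{shape}} - 0.5} = \frac{0.73 - 0.5}{0.6 - 0.5} = \frac{0.23}{0.1} = 2.3$$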

It is possible to add color points to grid shapes using the editing tools available in vROCS, described in the tutorial Building and editing a query manually and this can usefully guide the alignments. However, ligand shape with color generally provides superior results to using a grid-based query. Grids can be useful in cases where no suitable ligand query exists.

There are several potential sources of grids:

• AFITT can produce a grid of electron density from crystallographic data. It is also possible to back-compute a grid of density for a crystallographic or docked ligand. This allows heavier atoms to contribute more to the grid than light ones, whereas shape grids are uniform.
• Spicoli will make grids from surfaces.
• OEDocking also produces a shape grid.
• Using the OEGrid toolkit you can read in any grid format and, using an ASCII interchange format, write it to an OE format that could be used by vROCS. This capability allows access to grids produced by third party applications. For example, DOCK ([DOCK]) uses scoring grids, GRID ([GRID]) makes grids and so on. All of these could be used to make queries for vROCS, but their application and usefulness have not been thoroughly validated.

Recent research has been carried out at OpenEye to validate some tools currently under development ([Nicholls-2010]). These produce shapes that describe a protein binding pocket (using the same technology as Spicoli) for use as ROCS queries. Initial results show that shapes from sources other than pure ligands can be successfully used as useful ROCS queries and that adding color atoms is often useful to increase selectivity (and is never detrimental, to date), just as for ligand-based shape and color queries.

## Editing ROCS queries in vROCS¶

In earlier versions of ROCS it was difficult to edit a query using the various command line utilities. The input to ROCS was generally required to be either a whole molecule query or a grid or shape query, although it was possible to load one or more molecules into a 3D builder and then modify or merge them into a super molecule. Having the vROCS graphical editor for ROCS provides the ability to move from a simple molecule with automatic color atom assignment or a grid (with no color atoms at all) to a position where the user can decide how the query is built. The vROCS graphical editor will facilitate this process by:

1. Reducing the time required.
2. Reducing the risk of errors.
3. Increasing the flexibility of the editing process by allowing a greater range of editing tasks to be accomplished.

This will have a knock-on effect that more complex and/or selective queries can be employed in ROCS and, in some cases, it is possible that higher quality results could be obtained. Additionally, it will facilitate the use of queries that are not directly molecule shape-based e.g. multiple fragments or grid-based queries.

There is a danger associated with the ability to edit the query and that is over-editing i.e. editing a query until it does not work. For this reason the validation run and its associated statistical analysis tools were included in vROCS (see section Statistics metrics). By providing the validation the user has the tools necessary to decide whether a complex new query is really better than simply using e.g. the x-ray ligand. This caveat should constantly be uppermost in the user’s mind.

There are two methods for editing queries in vROCS. An automated wizard guides the user through one of a few predesigned paths for building a new query. Manual query building and editing is also available. Both functions are available from the Welcome interface.

### Query Building Wizard¶

The query building wizard is designed to walk the user through building a query through one of the paths below:

1. SMILES
2. Ligand Model Builder

These are typically paths for which manual query building is less straightforward.

Query building wizard interface

There are two ways to access the Wizard. Either select the Create a Query With a Wizard button on the vROCS Welcome page or select File > New Query... from the menu at any time during a session.

#### SMILES¶

The SMILES option produces up to 5 queries from an input SMILES string, calculating a reasonable 3D structure and conformations.

In the Create Query tab of the Query Wizard select the radio button SMILES and then click Next (as seen above).

The Select SMILES tab becomes active. This gives the option to type in a SMILES string or a molecule name (a systematic IUPAC name or a common name, e.g. aspirin). The molecule structure will be incrementally displayed as the SMILES string is entered.

Select SMILES page

Clicking on the green “+” icon in the SMILES entry field pops up a Sketcher in which the molecule can be sketched or the SMILES string or molecule name can be entered in the input field. When sketching is complete, clicking OK will close the Sketcher and update the structure displayed in the Select SMILES tab.

Picto sketcher

Alternatively, a file of one or more molecules can be loaded and a SMILES string is displayed for each molecule in the file. Scrolling through the list of molecules will change the structure displayed. The highlighted structure in the list is the one that will be selected for the next step.

Because the query will be run through OMEGA to generate conformers, and OMEGA requires stereochemistry to be defined, chiral molecules with undefined stereo centers will require an additional step to specify the proper configuration at each stereo center. Atoms and bonds with undefined stereochemistry will appear highlighted in red (see figure below). Clicking on the highlighted atoms will cycle through the possible configurations.

Molecule with Undefined Stereochemistry

Molecule with Stereochemistry Defined

Clicking Next activates the Pick Queries page. For the previously highlighted structure five (5) OMEGA lowest energy conformers are generated and listed. Conformer 1 is the lowest energy conformer. Scrolling through the list of conformer names using the up/down arrow keys or clicking on a specific conformer displays that structure in the 3D window above where it can be rotated or zoomed using the mouse. Multiple conformers can be chosen for import as ROCS queries into the main vROCS interface. The desired conformers are marked with a red check mark by double-clicking on their entries in the list.

Pick queries page

Click Finish to close the Wizard and import the selected conformers into vROCS. They will be listed in the Query list for simple or validation runs and the lowest energy conformer will be displayed in the 3D window.

#### Ligand Model Builder¶

If there are several known active ligands for a given project it can be desirable in ROCS to use a hypothesis query which is an alignment of more than one of the active ligands. This avoids losing potentially important information from one ligand that may not be present in others. The ligand model builder takes a set of pre-aligned ligands in the same Cartesian coordinate frame and carries out a rigid alignment in the same frame of reference. It produces hypothesis models for 1,2...n molecules (where n is selected by the user as the maximum number of molecules per model). The top scoring model(s), based on TanimotoCombo score, are returned. These are the model(s) that best represent the set of ligands as a whole. Since this is a rigid alignment no OMEGA conformers are generated and it is therefore important to use a ‘reasonable’ structure for each compound that represents a putative binding mode e.g. a set of docked ligands or a set of x-ray crystal structures.

Consider the following example. A set of 19 trypsin protein crystal structures are sequence aligned and the ligands extracted to give a set of 19 Cartesian aligned ligands. The user is interested in building 2 models, each containing up to 3 molecules. The ligand model builder builds hypothesis alignment models containing 1, 2 and 3 of these 19 ligands. It scores all the models against the set of 19 compounds and returns the top scoring 2 models. These may contain 1, 2 or 3 of the ligands and a 3-ligand model does not necessarily contain any of the ligands used in a 1- or 2-ligand model.

From the Create Query dialog of the Wizard select the radio-button option for Ligand Model Builder and click Next. This will activate the Load Aligned Ligands page.

The user is required to browse and select a file containing the aligned ligands (single conformer only). The input file type can be any format with 3D coordinates but must contain molecules. Other potential ROCS inputs (e.g. shape grids) cannot be supported in this workflow. Both 3D and 2D preview windows allow scrolling through the individual members of the file. The 3D window is interactive for zooming or rotating with the mouse. Click the Next button to proceed to the Adjust Parameters page.

All the required parameters are set by default so it is optional to make any additions or changes to fields in the Adjust Parameters page.

| Option | Description |
|---|---|
| Max. molecules per model | Models containing 1,2,...n molecules will be considered for an input value of n. |
| Models to keep | The number of best models to output, based on TanimotoCombo score. |
| Output title prefix | Optional naming prefix for the output models. |
| Merge color atoms | Optional merging of close color atoms of the same type in multi-molecule models. |

The following options are available:

Max. molecules per model: This is the value n described above. For n=4 models containing 1, 2, 3 & 4 molecules will be considered. The input value for n cannot exceed the number of ligands in the input file. As n is increased the number of models considered grows as a sum of binomial coefficients: it is the sum of elements 1 to n of row h of Pascal’s triangle, where h is the number of ligands in the pool ([Pascal-2009]).

$$k = \sum_{i=1}^n\frac{h!}{(h-i)!\,i!}$$

k = total number of models considered;

h = no. of molecules in the pool;

n = max. molecules per model

Consider the trypsin example above where h=19. The binomial coefficients (for 1 to 19) are:

$$19+171+969+3,876+11,628+27,132+50,388+75,582+92,378+$$
$$92,378+75,582+50,388+27,132+11,628+3,876+969+171+19+1$$

If only models containing 1 molecule are considered then 19 models will need to be evaluated. If models containing up to 3 molecules are considered then 1159 models will be built and evaluated:

$$19+171+969=1159$$

This represents 19 one-molecule models, 171 two-molecule models and 969 three-molecule models. However, if models containing all 19 molecules were to be considered then:

$$k = 2^{19} -1=524,287$$

k = total number of models considered

Thus, 524,287 models would be built and evaluated. Clearly this can become a CPU-intensive and time-consuming process so it is recommended to keep n low (<5) when the pool of molecules is large. This also avoids building overly complex models.
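The model counts quoted above can be checked with a few lines of Python using the standard library. The function name is illustrative; `pool_size` corresponds to h and `max_per_model` to n.

```python
from math import comb

def models_considered(pool_size, max_per_model):
    """Number of candidate models: sum of C(h, i) for i = 1..n."""
    if not 1 <= max_per_model <= pool_size:
        raise ValueError("max_per_model must be between 1 and pool_size")
    return sum(comb(pool_size, i) for i in range(1, max_per_model + 1))

# The trypsin example with a pool of 19 ligands.
one_molecule = models_considered(19, 1)       # 19
up_to_three = models_considered(19, 3)        # 19 + 171 + 969 = 1159
all_nineteen = models_considered(19, 19)      # 2**19 - 1 = 524287
```

Because the count grows so quickly with n, evaluating it before a run gives a rough idea of how expensive the model building will be.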

Models to keep: This is the number of top ranking models to output. By default this is set to 1 and therefore a single model will be produced. The output model is the one with the highest mean TanimotoCombo score across all the ligands. If this input value is set higher then more models will be output for visual evaluation and use as possible hypotheses. Increasing the number of models to keep has no effect on the run time; the same number of models are created and evaluated. It only changes the number of models retained in the output set.
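The relationship between the two parameters can be sketched as follows. Every candidate model is enumerated and scored regardless of how many are kept; Models to keep only controls how many of the ranked models are returned. The scoring function here is a toy placeholder standing in for the mean TanimotoCombo across all ligands, and all names are hypothetical.

```python
import heapq
from itertools import combinations

def best_models(ligands, score_model, max_per_model, models_to_keep=1):
    """Score every model of 1..max_per_model ligands and keep the top few."""
    if not 1 <= max_per_model <= len(ligands):
        raise ValueError("max_per_model must be between 1 and the pool size")
    candidates = (
        combo
        for size in range(1, max_per_model + 1)
        for combo in combinations(ligands, size)
    )
    # Every candidate is scored; only the retained count changes.
    return heapq.nlargest(models_to_keep, candidates, key=score_model)

# Toy stand-in for a real model score (NOT TanimotoCombo).
TOY_WEIGHTS = {"A": 1.0, "B": 0.5, "C": 0.2}

def toy_score(model):
    return sum(TOY_WEIGHTS[name] for name in model) - 0.4 * len(model)

top_two = best_models(["A", "B", "C"], toy_score, max_per_model=3, models_to_keep=2)
```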

Output title prefix: This optional field names the models produced. For a prefix ‘model’ the resulting models will be named ‘model 1’, ‘model 2’, etc. If no prefix is provided then the model name will be formulated by the names of the ligands which comprise the model. For example, ‘1GJ6_1QBO’ is a model made from two ligands named ‘1GJ6’ and ‘1QBO’ in the input file.
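The naming rule can be summarized in a short sketch. The function is purely illustrative and only mirrors the behavior described above: a numbered prefix when one is given, otherwise the ligand names joined with underscores.

```python
def model_names(models, prefix=""):
    """Name each model by numbered prefix, or by its joined ligand names."""
    if prefix:
        return [f"{prefix} {index}" for index in range(1, len(models) + 1)]
    return ["_".join(ligand_names) for ligand_names in models]

example_models = [("1GJ6", "1QBO"), ("1GJ6",)]
with_prefix = model_names(example_models, prefix="model")
without_prefix = model_names(example_models)
```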

Merge color atoms: This is an optional field. If two color atoms are overlaid then they will automatically be merged. However, color atoms are often close but not perfectly overlaid, for example, the two donor atoms highlighted below. Checking the Merge color atoms box will attempt to produce a single color atom which describes both. This simplifies the resulting model. Color atoms can also be manually deleted or merged later, if desired, as described in Manual query building.

A two-ligand model produced by the model query builder

When the desired parameters are set in the Adjust Parameters dialog click Next to begin the calculation. A progress tab will provide information on the progress of the model building. Models are built using the color force field currently selected in the user preferences (Edit > Preferences > vROCS, see vROCS Preferences).

When the calculation and evaluation is complete the models best fitting the parameters, based upon TanimotoCombo score, will be displayed in the Pick Queries tab. A 3D preview window shows the model which can be rotated and zoomed. The model currently on display is highlighted in the list below the 3D display. The list gives each model’s name and a description made up of the names of the ligand(s) taken from the input ligand pool to build the model.

Clicking the Back button from the Pick Queries tab and changing any of the parameters will prompt with a warning that the previous results will be lost. To avoid losing potential queries it is advisable to first import any models of interest to the main vROCS interface before re-running the Wizard.

To import one or more models as queries to the main vROCS interface click to add a check mark in the column next to the model description. At least one check mark is required to activate the Finish button. The checked models will be imported into the Query list for a simple or validation ROCS run and each can be highlighted and viewed in the 3D window.

Queries built using the Ligand Model Builder can include many color features, some derived from each of the ligands used to build the query. This will have a couple of ramifications.

• ROCS search speeds are typically slower with more color atoms
• Unmatched color atoms give rise to scoring penalties and so scores using color might be lower than one might expect (e.g. TanimotoCombo score <1) compared to using a single ligand as a query. However, AUC and enrichment values will not be affected.

For these reasons it can frequently be useful to further edit the queries produced by the Ligand Model Builder and simplify them by removing some of the color features. Example candidates might be those that were not quite close enough to be merged by the Merge color atoms algorithm.

### Manual Query Building¶

Manual query editing and building operations can be carried out on a new query (e.g. an imported ligand), a saved query file or the output from the Ligand Model Builder. To carry out the editing operations vROCS is required to be in the Edit Query mode.

Edit Query Mode: The Edit Query mode can be accessed in one of the following ways. Either click on the Create or Edit a Query Manually button in the Welcome screen or, at any time during a vROCS session, click on the Edit Query icon at the top of the 3D window. While in the editing mode the Edit Query button will be replaced by the Done Editing button and an editing toolbar will appear at the left of the 3D Window to select atoms, add color atoms, delete atoms or color atoms and merge color atoms.

Editing a query

| Icon | Description |
|---|---|
| Edit Query | Displayed in the 3D Visualization Window. Opens the Edit Query panel and adds the query editing icons. This icon is replaced by a Done Editing icon while in editing mode. |
| Done Editing | Returns the user to the 3D visualization window and hides the editing icons once query editing is complete. This icon is only visible in editing mode. |
| Selection mode | Click on a shape or color atom to select it. Selected atoms will be highlighted in orange. CTRL click to select multiple shape atoms or color atoms. A CTRL-click selection must consist entirely of shape atoms or entirely of color atoms. Right-click and drag a box to select a portion of the query including both shape and color atoms. |
| Add Color Atom mode | Click and select the desired color atom type from the pop-out menu. Choices for built-in color force fields are: 1) Acceptor (A), 2) Anion (An), 3) Cation (C), 4) Donor (D), 5) Hydrophobe (H) and 6) Rings (R). The letter on the icon indicates the currently active atom type. For other force field types, the list will be populated accordingly. |
| Delete Atoms mode | Click on shape or color atoms to delete them from the query. Right-click and drag a box to delete all shape and color atoms currently visible within the box. |
| Delete Selected Atoms action | Delete all currently selected atoms and/or color atoms. If there are no selected atoms, this button is disabled. |
| Merge Color Atoms action | Merges multiple molecular fragments into a single query and merges color atoms of the same type into a single average representation. If color atoms are selected, those will be merged. If no color atoms are selected, all color atoms of the same type within 0.75 Angstroms of each other will be merged. |
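The default merge behavior (same type, within 0.75 Angstroms, replaced by an average position) can be illustrated with a simple greedy clustering. This is only a sketch of the idea: the actual vROCS merging algorithm is not documented here, and the data layout, function names and the centroid-based distance test are assumptions.

```python
import math

def centroid(points):
    """Average position of a list of (x, y, z) points."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(3))

def merge_color_atoms(atoms, cutoff=0.75):
    """Greedily merge same-type color atoms lying within cutoff of a cluster centroid.

    atoms is a list of (color_type, (x, y, z)) pairs.
    """
    clusters = []  # each entry: [color_type, [member positions]]
    for color_type, xyz in atoms:
        for cluster in clusters:
            same_type = cluster[0] == color_type
            if same_type and math.dist(centroid(cluster[1]), xyz) <= cutoff:
                cluster[1].append(xyz)
                break
        else:
            clusters.append([color_type, [xyz]])
    return [(color_type, centroid(members)) for color_type, members in clusters]

# Two nearby donors merge; the acceptor and the distant donor are left alone.
example_atoms = [
    ("Donor", (0, 0, 0)),
    ("Donor", (0.5, 0, 0)),
    ("Acceptor", (0.1, 0, 0)),
    ("Donor", (3, 0, 0)),
]
merged = merge_color_atoms(example_atoms)
```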

An Edit Query panel will display on the left hand side of the screen. This panel contains two areas. At the bottom is a Shape Inventory area. This lists all the open shape files that could be used in the query. These can be opened ligand or grid files, hits from an earlier ROCS run or other active queries. Items can be displayed in the 3D window by clicking their name. Molecules will be displayed as atom-colored sticks. At this stage they have no associated ROCS shape or color elements. Multiple items can be displayed together for comparison by clicking on the green visibility icon to the right of the item name on or off. Hovering over an item’s name will display a 2D depiction of the structure, if it is a molecule file.

Edit Query panel

At the top of the Edit Query panel is the Current Query area where components (e.g. atom components, color components, shape components) of the current working query are listed and can be selected or deselected. A selected item has a red check mark next to its name and will be used in the current query. Queries derived from molecules are displayed in the 3D Window as green colored sticks with atom-type colored heteroatoms. Color atoms are shown as colored spheres with labels. Grid shapes are shown as a gray, transparent surface (see Editing a query).

Any item can be moved from the Shape Inventory to the Current Query by dragging it from the bottom panel to the top panel or by right-clicking its name and selecting Add to Query. The other right-click options available in the Shape Inventory are to Delete the current item from the list or to Rename the item. A query can be made up of multiple query elements. To remove any element from the working query click on the red check mark to undisplay it or right-click on its name and select the option to Disable in Query. Other right-click options for the Current Query elements are Delete and Rename.

At any point the current working query can be saved by clicking on the Accept button below the Shape Inventory or by File > Save Query... These are both save as... actions and will always prompt for a filename and directory for the file to be saved. Note that the color force field parameters used to apply color atoms to a query are saved with the query and cannot be changed on re-opening a saved query. The color force field is set in the Edit > Preferences dialog (see vROCS Preferences) and can only be changed before any molecules or shapes are loaded into vROCS. When editing is complete either click on the Done Editing icon at the top of the 3D Window or the Use in ROCS button below the Shape Inventory. Both will return to the display from which the editing mode was accessed (e.g. Welcome screen or Run set-up dialog).

The manual query editing tasks available via the vROCS interface are:

1. Merge two or more molecules/grids.
2. Delete color atom(s).
3. Delete shape atom(s).
4. Add color atom(s) from one or more selected atom(s).
5. Alter shape atom or color atom weighting.
6. Merge color atom(s).
7. Load grid.
8. Add color atom(s) to grids.

Each is described in more detail below.

Merge two or more molecules/grids: A query can be made up of molecules and/or grids from multiple sources/files. For example, two molecular fragments that describe ligand-protein interactions at different positions in the binding pocket, as shown below.

Query built from two fragments

Open the files into the Shape Inventory using File > Open in the Edit Query mode and then drag them to the Current Query area of the Edit Query Panel. Each of these molecules/grids will be added to the current working query. They should have a similar 3D coordinate frame so all portions of the query can be viewed in the 3D window at the same time. Saving the current query will save all constituent parts of the query together unless the red check mark indicating Use in Current Query is checked off. The combined query can be further edited using the options below.

Delete color atom(s): A ligand-based query automatically has color atoms assigned by vROCS. They are assigned using the currently selected color force field, as applied in the checkcff (command line) utility. The default color force field for vROCS is ImplicitMillsDean (see the Color Force Field section) but this can be changed using the Edit > Preferences dialog (see vROCS Preferences for more details). Color atoms can also be manually placed on atoms or grids. It can be desirable to delete a color atom. For example, a hydroxyl oxygen atom would be assigned as both an H-bond donor and an H-bond acceptor by vROCS, as highlighted in Deleting a color atom. However, knowledge of your active compounds and/or receptor cavity may lead you to believe that only an acceptor is required at that position. In that case the donor color atom can be deleted.

Deleting a color atom

Click on the eraser icon highlighted in Deleting a color atom, above, to activate the Delete Atoms mode. A gray background to the button shows it has been selected. A single left click on the color atom (or atoms) that you wish to delete will remove that feature from the query. In the case of the combined donor/acceptor example clicking on the blue quadrants of the color atom will remove the donor feature and clicking on the red quadrants will remove the acceptor. Multiple color atoms can be deleted using sequential clicks.

At any point the Edit > Undo menu item can be used to replace a color atom (or shape atom) that is accidentally deleted. Since the Delete Atoms mode operates on both color and shape atoms it is often useful to hide (undisplay) the shape atoms and surface contour using the buttons at the bottom of the Edit Query 3D window.

To delete multiple adjacent color atoms right click and drag to draw a rectangular box around the features to be deleted with the Delete Atoms button highlighted. In this case it is desirable to display only the color atoms (hide the shape atoms and contour). This prevents other parts of the query from being deleted. The delete function only operates on the part of the query that is visible.

An alternative method is to click on the lightning bolt icon to activate the Selection mode and select the color atom(s) to be deleted. The grid contour may have to be hidden before the color atom(s) can be selected. With the Selection mode active (icon highlighted gray) select the color atom by clicking on it. Once selected the color atom is highlighted in orange. To select multiple color atoms CTRL-click on each or right-click and draw a box around the group. Either click on the Delete Selected Atoms button (red ‘X’) or right-click on the highlighted color atom and select the Delete option. Using the Selection mode is useful when multiple color atoms are to be deleted.

Delete shape atom(s): It can be useful to delete part of the query molecule (shape atoms) if, for example, you are starting from a large query molecule but are carrying out vHTS to identify small ligands that are a good shape fit to only part of that query. Alternatively, some parts of the known active molecule may be important for binding and another portion requires less stringent alignment so a query built only from those fragments would be useful.

Click on the Delete Atoms (eraser) button to make it the active mode. A gray background to the button shows it has been selected. A single left click on the shape atom (or atoms) that you wish to delete will remove that feature from the query. Multiple shape atoms can be deleted using sequential clicks. The portion of the shape contour associated with the deleted shape atom(s) will also be deleted. To delete multiple adjacent shape atoms right click and drag to draw a rectangular box around the features to be deleted with the Delete Atoms mode button highlighted.

The Delete Atoms mode operates on both color and shape atoms and will delete a color atom preferentially over a shape atom. The delete function only operates on the part of the query that is visible. Therefore it is possible to delete shape atoms but leave behind color atoms with no associated shape in the query (as seen below) if the color atoms are hidden during the delete operation. The utility of queries of this nature has not been evaluated at OpenEye. It is possible that if a conformer is able to align with the color atom outside the shape during a ROCS alignment it may score higher than another conformer with an equally good alignment to the shape part of the query. In that case the conformer would receive a higher ColorTanimoto score although the ShapeTanimoto would be unchanged. However, this is probably only the case for color atoms close to the shape and these color atoms will not help to drive the optimization of the shape-based alignment. Therefore, for most queries the user should manually delete the color atoms as well as the shape atoms.

Color atoms remain after the shape atoms (and contour) have been deleted
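The scoring argument above can be illustrated with the Tanimoto form commonly used for overlap-based similarity, T = O_AB / (O_AA + O_BB - O_AB). The following is a minimal sketch, not ROCS code: the function name and all overlap values are hypothetical and chosen only to show that matching an extra color atom raises the color score while the shape score stays the same.

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Tanimoto similarity from a mutual overlap and the two self-overlaps."""
    denom = self_a + self_b - overlap_ab
    if denom <= 0:
        raise ValueError("denominator must be positive")
    return overlap_ab / denom


# Hypothetical overlap values, for illustration only.
# Both conformers overlap the query shape equally well.
shape_score = tanimoto(overlap_ab=300.0, self_a=400.0, self_b=420.0)

# Conformer 2 additionally matches the color atom lying outside the shape,
# so its color overlap with the query is larger.
color_score_1 = tanimoto(overlap_ab=2.0, self_a=5.0, self_b=4.0)
color_score_2 = tanimoto(overlap_ab=3.0, self_a=5.0, self_b=4.0)

print(f"shape: {shape_score:.3f}")
print(f"color (conformer 1): {color_score_1:.3f}")
print(f"color (conformer 2): {color_score_2:.3f}")
```

In this toy case the two conformers share one shape score but differ in color score, which is the situation described for color atoms left outside the shape.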

An alternative is to select the shape atom to be deleted using the Selection tool (lightning bolt icon). The grid contour and color atoms may have to be hidden before the shape atom can be selected. Once selected, the shape atom is highlighted in orange. Either click on the Delete Selected Atoms button (red ‘X’) or right-click on the highlighted shape atom and select the Delete option.

Add color atom(s) to atoms: Color atoms can be added to shape atoms in a query molecule using the Add Color Atoms tool. This button contains a pop-out menu for selection of any color atom type for the current color force field. In the two built-in color force fields, these are acceptor (A), anion (An), cation (C), donor (D), hydrophobe (H) or rings (R). Click on any color atom type to create atoms of that color type. The letter on the button icon will change, as indicated in the list above, to show the active color atom type. In the Edit Query mode position the color atom by clicking on any atom of the ligand. Since color atoms can also be added to grids and surfaces, the surface contour should be hidden before adding the color atoms. This is achieved using the Toggle Surface Contour button at the bottom of the 3D window.

If a color atom is added in error then the Edit > Undo option will remove it again. Alternatively, follow the instructions under Delete color atom(s) above.

An example of a situation where this might be useful is if the query ligand is a basic amine but you believe the N-atom is protonated at physiological pH to better interact with the protein. Deleting the Donor color atom from the N followed by adding a Cation color atom would effect this change. The overall shape of the query would be unchanged.
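The donor-to-cation edit described above can be pictured as a small list operation. This is only an illustrative sketch: `ColorAtom`, `swap_color` and the atom indices are hypothetical names invented here and do not correspond to any vROCS or OpenEye API. The one-letter labels are those listed for the two built-in color force fields.

```python
from dataclasses import dataclass

# One-letter labels as listed for the two built-in color force fields.
COLOR_LABELS = {
    "acceptor": "A", "anion": "An", "cation": "C",
    "donor": "D", "hydrophobe": "H", "rings": "R",
}


@dataclass(frozen=True)
class ColorAtom:
    kind: str        # a key of COLOR_LABELS
    atom_index: int  # hypothetical index of the shape atom it sits on


def swap_color(color_atoms, atom_index, old_kind, new_kind):
    """Return a new list with old_kind on atom_index replaced by new_kind."""
    if new_kind not in COLOR_LABELS:
        raise ValueError(f"unknown color atom type: {new_kind}")
    kept = [c for c in color_atoms
            if not (c.atom_index == atom_index and c.kind == old_kind)]
    return kept + [ColorAtom(new_kind, atom_index)]


# Hypothetical query: a donor on the amine N (atom 7) and an acceptor elsewhere.
query = [ColorAtom("donor", 7), ColorAtom("acceptor", 2)]
edited = swap_color(query, atom_index=7, old_kind="donor", new_kind="cation")
print([(COLOR_LABELS[c.kind], c.atom_index) for c in edited])
```

The shape atoms are not touched by the swap, mirroring the statement that the overall shape of the query is unchanged.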

An alternative method is to select the shape atom where the color atom is to be added using the Selection mode. The grid contour may have to be hidden before the shape atom can be selected. Once selected the shape atom is highlighted in orange. Right-click on the highlighted atom. A pop-up menu provides the option to Create color atom... with a drop-down for selection of color atom type.

Alter shape atom or color atom weighting: Some molecular features or color interactions between a ligand and a protein are more important than others. This knowledge can be incorporated into the query to help drive the alignment and rank the hits. One way to emphasize only the important interactions is to delete those considered less important. However, this can cause loss of valuable information from the query. A preferable approach is to increase the weighting of the important shape or color atoms while retaining the other features in the query.

Color atom weighting is achieved in a similar fashion to adding color atoms, described above. In the image below, the highlighted acceptor has been given a double weighting (acceptor x2) by adding a second acceptor feature to the hydroxyl. Select the Add Acceptor tool and click on the atom/color atom to be weighted. Similarly, use the Delete tool (eraser icon) to remove additional weighting from color atoms, exactly as previously described for deleting color atoms. The added color atoms will be listed as ‘Shape from “User Added Features”’ in the Current Query panel and can therefore also be deleted, hidden or selected for use in the current query (red check mark) in that panel. There is no limit to the increase in weighting that can be employed.

Weighting a color atom

Shape atoms can be weighted by selecting the desired shape atom with the Selection tool so that it is highlighted in orange. Right-click on the highlighted atom and from the pop-up menu select the option to Set shape atom strength... The strength can be weighted from 1 (normal) to 5. This has the effect of placing up to 5 of those atoms at that position in space. The atom will be displayed larger to indicate its weighting and the shape contour will also be expanded, as seen for the carbonyl oxygen below.

Methotrexate query with all shape atom weightings set to 1 (default)

Methotrexate query with the carbonyl shape atom weighting set to 5
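The statement that a strength of up to 5 "has the effect of placing up to 5 of those atoms at that position" can be sketched as a simple expansion of weighted atoms into repeated positions. This is an illustration under that reading only; the function name and coordinates are hypothetical and this is not how vROCS stores a query.

```python
def expand_weighted_atoms(atoms):
    """Expand (x, y, z, strength) tuples into repeated (x, y, z) positions."""
    expanded = []
    for x, y, z, strength in atoms:
        # The text gives the allowed range as 1 (normal) to 5.
        if not isinstance(strength, int) or not 1 <= strength <= 5:
            raise ValueError("strength must be an integer from 1 to 5")
        expanded.extend([(x, y, z)] * strength)
    return expanded


# Hypothetical coordinates; the second atom stands in for the carbonyl oxygen.
atoms = [(0.0, 0.0, 0.0, 1), (1.2, 0.0, 0.0, 5)]
positions = expand_weighted_atoms(atoms)
print(len(positions), "atom positions after expansion")
```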

Merge color atoms: There are two cases where it may be useful to merge multiple color atoms together into a single color atom. The first occurs when a query is composed from multiple aligned molecules, resulting in color atoms of the same type lying very close to one another in space. The second case occurs when a molecule gives rise to multiple color atoms where a single color atom might better represent what the user would like to match. For example, a carboxylate group will by default be represented by two acceptor color atoms, but in some cases a single acceptor color atom, located midway between the two oxygen atoms, might be preferable. Likewise, a bicyclic ring system could be represented by a single ring color atom instead of the default two. Color atoms of the same type may be merged into a single color atom, located at the geometric centroid of the original color atoms, using the Merge Color Atoms tool.

Merging color atoms

Aligned ligands from 1C2D.pdb and 1G3D.pdb in a single query. The highlighted area illustrates where color atom merging is useful. The arrow indicates the Merge Color Atoms button

In the Merging color atoms figure, the ligands from 1C2D.pdb and 1G3D.pdb above both possess a terminal benzamidine group, located at nearly identical positions.

Selecting the Merge Color Atoms tool when no color atoms are selected will merge color atoms of the same type that lie within 0.75 Angstroms of each other (see the figure Merged ligands and color atoms). To merge specific color atoms together, simply select two or more color atoms of the same type, and then use the Merge Color Atoms tool.

Note that if the current query is composed from multiple molecules, it will be collapsed into a single super molecule when color atoms are merged. This means that each individual ligand can no longer be separately selected, hidden, or deleted from the current query.

Merged ligands and color atoms

Aligned ligands from 1C2D.pdb and 1G3D.pdb in a single query (“super molecule”). The highlighted area illustrates where color atom merging simplified the query.
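The automatic merge described above (same type, within 0.75 Angstroms, replaced by the geometric centroid) can be sketched as follows. This is a minimal illustration, not the vROCS implementation: the function name and data layout are hypothetical, and grouping by single linkage (chains of neighbors end up in one group) is an assumption, since the text does not say how groups of more than two nearby atoms are formed.

```python
import math
from collections import defaultdict

CUTOFF = 0.75  # Angstroms, as stated in the text


def merge_color_atoms(color_atoms, cutoff=CUTOFF):
    """Merge same-type color atoms within cutoff into their centroid.

    color_atoms is a list of (kind, (x, y, z)) tuples.
    Single-linkage grouping is an assumption of this sketch.
    """
    by_kind = defaultdict(list)
    for kind, xyz in color_atoms:
        by_kind[kind].append(xyz)

    merged = []
    for kind, points in by_kind.items():
        unvisited = set(range(len(points)))
        while unvisited:
            # Grow one group outward from the lowest remaining index.
            seed = min(unvisited)
            unvisited.remove(seed)
            stack, group = [seed], []
            while stack:
                i = stack.pop()
                group.append(i)
                near = [j for j in unvisited
                        if math.dist(points[i], points[j]) <= cutoff]
                for j in near:
                    unvisited.remove(j)
                    stack.append(j)
            n = len(group)
            centroid = tuple(sum(points[i][k] for i in group) / n
                             for k in range(3))
            merged.append((kind, centroid))
    return merged


# Hypothetical coordinates for illustration.
atoms = [
    ("donor", (0.0, 0.0, 0.0)),
    ("donor", (0.5, 0.0, 0.0)),     # 0.5 A from the first donor: merged
    ("donor", (3.0, 0.0, 0.0)),     # far from the others: kept as is
    ("acceptor", (0.1, 0.0, 0.0)),  # different type: never merged with donors
]
result = merge_color_atoms(atoms)
for kind, xyz in result:
    print(kind, xyz)
```

Atoms of different types are kept apart even when they nearly coincide, matching the rule that only color atoms of the same type are merged.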

Load grid: The majority of ROCS queries tend to be ligand-based. However, queries from other sources can also be useful, particularly if no active ligand is known. This is discussed in ROCS Shape Query Sources. Grids are loaded into the vROCS interface in the same way as ligands, the major difference being that no color features are automatically added to a grid. In the Edit Query mode use File > Open and browse to the desired grid file. The grid will be loaded into the Shape Inventory area of the Edit Query panel, from where it can be dragged up to the Current Query.

Add color atom(s) to grids: When a grid is first loaded into vROCS it has no associated color atoms because the color atoms can only be automatically assigned to a ligand. However, adding color atoms to grid-based queries can enhance ROCS search selectivity, just as for ligand-based queries.

Color atoms can be added to grid shapes in the same manner as adding color atoms to ligands. Select the desired Add Color Atom tool (acceptor, donor, etc) and click on the grid contour surface to place a color atom at that position on the contour surface. Clicking subsequent times on the color atom will increase the weighting for that color atom. Color atoms that one might expect to place on/near a grid surface are those that would make strong protein-ligand interactions, e.g. H-bonding.

Grid surface color features

Hydrophobic and ring color atom types would normally be buried within the shape contour. These can be placed by initially clicking on the grid surface (see Grid surface color features above) and then CTRL-click elsewhere on the surface to move the color feature to the mid-point between the two surface points (see below). Additional surface points can be used to position the color atom at the desired position.

Click on the grid surface to place an initial hydrophobe feature, indicated by the yellow sphere

CTRL-click (red arrow) to bury the feature mid-way between the two surface points (indicated by orange dots)
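The two-click placement amounts to taking the mid-point of two surface points. A minimal sketch, with hypothetical coordinates and a hypothetical function name; how further CTRL-clicks combine with the current position is not specified in the text, so only the two-point case is shown.

```python
def midpoint(p, q):
    """Mid-point of two 3D points given as (x, y, z) tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


# Hypothetical surface points picked by the first click and the CTRL-click.
first_click = (4.0, 0.0, 0.0)
ctrl_click = (-2.0, 2.0, 0.0)
feature_position = midpoint(first_click, ctrl_click)
print(feature_position)
```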

An alternative method of placing color atoms within the grid is to use the Contour level slider. The default contour level is set at 1. Increasing the contour level (up to a maximum of 3) has the effect of displaying a smaller surface shape. Place the color feature on the new contour surface (Hydrophobe feature placed on contour surface...) using the Add Color Atoms tool described above and then use the slider to change the contour level back down to 1.0 (Contour level returned to 1.0...).

Hydrophobe feature placed on contour surface at contour level 3.0

Contour level returned to 1.0 results in a partially buried color atom
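Why a higher contour level shows a smaller surface can be seen with a toy example. The sketch below assumes the contour encloses grid points whose value is at or above the chosen level, with values highest in the interior of the shape. That assumption, the function name and the numbers are all illustrative and are not taken from vROCS.

```python
def enclosed_count(grid_values, level):
    """Number of grid points whose value is at or above the contour level."""
    return sum(1 for v in grid_values if v >= level)


# Hypothetical grid values sampled along a line through a shape,
# highest near the middle of the shape.
profile = [0.2, 0.8, 1.5, 2.6, 3.4, 2.9, 1.7, 1.1, 0.4]

sizes = {level: enclosed_count(profile, level) for level in (1.0, 2.0, 3.0)}
for level, count in sizes.items():
    print(f"contour level {level}: {count} points enclosed")
```

A feature placed on the small level 3.0 surface therefore ends up inside the larger surface once the level is returned to 1.0.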

### File¶

| Menu item | Description |
| --- | --- |
| New Query... | Starts the Query Building Wizard. The resulting query (or queries) will be listed in the query input for a simple or validation run and displayed in the 3D window. They are also placed in the Shape Inventory list. This option is only available in view mode. |
| Open Query... | Opens a file browser for opening a saved ROCS query file. Acceptable saved query sources are molecules of a variety of 3D molecule types, grids or shape queries (*.sq). The opened file is listed as a potential query for simple or validation ROCS runs and displayed in the 3D window. It is also placed in the Shape Inventory list. This option is only available in view mode. |
| Add to Shape Inventory... | Starts the Query Building Wizard. The resulting query (or queries) will be placed in the Shape Inventory list. This option is only available in edit mode. |
| Open... | Opens a file browser for opening a shape query source. Acceptable shape sources are molecules of a variety of 3D file types, grids or shape queries (*.sq). The opened file will be placed in the Shape Inventory list. This option is only available in edit mode. |
| Save Query... | Saves the current query in ROCS query format (*.sq). This is a save as... operation. The user will be prompted to enter a filename and directory location for saving the new *.sq file. The current file will never be overwritten. Keyboard shortcut is CTRL+S. |
| Save Color Force Field... | A save as... operation that saves a copy of the current color force field file, prompting for a filename and directory. This can be useful as a starting point for editing a new custom color force field file. |
| Clear... | Clears all objects from the current vROCS session. Keyboard shortcut is CTRL+N. |
| Recent Queries... | Select from a list of recently opened queries - molecules, grids and shape queries. The selected query will be added to the list of queries available for simple or validation ROCS runs and will be displayed in the 3D window. The list persists across vROCS sessions. This option is only available in view mode. |
| Recent Databases... | Select from a list of recently opened databases. The selected database will be used to populate the database field for simple ROCS runs. The list persists across vROCS sessions. This option is only available in view mode for the simple run set-up and sometimes from the Welcome page. |
| Recent Actives... | Select from a list of recently opened databases. The selected database will be used to populate the actives field for validation ROCS runs. The list persists across vROCS sessions. This option is only available in view mode for the validation run set-up and sometimes from the Welcome page. |
| Recent Decoys... | Select from a list of recently opened databases. The selected database will be used to populate the decoys field for validation ROCS runs. The list persists across vROCS sessions. This option is only available in view mode for the validation run set-up and sometimes from the Welcome page. |
| Recents... | Select from a list of recently opened shape sources - molecules, grids and shape queries. The list persists across vROCS sessions. This option is only available in edit mode. |
| Exit... | Closes the vROCS session. Displayed in the Application menu on Mac as Quit... |

### Edit¶

| Menu item | Description |
| --- | --- |
| Undo... | Undo the last action. This can be done repeatedly or see Undo History... Keyboard shortcut is CTRL+Z. |
| Undo History... | Shows the last 10 operations that can be selected to undo. The Undo history list can be much longer than 10 items so revisit this list to see additional items. |
| Redo... | Redo the last action that was just undone. This can be done repeatedly or see Redo History... Keyboard shortcut is CTRL+Y. |
| Redo History... | Shows the last 10 operations that have been undone and can be selected to redo. The Redo history list can be much longer than 10 items so revisit this list to see additional items. |
| Preferences... | Opens the Preferences dialog to set user preferences that will persist across vROCS sessions. There are two pages: vROCS and Display. See below for full details. Displayed in the Application menu on Mac. |

#### Preferences¶

Every user has his or her own individual preferences with regard to how molecules, grids, and surfaces should look and how applications should behave. For this reason, a Preferences dialog is available which allows customization of the application to the user’s preference. The first time vROCS is launched the Preferences dialog will open automatically, to enable the user to set his or her own options. A snapshot of the Preferences dialog can be seen below. On the left-hand side of the dialog is a column containing the preference categories: vROCS and Display. Clicking on either category will update the right-hand side of the dialog to display the options corresponding to the selected category. The vROCS category includes preferences for ROCS and the color force fields. The Display category contains preferences for the OpenGL display of molecules and shapes.

#### vROCS Preferences¶

Preferences: vROCS

Default color force field

Click the radio buttons to select the default color force field which will be used to apply color atoms to molecules that are loaded into vROCS or to define the nature and interaction of color atoms that are added during manual query editing. This color force field information will be saved in any saved query file. Options are:

Implicit Mills-Dean

Explicit Mills-Dean

Custom

To use a custom color force field define the path to the color force field (*.cff) file that contains definitions of the color atoms. Multiple custom color force fields can be loaded into vROCS and are listed in the box. The selected (active) custom color force field will be highlighted in blue. Custom color force fields can be deleted from the list by clicking on the red X beside their name in the list.

Changing the default color force field during a vROCS session will not change the force field for any currently opened molecules or queries. The change will take effect only if no molecules or queries are open, or on closing and reopening vROCS.

| Option | Description |
| --- | --- |
| Display ROCS run in 3D | Check on to display a 3D OpenGL rendering of the query and database molecules aligning as the run progresses together with 2D structures for the current 5 best hits. Click off to display a text-based run progress summary, saving compute resources on lower performance computers. This is equivalent to checking on/off the 3D View option in the simple or validation Run Set-Up Options dialog but persists across all runs and sessions, not a single run. |
| Color Atom Styles | Change the color and style for the different color atom types. Select the force field from the drop-down menu. For each color atom type select a color from the drop-down color list and a style (solid or mesh) for the color atom display. Restore default color atom colors and styles by clicking the Restore button below. |
| Restore | Click to restore the current preferences to the default ones. |
| Save | Save the current preferences for this session and for future sessions of vROCS. |
| Cancel | Click to close this dialog without applying any of the changed preferences. |

#### OpenGL Preferences¶

Preferences: Display

| Option | Description |
| --- | --- |
| Background Color | Change the background color by selecting from a drop-down color list. |
| Lighting Position | Sets the position of the lighting used in the 3D view. |
| Material Shininess | Sets the shininess of solid-rendered objects in the 3D view. |
| Disable OpenGL Shaders | Turns off hardware shading functions. May be useful if the 3D scenes are not rendering properly. |
| Disable Hardware Acceleration | Turns off all hardware rendering. May be useful if scenes are not rendering properly due to video driver problems. |
| Screenshot Shares Context | May be useful if screenshots are not being saved properly on some systems. |
| Restore | Click to restore the current preferences to the default ones. |
| Save | Save the current preferences for this session and for future sessions of vROCS. |
| Cancel | Click to close this dialog without applying any of the changed preferences. |

There are three buttons at the bottom of the dialog. Clicking on the Restore button will restore the current preferences to the default ones. Clicking on the Save button will save the current preferences for this run and for future runs of vROCS. Clicking on the Cancel button will close this dialog and will not apply any of the changed preferences.

Preferences are stored in a binary file (preferences.oeb) in a user-specific local directory on the computer currently running the application. The preferences file can be found in:

• C:\Documents_and_Settings\USERNAME\AppData\Local\OpenEye\vROCS\<version> on Microsoft Windows Vista.
• C:\Users\USERNAME\AppData\Local\OpenEye\vROCS\<version> on Microsoft Windows 7.
• ~USERNAME/.OpenEye/vROCS/<version> on all other platforms.

While the preferences file shares the same file extension as OpenEye’s binary database file, it cannot be read into vROCS using the File > Open menu item. The preferences file is loaded automatically when the application starts and is saved back to disk when the application exits. Deleting this file is equivalent to clicking on the Restore button in the dialog box.

There is also a file in this same directory called vROCS.ini, which contains machine-specific settings such as the list of recent files, preferred layouts, and hardware stereo options. Deleting this file will restore these settings to the defaults as well.

This directory can be opened from within vROCS by selecting the Open User Directory menu item in the Help menu.
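For scripting tasks such as backing up or resetting preferences, the documented locations can be assembled as below. This is a minimal sketch: the function name, the platform keys ("vista", "win7") and the example username and version string are hypothetical, and the version component must be replaced with the actual installed version. Pure path classes are used so that the example builds the strings without touching the file system.

```python
from pathlib import PurePosixPath, PureWindowsPath


def preferences_dir(platform, username, version):
    """Return the documented vROCS preferences directory.

    platform: "vista", "win7", or any other value for all other platforms
              (these keys are hypothetical labels used only in this sketch).
    version:  the vROCS version string used as the last path component.
    """
    tail = ("OpenEye", "vROCS", version)
    if platform == "vista":
        return PureWindowsPath(r"C:\Documents_and_Settings", username,
                               "AppData", "Local", *tail)
    if platform == "win7":
        return PureWindowsPath(r"C:\Users", username,
                               "AppData", "Local", *tail)
    return PurePosixPath(f"~{username}", ".OpenEye", "vROCS", version)


# "alice" and "X.Y.Z" are placeholders, not real values.
prefs_file = preferences_dir("win7", "alice", "X.Y.Z") / "preferences.oeb"
ini_file = preferences_dir("linux", "alice", "X.Y.Z") / "vROCS.ini"
print(prefs_file)
print(ini_file)
```

On a real system the leading `~USERNAME` on non-Windows platforms would still need to be expanded to the user's home directory, for example with `os.path.expanduser`.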