Tutorials

Introduction

Four tutorials are included which guide the user through examples of each of the main tasks available in vROCS. A fifth tutorial illustrates setting up a simple run via the ROCS command line interface. The aim of providing these tutorials is to familiarize the user with the steps required to complete each task and give an understanding of what the task involves. More detailed background information on any of the options or dialogs is available in the Usage sections of this manual. Each tutorial is designed to stand alone so the user can choose which best fits his/her current research needs. We encourage the user to run the tutorials initially; thereafter they can be used as a guide for the user's own experiments.

Data files for these tutorials are located in the directory OPENEYE_DIR/data/vrocs where OPENEYE_DIR refers to the top level OpenEye installation directory. On Windows the default is a versioned ROCS directory in C:\Program Files (x86)\. In OSX distributions the data and documentation directories are easily accessible as standalone folders in the package. The OPENEYE_DIR/data/vrocs directory contains four sub-directories, referenced in the tutorials below: wizard, edit, validation and simple.

Build/edit a query using the Wizard

Background information

This tutorial teaches the user to create a new query for either saving or use in ROCS. The wizard creates a query through one of a few predesigned paths.

You have recently been assigned to a trypsin inhibitor project. You are interested in building a query from known ligands in their binding modes, suitable for use in ROCS for vHTS. A set of 19 trypsin inhibitors are available as co-crystal structures (See [PDB-IDs]) in the PDB (See [PDB]). The dataset has been prepared by aligning the 19 protein crystal structures. The ligands were then extracted to give a set of 19 ligands, aligned in the protein binding pocket frame of reference. The query building wizard will be used to construct a ROCS query employing just a few of the ligands that are most representative of the set as a whole.

The tutorial will require approximately 10 minutes to complete.

Build models using the wizard

Open a new ROCS session. At the Welcome screen select the option to Create a query with a wizard. This will open up the Build a new query dialog.

In the top tab, Create Query, select the radio button for Ligand Model Builder and then click Next.

In the Load Aligned Ligands tab click on the Filename area and browse to the file containing the 19 aligned trypsin ligands. This file is located in the OPENEYE_DIR/data/vrocs/wizard directory and is called pdbmodel_ligands.oeb.gz. Click OK in the file browser to accept the choice of file. The file will be loaded and the first ligand is displayed in the preview. Examine the ligand by rotating (left mouse button) or zooming (mouse wheel) the structure image. Scroll through the list of ligands using the green arrows. When satisfied click Next to continue.

It is not required to change any of the options in the Adjust Parameters tab, as indicated by the green check mark next to the tab name. However, for the purposes of this tutorial we will keep the Max Molecules Per Model as 3 (consider only models containing 1, 2 or 3 molecules) but increase the Models to Keep to 3 (keep the best 3 models for further review). Enter Trypsin in the Prefix field and check the box to Merge Color Atoms.

Click Next to begin building the models. A progress dialog will provide information on the model building. When all the 1159 models containing 1, 2 or 3 molecules from the dataset of 19 trypsin inhibitors have been built and compared the top 3 models are listed in the Pick Queries dialog.
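The figure of 1159 models follows from counting every subset of 1, 2 or 3 ligands that can be drawn from the 19 in the dataset. The arithmetic can be sketched in Python as below. The function name count_candidate_models is illustrative and not part of vROCS, and the sketch assumes the model builder considers each such subset exactly once.

```python
from math import comb


def count_candidate_models(n_ligands: int, max_per_model: int) -> int:
    """Count the subsets of size 1..max_per_model drawn from n_ligands."""
    return sum(comb(n_ligands, k) for k in range(1, max_per_model + 1))


# 19 ligands, at most 3 per model: 19 + 171 + 969
print(count_candidate_models(19, 3))  # prints 1159
```

Under the same assumption, raising the limit to 5 molecules per model gives 16663 candidate subsets for these 19 ligands, which illustrates how quickly model building time can grow with this parameter.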

Visualize the results

The first model in the list, ‘Trypsin 1’, is displayed in the preview window and is made from three of the ligands: 1G3E.pdb, 1GHZ.pdb and 1QB6.pdb. This is the model that is most representative of the dataset of 19 inhibitors as a whole.

Note that the three ligands align at one end of the model whilst the other end is described by only the single, larger molecule. Where the three ligands align closely the donor/cation/donor triad of color atoms have been merged to a single representation, instead of close color atoms from each ligand (e.g. three partially overlaid cation color atoms). Retaining multiple instances of the color atoms would serve to stress the importance of these features in the model over the other features.

Click on the second model, ‘Trypsin 2’. Its name will be highlighted in blue and it will be displayed in the preview window. This is the second most preferred model and is also made of three ligands: 1H4W.pdb, 1K1I.pdb and 1QB9.pdb. Note that this is a different set of three ligands than Trypsin 1 (although it is possible for some ligands to be used in multiple high ranking models).

The third model, ‘Trypsin 3’, contains only two ligands: 1GJ6.pdb and 1QBO.pdb. All three models contain one of the larger, hinged ligands and at least one of the set of smaller ligands.

Save the results

Place a check mark next to the name of each model to select it for export to the main vROCS interface. In this tutorial we will export all three models and save them so they can be used in further validation experiments. However, in your own work you may choose only some of the models for export.

Having checked all three models click Finish to close the Build a new query dialog. Model ‘Trypsin 1’ will be displayed in the main vROCS 3D window and the Welcome panel is to the left of the screen. In the Welcome panel click on Perform a simple ROCS run. The three models are listed as potential queries in the Inputs dialog, with ‘Trypsin 1’ as the active query. To save ‘Trypsin 1’ as a ROCS saved query (*.sq) file select File > Save Query As… from the main File menu. Navigate to your preferred working directory and enter Trypsin_1.sq as the file name. Click OK to save the file. In the query list select model ‘Trypsin 2’ and use File > Save Query As… to save this query as Trypsin_2.sq. Repeat with model ‘Trypsin 3’ to save a query with filename Trypsin_3.sq.

Conclusions

This concludes the tutorial “Build/edit a query using the Query Wizard”. In this tutorial we used a set of 19 aligned known trypsin ligands and built three potential ROCS queries, using up to three of the ligands from the dataset. These queries best describe the dataset as a whole, based on TanimotoCombo scores of all the potential models. These models can be further validated before use, as described in the tutorial Perform a ROCS validation run.

If you have time some suggestions for further study are:

  • Try changing some of the model building parameters. (Note: it is recommended to keep the maximum number of molecules in a model low (less than or equal to five) to avoid extended run times for model building and overly complex output models/ROCS queries.)
  • Modify the current force field in the Preferences dialog (Edit > Preferences, vROCS tab) and see whether this changes the models
  • Re-run the tutorial steps with your own set of aligned ligands

Build/edit a query manually

Background information

In this tutorial the user will learn to manually create a new query either for saving or for use in ROCS. It also covers the steps required to edit or modify a saved query.

A grid shape query is available that describes a protein binding pocket. To improve the quality of results obtained for alignment to the query you wish to add some color features that are known to be important for protein-ligand binding. This tutorial will guide the user through the basics of adding color atoms and saving the resulting ROCS query file.

The tutorial will require approximately 15 minutes of personal time to complete.

Build and edit a new query

Open a new ROCS session. At the Welcome screen select the option to Create or Edit a Query Manually. Click File > Open and browse to the file OPENEYE_DIR/data/vrocs/edit/erantag_shape.grd. This is a grid-based shape file that represents the shape of the binding pocket for the estrogen antagonist receptor, 3ERT, as downloaded from the PDB (See [PDB]). The file will be listed in the Shape Inventory list as “unnamed”. The grid shape will display in the 3D window as an opaque shape. Click and drag the file up to the Current Query list. It is renamed as “Shape from ‘erantag’”. The representation in the 3D window becomes transparent.

Note

To use this query as-is in vROCS click the Use in ROCS button to return to the Welcome screen. The grid shape will be imported into the query list as the active query for either a simple or validation ROCS run.

Several active estrogen receptor antagonists have a similar color pattern, consisting of two phenol moieties, which can act as either donor or acceptor, and a cation on a flexible chain, as shown in Annotated OEDepict TK depiction of generalized estrogen receptor active ligand. These are the color atoms that will be added to the query.

Annotated OEDepict TK depiction of generalized estrogen receptor active ligand

Rotate the shape in the 3D window until it resembles the orientation in Profile orientation.

Profile orientation

Use the Contour slider at the bottom of the 3D window to raise the contour display threshold to 1.25. Click on the Add color atom button at the left of the 3D Window (See Add acceptor atoms) and select Acceptor from the dropdown. Click on the contour surface to add two acceptor atoms, as shown in Add acceptor atoms.

Add acceptor atoms

Click on the Add color atom tool at the left of the 3D Window and select Donor from the dropdown. Click on each of the acceptor features to add two donor atoms at the same point, as shown in Add donor atoms.

Add donor atoms

Adjust the Contour slider back to a contour level of 1.0. Click on the Add color atom button at the left of the 3D Window and select Cation from the dropdown. Click on the contour surface to add a cation atom as shown in Initial cation placement.

Initial cation placement

Then CTRL-click on the surface twice more, as indicated by the orange dots in Final cation placement, to move the cation color atom mid-way between the three surface points.

Final cation placement

This completes adding the color points to the query. If you make any errors use the Delete tool (eraser icon) at the left of the 3D window to delete a color atom and try again.

Save query

In the Current Query area right-click on the query name and rename the query as “erantag_shape_color”. This is the query name that will be displayed in the Query list for any ROCS runs. To save the edited query click the Save Query button or use the menu item File > Save Query… Choose a directory in which to save the query and give it the name erantag_shape_color.sq. To use this query in ROCS click on the Done editing icon in the 3D window or the Accept button to return to the Welcome screen.

Conclusions

This concludes the tutorial “Create or edit a query manually”. In this tutorial we modified a shape grid by adding color atoms to build a more complex ROCS query containing information about known binding interactions. This query can be further validated before use, as described in Tutorial 3: Perform a ROCS validation run.

Perform a ROCS validation run

Background information

A validation run with ROCS allows you to select a set of active molecules and a set of decoy molecules against which to run your query. ROCS is run against both datasets and generates statistics evaluating how well the query discriminated between the actives and the decoys. This becomes particularly important when building a complex query. It suggests confidence levels for this query in future ROCS runs against databases of compounds of unknown activity.

You have just been assigned to a new research project looking for trypsin antagonists. There are no in-house lead molecules or SAR (structure activity relationship) yet. However, you have built several potential queries from published trypsin antagonists using the vROCS Ligand Model Builder. You would like to screen the corporate database to identify some compounds for screening in an in-house biological assay that has just come on-line. You plan to use vROCS to validate your queries on a sample database and identify the most selective query before running it on the larger corporate database.

After completing this tutorial the user will be aware of the steps required to set up and run a ROCS query validation in vROCS and analyze the resulting data. The tutorial will require approximately 10 minutes of personal time and 30 minutes of computer time to complete.

Setup ROCS run

Open a new vROCS session. At the Welcome screen select the option to Perform a ROCS validation.

In the Inputs dialog you will need to select a query, set of active molecules and set of decoy molecules. The fields that require data input are highlighted in red. Note: if you did not open a new ROCS session you may have some fields populated with previously used queries and datasets.

Check that the Color F.F. dropdown has Implicit Mills Dean selected as the current color force field. This is the force field that was used to build the query we will use. (Note: opening a query that was built with a force field other than that listed in the Color F.F. dropdown will result in a warning pop-up. If this should occur then click Yes to accept the change to the active color force field. This will update the Color F.F. dropdown and the list of available queries but will not change your default selection for future vROCS sessions in Edit > Preferences.)

In the Query input field click on Open… Use the file browser to navigate to either the directory where you saved the queries built in Tutorial 1 or to OPENEYE_DIR/data/vrocs/validation/trypsin/ and open the file Trypsin_1.sq. This is the first of three queries selected by the Ligand Model Builder and saved, as described in the tutorial Build/edit a query using the Query Wizard. The files are provided for you and completion of that tutorial is not a prerequisite to this tutorial. It will be used as one of the queries for ROCS.

The Trypsin_1.sq query is shown in the 3D window as green sticks. A molecular shape (transparent gray) and color atoms are automatically assigned. For the purposes of this tutorial we will use the ligand query as-is. Tutorial 2: Build/edit a query manually covers the details of editing a query.

Repeat the steps to open queries Trypsin_2.sq and Trypsin_3.sq into the query list.

Click on the first query in the list (Trypsin_1.sq) so that it is highlighted in blue and becomes the active query for ROCS. Click on the text ROCS Run 1 in the Run Name field of the Inputs dialog and edit it to read Trypsin1.

For a validation run two databases are required. The first contains the ‘active’ molecules. This is a dataset that you would like to score highly against the query. The second dataset of ‘decoys’ should contain molecules that you predict will align poorly to the query. In a typical pharmaceutical industry setting the ‘actives’ might be compounds from your current SAR and the ‘decoys’ might be a sub-set of the corporate database. In this tutorial we will use the Trypsin actives and decoys sets from the DUD (Directory of Useful Decoys) database (See [Huang-2006]). The decoy set is property matched to the actives (e.g. similar molecular weight, calculated LogP) but molecules have dissimilar topology in order to provide a challenging validation experiment. The more similar the actives and decoys the more confidence you can have that your query is truly selective.

The databases have been pre-prepared for this tutorial in the following manner:

  1. The ligands (actives) and decoys datasets were downloaded from DUD in mol2 file format.
  2. Conformers were generated using OMEGA2.2 and default settings (up to 200 conformers for each molecule).

No effort was made to clean up the dataset by removing duplicates, filtering for molecular properties (in theory this was done by DUD) or enumerating stereoisomers. These are all data preparation steps you should consider for your own dataset. However, the purpose of this tutorial is to illustrate the vROCS validation tools, not dataset preparation.

In the Inputs dialog click on the Actives option. Browse to the database of 49 known active trypsin ligands at OPENEYE_DIR/data/vrocs/validation/trypsin/trypsin_ligands_confs.oeb.gz. Similarly, populate the Decoys option with the database of 1664 property matched trypsin decoy ligands at OPENEYE_DIR/data/vrocs/validation/trypsin/trypsin_decoys_confs.oeb.gz.

Click Next to set the run options.

The Options dialog provides access to modify the main options for ROCS. It also displays the ROCS command line, should you wish to repeat this run outside the vROCS interface. Full descriptions of all the options are given in the Validation Run options.

The Working Directory is set by default to your vROCS installation directory. It is good practice to set a unique working directory for each run to avoid the risk of overwriting output files from old runs. Alternatively, changing the Run Prefix option would have a similar outcome. Create a working directory named trypsin_validation in a location of your choice and assign the prefix trypsin1.

Leave all the fields in the Options dialog with their default values unless you are using a computer with low memory. In that situation you may want to toggle Off the 3D View option. This will speed up the ROCS runs because the CPU is not being used for an OpenGL 3D display of the aligned query and current database molecule.

Click Next to see the run summary. This lists the run name, database, working directory, etc.

Run ROCS in validation mode

Click Run ROCS to start the run.

As the run progresses the 3D window fills the screen. The query is shown and the database molecules scroll by in their alignment with the query. On the right hand side of the screen the five current top scoring molecules are shown in 2D depiction, together with their TanimotoCombo score (or whichever score type was chosen in the Run Set-up Options dialog). A progress bar at the bottom of the screen indicates how far the run has proceeded. If the 3D view was selected Off then text-based progress information will be displayed.

This run requires about 10 minutes. The relatively slow run speed is due to the large number of color features in the query used. Manually editing the query to remove some of the color features does result in some run time speed up. See the section Editing ROCS queries in vROCS and the tutorial Build/edit a query manually for details on how to accomplish this.

Repeat the steps to set-up and run query Trypsin_2.sq with run name Trypsin2 and prefix trypsin2 and Trypsin_3.sq with run name Trypsin3 and prefix trypsin3.

Visualize results

Once the three runs are complete you will see results listed for each run on a separate tab in the results spreadsheet at the bottom of the screen. Click on the tab name for the Trypsin1 run to view the results of the first run. On the left side of the screen navigate to the Run Set-up Inputs dialog and select/highlight the query Trypsin_1.sq in the Query list. This action displays the query in the 3D window.

Note

In the case that 3D View was checked off during the runs click Done in the run progress informational panel to restart the 3D window and display the query.

The spreadsheet lists the query and the top 20 scoring conformers ranked based on TanimotoCombo score (or whichever score type was chosen in the Run Set-up Options dialog). The query is also listed as the first entry as an aid to comparison. Make the query visible by clicking in the visibility column (green circle) next to its entry. Select individual results by clicking on their name and scroll up/down the list with the arrow keys to check the alignments look reasonable.

Click on the Show the statistics icon at the far right of the spreadsheet to display the statistics panel for Trypsin1. All the database molecules used in the search are included in these calculations, not just the top 20 that were listed in the spreadsheet for visual inspection. AUC for the ROC curve and enrichment at 0.5%, 1% and 2% are listed for the run, together with their upper and lower 95% confidence limits. The ROC curve is also displayed. An AUC of 0.865 indicates a query that is predictive and well able to separate the actives from the decoys. Change the score used in the calculations from TanimotoCombo to ShapeTanimoto in the Metric dropdown. This will update the ROC plot and statistics to reflect that scoring metric. The AUC is now 0.640, indicating that shape alone is a less selective metric for identifying trypsin actives from decoys and that color is an important addition to the query.
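The AUC of a ROC curve can be read as the probability that a randomly chosen active scores higher than a randomly chosen decoy. The following Python sketch shows that calculation; it is not the vROCS implementation. The function name roc_auc and the toy scores are made up for illustration and are not results from this tutorial.

```python
def roc_auc(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active scores higher.

    Ties count as half a win. This pairwise count equals the area under
    the ROC curve.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical TanimotoCombo scores (illustration only)
toy_actives = [1.4, 1.1, 0.9]
toy_decoys = [1.0, 0.8, 0.7, 0.6]
print(round(roc_auc(toy_actives, toy_decoys), 3))
```

An AUC of 0.5 corresponds to a query that ranks actives no better than chance, and 1.0 to a query that ranks every active above every decoy.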

Inspect the score histogram plot by selecting Score Histogram in the Chart dropdown. This displays color coded histograms for the score distribution within the active and decoy databases. A more selective query will have a score distribution with higher frequency of obtaining a higher score (i.e. further to the right of the plot). As before, changing the score metric will update this chart.

Compare results from multiple runs

With the Trypsin1 results tab selected, making Trypsin1 the Base run, select Trypsin2 and Trypsin3 from the Compare to dropdown. Choose ROC plot from the Chart dropdown and TanimotoCombo from the Metric dropdown. The display should look similar to the image below:

Comparison of runs Trypsin1 (Base) with Trypsin2 and Trypsin3

At a glance the ROC plot shows that Trypsin2 has the highest AUC, suggesting it is the most selective query over the entire database search. This is supported by the p-values. Trypsin2 has a p-value of 0.940 when compared to the Trypsin1 base run. There is an extremely high probability that the Trypsin_2.sq query is more selective than Trypsin_1.sq. Run Trypsin3 has the lowest AUC and a p-value of 0.006 indicates there is a very low probability of query Trypsin_3.sq being more selective than Trypsin_1.sq. The early enrichments for run Trypsin2 are much higher than for either Trypsin1 or Trypsin3 and the p-value of 1.000 for Trypsin2 indicates that the query, Trypsin_2.sq, is better able to rank most of the actives very highly than is Trypsin_1.sq. This early enrichment is not due to chance alone. Trypsin3 has a p-value of ~0.34 at 5% enrichment (there may be some slight differences between calculated p-values between vROCS sessions due to the random factors used in the bootstrapping process). It does indeed have a lower early enrichment than Trypsin1, as the probability (p-value) suggests, but the enrichment for the two runs is similar, especially when considering the 95% confidence limits for each run. It is not possible to state with certainty that either query, Trypsin_1.sq or Trypsin_3.sq, gives better early enrichment.
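The p-values discussed above come from a bootstrapping process, which is why they can vary slightly between sessions. The exact procedure used by vROCS is not described in this tutorial, so the following Python sketch shows only one common way such a value can be estimated: resample the actives and decoys with replacement and count how often the comparison run has a higher AUC than the base run. All names (roc_auc, bootstrap_p_better) and the toy scores are assumptions for illustration.

```python
import random


def roc_auc(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs won by the active; ties count half."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            wins += 1.0 if a > d else 0.5 if a == d else 0.0
    return wins / (len(active_scores) * len(decoy_scores))


def bootstrap_p_better(base, other, n_resamples=200, seed=0):
    """Estimate the probability that `other` has a higher AUC than `base`.

    `base` and `other` are (active_scores, decoy_scores) tuples holding the
    scores of the same molecules, in the same order, for the two runs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    base_act, base_dec = base
    other_act, other_dec = other
    n_act, n_dec = len(base_act), len(base_dec)
    better = 0
    for _ in range(n_resamples):
        # Draw the same resampled molecules for both runs
        ai = [rng.randrange(n_act) for _ in range(n_act)]
        di = [rng.randrange(n_dec) for _ in range(n_dec)]
        auc_base = roc_auc([base_act[i] for i in ai], [base_dec[i] for i in di])
        auc_other = roc_auc([other_act[i] for i in ai], [other_dec[i] for i in di])
        if auc_other > auc_base:
            better += 1
    return better / n_resamples


# Hypothetical scores for two runs over the same 4 actives and 5 decoys
run_a = ([0.5, 0.6, 0.4, 0.7], [0.5, 0.6, 0.4, 0.7, 0.55])
run_b = ([1.5, 1.6, 1.4, 1.7], [0.5, 0.6, 0.4, 0.7, 0.55])
print(bootstrap_p_better(run_a, run_b))
```

Changing the seed changes the resamples and can shift the estimate slightly, which mirrors the session-to-session variation noted above.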

Save results

An image of the ROC plot can be saved and used in presentations. Click on the Trypsin1 run-name tab to make it highlighted as the active view. The ROC plot currently shows three curves, one each for runs Trypsin1, Trypsin2 and Trypsin3. In the Choose stats to save... dropdown select the option to Plot data. In the dialog increase the dimensions (resolution) to 500 in the first box. Since Maintain original aspect is checked on, the second dimension will update automatically. Use the Browse option to select a directory of your choice. Change the name of the file from screenshot.png to trypsin_ROC_plot.png. Click OK to save the image file.

It can also be useful to save a copy of the statistical comparison in the spreadsheet. Click on the Choose stats to save… dropdown and select the option for Spreadsheet. In the dialog navigate to a directory of your choice and name the file trypsin_spreadsheet.csv. A file will be saved containing the AUC and enrichment values, 95% confidence limits and p-values for the three runs currently being compared in the Trypsin1 tab.

Conclusions

From the comparison we can conclude that Trypsin_2.sq is the most selective query of the three in this validation. When the query models were built Trypsin_1.sq was ranked highest by the Ligand Model Builder. However, that was on a rigid dataset of the 19 single conformer candidates for the model building. This validation was carried out on a multi-conformer set of active and decoy ligands which is closer to the real life scenario under which the query will be used i.e. the ROCS run of the corporate database to identify potential screening candidates. The high early enrichment of Trypsin_2.sq makes this query especially appealing for such a search.

This concludes the tutorial “Perform a ROCS validation”.

If you have time some suggestions for further study are:

  • Try comparing the runs above to the Lingos 2D similarity metric. Lingos is a useful metric and often provides good selectivity
  • Run a validation of the estrogen receptor antagonist shape + color query built in the tutorial Create or edit a query manually. Compare it against the shape alone and the native ligand. The following files are provided in the OPENEYE_DIR/data/vrocs/validation/erantag/ directory.
    • erantag_grid.sq – a ROCS saved query file of the grid shape
    • erantag_color.sq – a ROCS saved query file of the grid shape with added color points
    • 3ERT_lig.ent – the ligand from the 3ERT estrogen receptor crystal structure
    • er_antagonist_ligands_confs.oeb.gz – the dataset of active ligands
    • er_antagonist_decoys_confs.oeb.gz – the dataset of decoy molecules

Perform a simple ROCS run

Background information

A simple ROCS run allows you to choose a query and a database to search. It will search the database for the closest matches to your query and load your results afterwards. The majority of runs with ROCS are simple runs using a molecular shape-based query, with or without color atoms. Other query types can also be used, in a similar fashion, as described in the section ROCS shape query sources.

You are working on a project for which the current lead series has a problem with CYP2C9 metabolism. You are hoping to suggest some synthesis candidates that have a low risk for CYP2C9 metabolism. You recently saw a paper by Sykes et al. ([Sykes-2008]) in which they use ROCS to validate a database of 70 known Cytochrome P450 2C9 substrates. You plan to use ROCS similarity to compounds in this database to predict whether your ideas will be metabolized by CYP2C9. One of your ideas is shown in the depiction below. You will use ROCS through the vROCS interface.

OEDepict TK depiction of the synthesis candidate molecule and ROCS query

This tutorial will provide the user with the background required to set up a simple ROCS run in the vROCS interface from a saved query and analyze the results. The tutorial will require approximately 10 minutes of personal time and 1 minute of computer time to complete.

Setup ROCS run

Open a new ROCS session. At the Welcome screen select the option to Start a simple ROCS run.

In the Inputs dialog you will need to select a query and a database to search. The fields that require data input are highlighted in red. Note: if you did not open a new ROCS session you may have some fields populated with previously used queries and datasets.

In the Query input field click on Open query… Use the file browser to navigate to OPENEYE_DIR/data/vrocs/simple and open the file molecule_idea.sq. The query was prepared by generating OMEGA conformers from a SMILES file of the molecule. Default OMEGA parameters were used and the lowest energy conformer was selected as the query and saved as a ROCS saved query file named molecule_idea.sq.

The molecule is shown in the 3D window as green sticks. A molecular shape (transparent gray) and color atoms are automatically assigned. For the purposes of this tutorial we will use the ligand query as-is. The tutorial Create or edit a query manually covers the details of editing a query.

Click on the text ROCS Run (1) in the Run Name field and edit it to read Molecule_idea.

In the Inputs dialog click on the Database option. Browse to the database at OPENEYE_DIR/data/vrocs/simple and open the database file called jmedchem_database.oeb.gz. The database was prepared from a SMILES file of the 70 CYP2C9 substrates, for which OMEGA conformers were generated with default parameters.

Click Next to set the run options.

In a location of your choice create a working directory called simple.

In the literature, the authors obtained best results for their validation using the parameter Random Starts = 50 instead of the default Inertial start. This allows more conformational space to be sampled but does increase the time taken by the run.

Note

Inertial starts will give identical results from run to run. Using Random starts may result in slight variation in score (e.g. 0.750 vs 0.749) if a run is repeated.

Click on the Random radio button for Start type and type 50 in the input box. Leave all other options as default.

Click Next to see the run summary. This lists the run name, database, working directory, etc. Click on the Command Line… option to view the full command line for the run. This can be useful to set up future ROCS runs outside the vROCS interface e.g. to make use of distributed computing.

Run a simple ROCS run

Click Run ROCS to start the run.

As the run progresses the 3D window fills the screen. The query is shown and the 70 database molecules scroll by in their alignment with the query. On the right hand side of the screen the five current top scoring molecules are shown in 2D depiction, together with their score (TanimotoCombo by default or whichever score type was chosen in the Options dialog). A progress bar at the bottom of the screen indicates how far the run has proceeded.

This run requires around 40 seconds.

Visualize and save results

When the run is complete the query will display in the 3D window and the 20 top scoring hits, based on the score chosen in the Score by field during the run set-up (default is TanimotoCombo), are listed in the results spreadsheet. This is intended as a quick summary view. The full hitlist, named rocs_hits_1.oeb.gz (default filename), is automatically saved in the working directory; to examine all the hits, open it with a tool such as VIDA. The query is also listed in the spreadsheet as an aid to comparison. Make the query visible by clicking in the visibility column (green circle) next to its entry. Click on a molecule name in the spreadsheet to display its best overlay with the query and scroll through the molecules to view each one.

Scores for all the available scoring functions are shown in the spreadsheet. Clicking on any one of the column headings will sort by that score and clicking multiple times will switch between ascending and descending. The 20 hits listed in the spreadsheet will change to reflect the sort preference (score and ascending (bottom 20) or descending (top 20 hits)).

With descending TanimotoCombo selected as the sort order look at the TanimotoCombo scores for the top scoring ligand. Suprofen_261 has a TanimotoCombo score of approximately 1.002. (Because Random starts were used your score might be slightly different.) The authors found that 0.99 was a reasonable threshold for a “reliability” cutoff for alignments that reproduced known binding poses in CYP2C9. This molecule satisfies that requirement.

Conclusions

The query molecule has a relatively high shape similarity to Suprofen_261. The Shape Tanimoto score of 0.704 is the second highest for the entire dataset. However, low color similarity (Color Tanimoto = 0.297) results in an overall TanimotoCombo score of 1.002. Given the low TanimotoCombo score to the known CYP2C9 ligands (max is Suprofen_261 at 1.002) one might conclude that this molecule has a low risk of being metabolized by CYP2C9 and is worthy of further investigation.
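TanimotoCombo is the sum of the shape and color Tanimoto scores, so it ranges from 0 to 2. The relationship between the three values quoted above can be checked with a short Python sketch; the function name tanimoto_combo is illustrative only.

```python
def tanimoto_combo(shape_tanimoto: float, color_tanimoto: float) -> float:
    """Sum of the shape and color Tanimoto scores (range 0 to 2)."""
    return shape_tanimoto + color_tanimoto


# Component scores quoted for Suprofen_261 in this tutorial
combo = tanimoto_combo(0.704, 0.297)
print(f"{combo:.3f}")
```

The sum of the rounded components is 1.001. The difference of 0.001 from the reported 1.002 is consistent with each displayed value being rounded to three decimal places.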

This concludes the tutorial “Perform a simple ROCS run” which guides the user through the basics of running ROCS through the vROCS interface and visualizing the results.

Perform a ROCS run from the command line

Background information

A simple ROCS run allows you to choose a query and a database to search. It will search the database for the closest matches to your query and load your results afterwards. The majority of runs with ROCS are simple runs using a molecular shape-based query, with or without color atoms. Other query types can also be used, in a similar fashion, as described in section ROCS shape query sources.

You are working on a project for which the current lead series has a problem with CYP2C9 metabolism. You are hoping to suggest some synthesis candidates that have a low risk for CYP2C9 metabolism. You recently saw a paper by Sykes et al. ([Sykes-2008]) in which they use ROCS to validate a database of 70 known Cytochrome P450 2C9 substrates. You plan to use ROCS similarity to compounds in this database to predict whether your ideas will be metabolized by CYP2C9. One of your ideas is shown in the OEDepict TK depiction below. You plan to use the ROCS command line interface.

OEDepict TK depiction of the synthesis candidate molecule and ROCS query

This tutorial provides the user with the background required to set up a simple ROCS run from a saved query in the command line ROCS interface and to analyze the results. The tutorial requires approximately 10 minutes of personal time and 1 minute of computer time to complete. It repeats the simple run from the vROCS tutorial Perform a simple ROCS run.

Set up ROCS run

Open a command prompt and navigate to a working directory of your choice. At the very minimum ROCS requires an input query file and an input database file. There are numerous additional parameters (e.g. specifying a custom color force field file) that can be added to the command, as described in the chapter ROCS Usage.

In the vROCS tutorial Perform a simple ROCS run, the command line provided to ROCS (on Windows) is:

rocs.bat \
-query molecule_idea.sq \
-dbase jmedchem_database.oeb.gz \
-prefix rocs \
-outputdir C:/Users/username \
-besthits 500 \
-rankby TanimotoCombo \
-shapeonly false \
-randomstarts 50 \
-opt true \
-scoreonly false \
-optchem true

This is found in the Command line… information field of the run set-up dialog. Most of these parameters are at their default values. The only one that was changed was -randomstarts 50, which replaces the default Inertial start.

Run a simple ROCS run

At the command prompt type the following command to run ROCS.

Note

If the query and database files are in your current working directory you do not need to specify a full directory path to the files. If not, specify the full path to the query and database files.

rocs \
-query \
OPENEYE_DIR/data/vrocs/simple/molecule_idea.sq \
-dbase \
OPENEYE_DIR/data/vrocs/simple/jmedchem_database.oeb.gz \
-prefix tutorial5 \
-randomstarts 50

This run requires about 40 seconds.
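If the run is to be launched from a script rather than typed at the prompt, the same command can be assembled programmatically. The Python sketch below is one possible approach and uses only the flags shown in this tutorial. It assumes the rocs executable is on the PATH and that OPENEYE_DIR is exported as an environment variable. This manual uses OPENEYE_DIR only as a placeholder for the installation directory, so the environment variable is an assumption. The helper name build_rocs_command is hypothetical.

```python
import os
import shutil
import subprocess

def build_rocs_command(data_dir, prefix="tutorial5", random_starts=50):
    """Assemble the tutorial command as an argument list."""
    return [
        "rocs",
        "-query", os.path.join(data_dir, "molecule_idea.sq"),
        "-dbase", os.path.join(data_dir, "jmedchem_database.oeb.gz"),
        "-prefix", prefix,
        "-randomstarts", str(random_starts),
    ]

# Assumption: OPENEYE_DIR is set in the environment; otherwise the literal
# placeholder is kept so the printed command still mirrors the tutorial.
openeye_dir = os.environ.get("OPENEYE_DIR", "OPENEYE_DIR")
command = build_rocs_command(os.path.join(openeye_dir, "data", "vrocs", "simple"))
print(" ".join(command))

# Launch the run only when the executable can actually be found.
if shutil.which(command[0]):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the installation path contains spaces, as it does under C:\Program Files (x86)\ on Windows.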

Visualize and save results

ROCS automatically saves the following files, named using the prefix given above. If no prefix is given, the default prefix is rocs.

  • prefix_parm
  • prefix.log
  • prefix_ref.sq
  • prefix_1.rpt
  • prefix_1.status
  • prefix_hits_1.sdf

The .rpt file contains all the score data in a tab-delimited format. The .sdf file contains all the database hits in their aligned conformations, together with their scores. This SD file is suitable for opening in VIDA for further examination.
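Because the .rpt file is tab-delimited, it can be read with standard tools. The Python sketch below parses a small made-up sample in place of the real file. The column headers Name and TanimotoCombo are assumptions and should be checked against the header line of your own report. The function name read_report is hypothetical.

```python
import csv
import io

def read_report(handle):
    """Parse tab-delimited report text into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Made-up stand-in for the contents of tutorial5_1.rpt. For a real run, use
# open("tutorial5_1.rpt", newline="") instead of io.StringIO.
sample = io.StringIO(
    "Name\tTanimotoCombo\n"
    "mol_a\t0.812\n"
    "mol_b\t1.002\n"
)

rows = read_report(sample)
# Scores are read as text, so convert to float before comparing.
best = max(rows, key=lambda row: float(row["TanimotoCombo"]))
print(best["Name"], best["TanimotoCombo"])
```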

Conclusions

This concludes the tutorial “Perform a ROCS run from the command line” which guides the user through the basics of running a simple ROCS run through the command line interface and visualizing the results.