Perform a simple ROCS run

Background information

A simple ROCS run lets you choose a query and a database to search. ROCS searches the database for the closest matches to the query and then loads the results. The majority of runs with ROCS are simple runs using a molecular shape-based query, with or without color atoms. Other query types can also be used in a similar fashion, as described in the section ROCS shape query sources.

You are working on a project in which the current lead series has a problem with CYP2C9 metabolism, and you hope to suggest synthesis candidates with a low risk of CYP2C9 metabolism. You recently saw a paper by Sykes et al. ([Sykes-2008]) in which they use ROCS to validate a database of 70 known cytochrome P450 2C9 substrates. You plan to use ROCS similarity to compounds in this database to predict whether your ideas will be metabolized by CYP2C9. One of your ideas is shown in the depiction below. You will run ROCS through the vROCS interface.

OEDepict TK depiction of the synthesis candidate molecule and ROCS query

This tutorial provides the background required to set up a simple ROCS run in the vROCS interface from a saved query and to analyze the results. It requires approximately 10 minutes of hands-on time and 1 minute of computer time to complete.

Set up a ROCS run

Open a new ROCS session. At the Welcome screen select the option to Start a simple ROCS run.

In the Inputs dialog you need to select a query and a database to search. The fields that require input are highlighted in red. Note: if you did not open a new ROCS session, some fields may already be populated with previously used queries and databases.

In the Query input field, click Open query… and use the file browser to navigate to OPENEYE_DIR/data/vrocs/simple and open the file molecule_idea.sq. The query was prepared by generating OMEGA conformers, with default OMEGA parameters, from a SMILES file of the molecule; the lowest-energy conformer was selected as the query and saved as the ROCS saved query file molecule_idea.sq.

The molecule is shown in the 3D window as green sticks. A molecular shape (transparent gray) and color atoms are automatically assigned. For the purposes of this tutorial we will use the ligand query as-is. The tutorial Create or edit a query manually covers the details of editing a query.

Click on the text ROCS Run (1) in the Run Name field and edit it to read Molecule_idea.

In the Inputs dialog click on the Database option. Browse to the database at OPENEYE_DIR/data/vrocs/simple and open the database file called jmedchem_database.oeb.gz. The database was prepared from a SMILES file of the 70 CYP2C9 substrates, for which OMEGA conformers were generated with default parameters.

Click Next to set the run options.

In a location of your choice create a working directory called simple.

In their validation, Sykes et al. obtained the best results using Random starts = 50 instead of the default Inertial start type. Random starts sample more of the possible alignments between the query and each database conformer, but they increase the time taken by the run.

Note

Inertial starts will give identical results from run to run. Using Random starts may result in slight variation in score (e.g. 0.750 vs 0.749) if a run is repeated.
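Why more random starts help, and why they make scores vary slightly, can be illustrated with a toy optimization problem. This is not how ROCS is implemented: it is a minimal sketch, assuming a hypothetical one-dimensional "overlap" function (toy_overlap) of a single rotation angle that has several local maxima. A greedy optimizer from one fixed starting point (loosely analogous to the deterministic Inertial start) can get trapped in a local maximum, while the best of 50 random starts is far more likely to reach the global one.

```python
import math
import random


def toy_overlap(angle):
    """Hypothetical stand-in for a shape-overlap score as a function of one
    rotation angle (radians). It has several local maxima, like real
    overlap landscapes, but is NOT the ROCS scoring function."""
    return math.cos(angle) + 0.6 * math.cos(3 * angle + 1.0)


def hill_climb(f, x, step=0.01, max_iter=10_000):
    """Greedy local optimizer: move to a neighbor while it improves f."""
    for _ in range(max_iter):
        # x is listed first so that ties keep the current position.
        best = max((x, x - step, x + step), key=f)
        if best == x:
            break
        x = best
    return x, f(x)


def single_start(f, x0):
    """Deterministic: always starts from x0, so repeated runs are identical."""
    return hill_climb(f, x0)


def random_starts(f, n=50, seed=None):
    """Best of n local optimizations from random starting angles."""
    rng = random.Random(seed)
    results = [hill_climb(f, rng.uniform(-math.pi, math.pi)) for _ in range(n)]
    return max(results, key=lambda r: r[1])
```

For example, single_start(toy_overlap, 1.5) stops at a local maximum, whereas random_starts(toy_overlap, n=50) returns a higher score. Because each random start lands on a slightly different final angle, calling random_starts with seed=None can give marginally different best scores from call to call, much like the 0.750 vs 0.749 example in the Note; passing a fixed seed makes the toy reproducible.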

Click on the Random radio button for Start type and type 50 in the input box. Leave all other options as default.

Click Next to see the run summary, which lists the run name, database, working directory, and so on. Click the Command Line… option to view the full command line for the run. This can be useful for setting up future ROCS runs outside the vROCS interface, for example to make use of distributed computing.
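One way to reuse that command line is from a small script. The sketch below is a hypothetical helper, not part of vROCS: it assumes you paste the command line exactly as shown in the Command Line… dialog, that it uses POSIX-style shell quoting (which is what shlex.split parses), and that the rocs executable is on your PATH. It makes no assumptions about individual ROCS flags; it only splits and optionally runs the text you supply.

```python
import shlex
import subprocess


def run_saved_command(command_line, dry_run=True):
    """Hypothetical helper: split a command line copied from the vROCS
    Command Line... dialog and, unless dry_run is True, execute it.
    Returns the argument list so it can be inspected first."""
    args = shlex.split(command_line)  # POSIX-style quoting rules
    if not args:
        raise ValueError("empty command line")
    if not dry_run:
        # check=True raises CalledProcessError if the program exits non-zero.
        subprocess.run(args, check=True)
    return args
```

Running with dry_run=True first lets you check that paths containing spaces were split correctly before submitting the command to a local machine or a cluster queue.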

Run a simple ROCS run

Click Run ROCS to start the run.

As the run progresses, the 3D window fills the screen. The query is shown and the 70 database molecules scroll by in their alignment with the query. On the right-hand side of the screen, the five current top-scoring molecules are shown as 2D depictions together with their scores (TanimotoCombo by default, or whichever score type was chosen in the Options dialog). A progress bar at the bottom of the screen indicates how far the run has proceeded.
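For reference, the default TanimotoCombo score combines two Tanimoto measures. The shape Tanimoto is computed from the volume overlaps O of the aligned query A and database molecule B (O_AA and O_BB are the self-overlaps), the color Tanimoto is defined analogously from the overlaps of matching color atoms, and TanimotoCombo is their sum, so it ranges from 0 to 2:

```latex
\mathrm{Tanimoto}_{\mathrm{shape}} = \frac{O_{AB}}{O_{AA} + O_{BB} - O_{AB}},
\qquad
\mathrm{TanimotoCombo} = \mathrm{Tanimoto}_{\mathrm{shape}} + \mathrm{Tanimoto}_{\mathrm{color}} \in [0, 2]
```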

This run requires around 40 seconds.

Visualize and save results

When the run is complete, the query is displayed in the 3D window and the 20 top-scoring hits, ranked by the score chosen in the Score by field during run setup (TanimotoCombo by default), are listed in the results spreadsheet. This is intended as a quick summary view. To examine all the hits, open the full hitlist, which is automatically saved in the working directory as rocs_hits_1.oeb.gz (the default filename), in a tool such as VIDA. The query is also listed in the spreadsheet as an aid to comparison. Make the query visible by clicking in the visibility column (green circle) next to its entry. Click on a molecule name in the spreadsheet to display its best overlay with the query, and scroll through the molecules to view each one.

Scores for all the available scoring functions are shown in the spreadsheet. Clicking a column heading sorts by that score, and clicking it again toggles between ascending and descending order. The 20 hits listed change to reflect the sort order: ascending shows the bottom 20 hits for that score, and descending shows the top 20.
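The same top-20/bottom-20 behavior can be reproduced when post-processing scores outside vROCS. This is a minimal sketch, assuming each hit has already been reduced to a name plus its shape and color Tanimoto scores; the Hit class and spreadsheet_view function are hypothetical and not part of any OpenEye API.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one database hit."""
    name: str
    shape_tanimoto: float
    color_tanimoto: float

    @property
    def tanimoto_combo(self) -> float:
        # TanimotoCombo is the sum of the shape and color Tanimoto scores.
        return self.shape_tanimoto + self.color_tanimoto


def spreadsheet_view(hits, score="tanimoto_combo", descending=True, n=20):
    """Return the n hits a 20-row summary would show for one sort order:
    the top n when descending, the bottom n when ascending."""
    ranked = sorted(hits, key=lambda h: getattr(h, score), reverse=descending)
    return ranked[:n]
```

Passing score="shape_tanimoto" or score="color_tanimoto" mimics clicking a different column heading, and descending=False mimics a second click.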

With TanimotoCombo sorted in descending order, look at the TanimotoCombo score of the top-scoring ligand. One of the Suprofen conformers has a TanimotoCombo score of approximately 1.0. (Because Random starts were used, your score might be slightly different.) The authors found that 0.99 was a reasonable “reliability” cutoff for alignments that reproduced known binding poses in CYP2C9, and this alignment meets that cutoff.
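Applying the cutoff to a set of scores is a simple filter. The sketch below assumes the TanimotoCombo scores have been collected into a dictionary keyed by hit name; the function name and the example names in the test are hypothetical, and the 0.99 value is the cutoff quoted above.

```python
RELIABILITY_CUTOFF = 0.99  # TanimotoCombo cutoff quoted above from [Sykes-2008]


def reliable_alignments(scores, cutoff=RELIABILITY_CUTOFF):
    """Return the names of hits whose TanimotoCombo meets the cutoff,
    best first. `scores` maps hit name -> TanimotoCombo score."""
    passing = {name: s for name, s in scores.items() if s >= cutoff}
    return sorted(passing, key=passing.get, reverse=True)
```

Because Random starts can shift scores slightly between runs, a hit scoring very close to 0.99 may pass the filter in one run and fail it in the next, so borderline cases are worth inspecting visually.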

Conclusions

The query molecule has a relatively high shape similarity to a Suprofen conformer; the Shape Tanimoto score of that conformer is one of the highest in the entire dataset. However, low color similarity to the Suprofen conformer results in an overall TanimotoCombo score of only around 1.0, out of a maximum of 2.0. Given the low TanimotoCombo score against the known CYP2C9 substrates, one might conclude that this molecule has a low risk of being metabolized by CYP2C9 and is worthy of further investigation.

This concludes the tutorial “Perform a simple ROCS run” which guides the user through the basics of running ROCS through the vROCS interface and visualizing the results.