Search for Similar Binding Sites with SiteHopper

SiteHopper is a tool to superpose and compare protein binding sites. The input is a protein-ligand (or apo) complex where the binding site is converted to a searchable site called a patch. SiteHopper can efficiently compare this patch to other binding site patches, which are stored in a pregenerated database.

A search for similar binding sites can be useful for a number of different reasons, two of which are:

  • A similar binding site in a related or unrelated protein can indicate possible off-target risks. Finding and comparing these binding sites can be helpful for improving compound selectivity.

  • A ligand from a similar binding site can potentially be used as a tool compound or as an early stage hit for a target. Especially interesting are results where the binding site similarity is high while the ligand similarity is low - think of this as scaffold hopping based on binding site similarity.

In this tutorial, a dataset of binding site patches will be searched with a query structure from the CDK2 target using the SiteHopper Search Floe. The results will be visualized in both the Analyze page and the 3D viewer. The patch database preparation should only be run once and is afterwards shareable with others, and we have shipped a couple of large searchable databases as well.

This tutorial uses the following Floes

Create a Tutorial Project


If you have already created a tutorial project, you can reuse the existing one.

Log into Orion and click the home button at the top of the blue ribbon on the left of the Orion Interface. Then click on the ‘Create New Project’ button and in the pop up window enter Tutorial for the name of the project and click ‘Save’.


Orion home page

Prepare Patch Dataset


If you have already created a patch collection, you can reuse it for search. OpenEye has pre-generated the Guide to Pharmacology prepared design unit collection and converted it to a SiteHopper searchable collection. It contains patches of prepared design units for ~21,000 PDB structures for targets in the Guide to Pharmacology database; with biological unit expansion and alternate conformation enumeration that turns into ~39,000 prepared design units. A second collection of potential sites has also been built that is around ~300,000 patches.

OEDesignUnits must be converted to SiteHopper patches before they can be searched with the SiteHopper Search Floe. This floe accepts design units prepared using the Spruce Floes.

Locate the Make SiteHopper Patch Database Floe as follows:

  1. Click on the ‘Floes’ button in the left menu bar

  2. Click on the ‘Floes’ tab

  3. Set the ‘Browse Workfloes’ drop down menu to ‘Show all packages’

  4. Select ‘All’ under Browse Workfloes

  5. In the search bar enter Make SiteHopper Patch Database

The Make SiteHopper Patch Database floe will be visible to the right (see below)


Click on the Make SiteHopper Patch Database Floe to bring up the Job Form and set the following parameters.

  1. Inputs -> Input Dataset : Spruce_prep_dataset_CDK2.

    To select this collection:

    • Click the ‘Choose Input Button’.

    • Click on ‘<Organization> Data’ under collections in the Select Input Collection modal. <Organization> is the name of your company/entity/organization. This name is set during Orion installation or configuration; however, an Organization admin can also change the name.

    • Select the Spruce Prep Dataset CDK2 dataset.

    • Click ‘Use Dataset as Input’.

  2. Outputs

    • Type or enter CDK2 Patch DB Collection in the Collection Name output field.

    • Type or enter Failed Patch Dataset in the Failed Output Dataset input field.


Scroll down to the bottom of the Job Form and click ‘Start Job’. The job will take about 5 minutes to run. Once the Floe has finished, move on to the next step of the tutorial.


Switching Enumerate Potential Pockets on creates a SiteHopper Database collection, where liganded sites are skipped, but other potential binding sites are enumerated using pocket finding tools.

Prepare Query Protein

For this tutorial, use the CDK2 crystal structure 5K4J from the protein data bank. To import this structure into Orion and prepare it for docking, locate the SPRUCE - Protein Preparation in the ‘Floes’ page as follows.

There is a tutorial detailing using that Floe here: Spruce Prep Tutorial.

This dataset can also be used as a reference against a list of other PDB codes, which is how the demo dataset Spruce_prep_dataset_CDK2 was created.

View Prepared Site

Once the Spruce - Protein Preparation job finishes, make the resulting dataset active as follows:

  1. Go to the Project Data page by clicking on the blue ‘Data’ button on the left menu bar.

  2. Select ‘Organization Data’ from the list of options to the left of the page.

  3. Set the ‘Type’ drop down menu to Datasets if it is not already.

  4. Check to make sure you have no datasets set as active by clicking on ‘Active Datasets’ in the top right of the window.

  5. Locate the dataset named Spruce_prep_dataset_5K4J and make it active by clicking on the circle with the plus symbol in the Active column next to the Spruce_prep_dataset_5K4J.


Now you can see the query in the 3D viewer by clicking on the 3D button in the left menu bar.


Search Patch Dataset

Here we will search smaller CDK2 collection generated above, but there are also Guide to Pharmacology prepared design unit patch databases available for searching in the Organization data.

Locate the SiteHopper Search Floe as follows:

  1. Click on the ‘Floes’ button on the left menu bar.

  2. Click on the ‘Floes’ tab.

  3. Set the ‘Browse Workfloes’ drop-down menu to ‘Show all packages’.

  4. Select ‘All’ under Browse Workfloes.

  5. In the search bar enter SiteHopper Search.

The SiteHopper Search Floe will be visible to the right (see below).


Click on the SiteHopper Search Floe to bring up the Job Form and set the following parameters.

  1. Inputs -> Query Dataset : Spruce_prep_dataset_5K4J

    This is the dataset with the protein we prepared.

  2. Inputs -> SiteHopper DB Dataset: : CDK2 Patch DB Collection

    This is the dataset with all the patches we created at the beginning of this tutorial.

  3. Outputs -> Output Dataset: : SiteHopper CDK2 Search Hits

  4. The option to use GPU-prescreening may be effective when searching very large patch databases, but for smaller databases is less useful.

  5. When searching larger patch databases, it can be helpful to remove results with too high a sequence similarity, depending on the objective of the search.

Once the parameters are set, scroll to the bottom of the page and click ‘Start Job’. The job will take 5-10 minutes depending on the query, and potentially a lot longer if the databases being searched are large.


View Results

Once the search job has completed, make the output datasets active as follows:

  1. Press the ‘Project Data’ button in left menu bar.

  2. Clear current active datasets by clicking on ‘Active Datasets’ and clicking ‘Clear All’ in the menu that pops up.

  3. Make the output datasets active by clicking on the + icon in the Actives column.

Once the datasets are active, switch to the 3D viewer by clicking the 3D button in the menu bar. The 3D results will be overlaid to the input query, which is the first record in the resulting dataset. Here one of the top results is shown with the query. This is similar to how FastROCS results are shown for small molecules.


Analysis of the scores and drilling down to interesting results can also be done in the Analyze page based on SiteHopper scores and sequence similarity scores. Here, patch scores are plotted against sequence similarity. Of course, for a dataset of the same target, the sequence similarities are very high and perfect for most of the dataset. As larger databases are searched, this property becomes more interesting to explore.