OEDocking Tutorials

Overview

The OEDocking suite contains three docking programs, FRED, HYBRID and POSIT, and associated utilities. The input to FRED, HYBRID or POSIT is one (or more) crystallographic structures of the target protein (possibly including the co-crystallized ligand) and one or more drug-like molecules to be docked. The output is the docked structure of the molecules and information about the score or confidence in the docked structure.

At the end of these tutorials, you will be able to run:

  • PDB2RECEPTOR to generate OEDocking receptors with bound ligands.
  • COMBINE_RECEPTORS to take disparate but sequence-related proteins and combine the complexes into new receptors with more thorough binding information.
  • FRED to dock multi-conformer molecules into the structure of a target protein and score the molecules.
  • HYBRID to dock multi-conformer molecules into the structure(s) of a target protein and the structure of a bound ligand and score the molecules.
  • POSIT to generate potential poses of ligands against OEDocking receptor targets.
  • DOCKING_REPORT to generate a PDF report for one or more molecules docked by FRED, HYBRID or POSIT.

The basic workflow for docking ligands using FRED, HYBRID or POSIT is as follows:

  • Generate OEDocking receptors with bound ligands.
  • Dock ligands into receptor.
  • Analyze results.
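
As a rough illustration of this workflow, the sketch below assembles the three command lines for a FRED run as argument lists. It only builds the commands and does not run them. The helper name build_workflow_commands and the file-naming scheme are assumptions made for this example; the flags are the ones used in the tutorials that follow.

```python
from typing import List


def build_workflow_commands(pdb_file: str, ligand_residue: str,
                            dbase: str, prefix: str) -> List[List[str]]:
    """Return argv lists for the receptor, docking, and report steps.

    Hypothetical helper: file names are derived from `prefix` purely for
    illustration and mirror the names used in the tutorials.
    """
    receptor = f"{prefix}_receptor.oeb.gz"
    docked = f"{prefix}_fred_docked.oeb.gz"
    return [
        # Step 1: generate an OEDocking receptor with its bound ligand.
        ["pdb2receptor", "-pdb", pdb_file,
         "-ligand_residue", ligand_residue, "-receptor", receptor],
        # Step 2: dock the ligand database into the receptor.
        ["fred", "-receptor", receptor, "-dbase", dbase,
         "-prefix", f"{prefix}_fred"],
        # Step 3: summarize the docked poses in a PDF report.
        ["docking_report", "-docked_poses", docked,
         "-receptor", receptor,
         "-report_file", f"{prefix}_fred_docked_report.pdf"],
    ]


commands = build_workflow_commands("2IKO.pdb.gz", "7IG601A", "all.oeb.gz", "2IKO")
for argv in commands:
    print(" ".join(argv))
```

With the OpenEye applications installed, each list could be passed to subprocess.run(argv, check=True) so that a failing step stops the workflow.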

Data files for these tutorials are located in the directory OPENEYE_DIR/data/oedocking, where OPENEYE_DIR refers to the top-level OpenEye installation directory. On Windows, the default location is a versioned OEDOCKING directory in C:\Program Files (x86)\ for the 32-bit installation or C:\Program Files\ for the 64-bit installation. In OS X distributions, the data and documentation directories are standalone folders in the package.

Note

For the following tutorials, the OpenEye application banner and run settings have been omitted for brevity.

PDB2RECEPTOR tutorial

The first step in using FRED, HYBRID or POSIT is the creation of an OEDocking receptor, a collection of docking-related information connected to a protein. A typical receptor (as viewed in VIDA) is shown in figure OEDocking receptor and typical docking information.

OEDocking receptor and typical docking information

An OEDocking receptor includes the protein structure and a description of the binding site. This description includes a so-called outer contour that indicates where heavy atoms are to be placed during the search procedures of FRED, HYBRID or POSIT.

A pose receptor traditionally includes a bound ligand used to help identify existing binding modes. It may also include so-called extra molecules: items of interest, such as waters and solvents, that have been stripped for the purposes of docking.

While there are several ways to make receptors, POSIT has a simplified, easy-to-use command line utility, PDB2RECEPTOR.

The most complicated portion of making pose receptors is identifying the bound ligands. If PDB2RECEPTOR is run with only an input protein file, it outputs a list of potential ligands and halts. If exactly one valid ligand is detected, it instead creates a receptor with that ligand:

> pdb2receptor -pdb 2IKO.pdb.gz

Executing this command will create a receptor with the one detected ligand (with residue 7IG601A). The same receptor could also have been created by explicitly listing the residue on the command line.

> pdb2receptor -pdb renin/2IKO.pdb.gz -ligand_residue 7IG601A -receptor 2IKO_receptor.oeb.gz

This command creates the following files:

  • 2IKO_receptor.oeb.gz - the OEDocking receptor
  • 2IKO_settings.param - the parameters used to make the receptor

See the PDB2RECEPTOR section for more details, including how to more explicitly control output file names.

COMBINE_RECEPTORS tutorial

Combining receptors is an optional step that takes the information found in multiple receptors and creates a new, merged receptor.

COMBINE_RECEPTORS is intended to be used in an automated fashion where the output aligned receptors should be included with the original receptors when running POSIT. As such, the output receptor files are automatically named based on the input filenames and only receptors that are capable of merging produce output.

This is particularly helpful when using small bound fragments to try to predict binding poses for larger ligands. COMBINE_RECEPTORS is given a list of candidate receptors and outputs only those that are deemed worthy of merging. To be worthy of merging:

  1. It must be possible to align the protein sequences.
  2. The ligands must overlap, but be different enough to indicate that the combined receptor has more information content than either receptor alone.

Merged Receptors: Merging the receptors 2IKO and 2IKU captures more potential interaction constraints than either does alone

Ligand Posed to Merged Receptors: REN9 is correctly predicted by the merged receptor

For example, consider the merged receptor shown in figure Merged Receptors. In this case, one of the ligands binds to a pocket that is not present in the other receptor. The expectation is that the combination of both ligands has more information content than either alone.

For each valid merge, two output files are generated, one in each reference frame.

> combine_receptors -receptors *.rec.oeb.gz

Note

Any time a collection of receptors is used on the command line, it can be replaced with a .lst file containing the filenames. For example:

> combine_receptors -receptors receptors.lst

This is particularly helpful when analyzing so many receptors that their filenames would not fit on the command line. On UNIX-based systems, an easy way to generate this file from a long list of receptors is to use the find command as follows:

> find . -name \*rec\*oeb.gz -exec printf '%s\n' {} \; > receptors.lst
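
On systems without find, the same list can be built with a short Python script. This is a minimal sketch under two assumptions: that the .lst format is one filename per line, and that the receptors match the same *rec*oeb.gz pattern used above. The function name write_receptor_list is hypothetical.

```python
from pathlib import Path
from typing import List


def write_receptor_list(root: Path, list_file: Path,
                        pattern: str = "*rec*oeb.gz") -> List[str]:
    """Write every matching receptor path under `root` to `list_file`."""
    # rglob searches root and all of its subdirectories, as find does.
    paths = sorted(str(p) for p in root.rglob(pattern) if p.is_file())
    # Assumption: the .lst format is one filename per line.
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, write_receptor_list(Path("."), Path("receptors.lst")) writes the list for the current directory tree.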

See the COMBINE_RECEPTORS usage section for more details.

FRED tutorial

FRED requires a protein to be docked into, a definition of the region in that protein in which the docking will take place, and a multi-conformer database of molecules to be docked. The most common format for database file(s) is a multi-conformer OEBinary file created by OpenEye’s OMEGA application; however, the database can also be in one of several other 3D formats, including SDF, MOL2 and PDB. FRED determines the database file format from the file extension: .sdf or .mol for SDF, .mol2 for MOL2, and .pdb or .ent for PDB. Gzip-compressed files of these formats are allowed as well; for example, FRED will interpret infile.sdf.gz as a gzipped SDF file.
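
The extension rule described above can be written out as a small lookup. This is only a sketch of the rule as stated in this section, not FRED's actual implementation; the function name database_format and the returned labels are made up for the example.

```python
from pathlib import Path
from typing import Tuple

# Extension-to-format table taken from the paragraph above; ".oeb" is the
# OEBinary extension used by the example files in these tutorials.
_FORMATS = {
    ".oeb": "OEB",
    ".sdf": "SDF",
    ".mol": "SDF",
    ".mol2": "MOL2",
    ".pdb": "PDB",
    ".ent": "PDB",
}


def database_format(filename: str) -> Tuple[str, bool]:
    """Return (format label, is_gzipped) for a database filename."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        # Look at the extension underneath the .gz suffix.
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in _FORMATS:
        raise ValueError(f"unrecognized database extension: {filename}")
    return _FORMATS[suffixes[-1]], gzipped


print(database_format("infile.sdf.gz"))
```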

Note

Even though all these formats are supported, using SDF, PDB or MOL2 can result in a loss of speed due to the I/O penalty of these formats.

FRED requires a single receptor to dock ligands into. Receptors may be created with the following programs: MAKE_RECEPTOR, PDB2RECEPTOR, APOPDB2RECEPTOR, and RECEPTOR_SETUP.

Note

To encapsulate all of the receptor information, the receptor molecule file must be in .oeb or .oeb.gz format.

To dock a set of ligands into the receptor 2IKO_receptor.oeb.gz:

> fred -receptor 2IKO_receptor.oeb.gz -dbase all.oeb.gz

By default, FRED generates several output files. Unless you specify a different Output prefix during setup, FRED will use the prefix fred for all output files.

  • fred_docked.oeb.gz - top 500 scoring molecules of all.oeb.gz docked into 2IKO_receptor.oeb.gz.
  • fred_undocked.oeb.gz - molecules of all.oeb.gz that could not be docked into the active site (generally occurs if the molecules are too big for the site). This file will not be present if all molecules were successfully docked to the active site.
  • fred_score.txt - a tab separated text file containing the name and score of each of the top 500 ligands.
  • fred_report.txt - a text report of the docking process.
  • fred_settings.param - a text file containing the parameters used for this run.
  • fred_status.txt - a text file that is written periodically during the run with the status of the run.

If you have a multi-processor machine, you can harness the extra computing power by using OpenMPI. To re-run the job on a host with 4 processors, enter the following:

> fred -mpi_np 4 -receptor 2IKO_receptor.oeb.gz -dbase all.oeb.gz -prefix 2IKO_fred

The output from the docking runs may be viewed in VIDA, as depicted in figure 2IKO FRED Docked.

2IKO FRED Docked results showing the reference ligand (green) and the top docked ligand

In addition to analyzing and visualizing the output from a docking run in VIDA, a summary PDF report can be generated using DOCKING_REPORT.

> docking_report -docked_poses 2IKO_fred_docked.oeb.gz \
                    -receptor 2IKO_receptor.oeb.gz  -report_file 2IKO_fred_docked_report.pdf

The result of running the above command is depicted in figure 2IKO FRED Docking Report.

2IKO FRED Docking Report

The PDF report includes a 2D depiction of each molecule, a breakdown of the docking score components by atom, a comparison of the molecule’s score with those of the other docked molecules, and a ‘Residue Fingerprint’ that highlights the residues in the receptor site with which the ligand interacts. Greyed-out residues are residues in the site that other ligands interact with, but the current ligand does not.

HYBRID tutorial

HYBRID docks molecules using a single receptor or using multiple structures of the target protein. When using a single structure, the input files are simply the receptor and the ligand database. However, when using multiple structures of the target protein, the input files are all of the receptor files and the ligand database.

  • receptor1.oeb.gz - a receptor file containing the structure of the target protein and a bound ligand.
  • receptor2.oeb.gz - a receptor file containing a second structure of the target protein and a bound ligand. This receptor file should have a different structure of the same target protein as in receptor1.oeb.gz, generally with a different bound ligand.
  • multiconformer_ligands.oeb.gz - conformationally expanded 3D ligands to dock.

Setting up and running a HYBRID job is exactly like setting up and running a FRED job. To dock a set of ligands into the receptor 2IKO_receptor.oeb.gz:

> hybrid -receptor 2IKO_receptor.oeb.gz -dbase all.oeb.gz

By default, HYBRID generates several output files. Unless you specify a different Output prefix during setup, HYBRID will use the prefix hybrid for all output files.

  • hybrid_docked.oeb.gz - top 500 scoring molecules of all.oeb.gz docked into 2IKO_receptor.oeb.gz.
  • hybrid_undocked.oeb.gz - molecules of all.oeb.gz that could not be docked into the active site (generally occurs if the molecules are too big for the site). This file will not be present if all molecules were successfully docked to the active site.
  • hybrid_score.txt - a tab separated text file containing the name and score of each of the top 500 ligands.
  • hybrid_report.txt - a text report of the docking process.
  • hybrid_settings.param - a text file containing the parameters used for this run.
  • hybrid_status.txt - a text file that is written periodically during the run with the status of the run.

If you have a multi-processor machine, you can use the OpenMPI option.

> hybrid -mpi_np 4 -receptor 2IKO_receptor.oeb.gz -dbase all.oeb.gz -prefix 2IKO_hybrid

As with the output from FRED docking runs, HYBRID results may be viewed and analyzed in VIDA as shown in figure 2IKO HYBRID DOCKED.

2IKO HYBRID DOCKED results showing the reference ligand (green) and the top docked ligand

Likewise, you can also create a summary PDF report for the HYBRID results using DOCKING_REPORT.

POSIT tutorial

Given receptors, using POSIT is very straightforward. There are two basic ways to input molecules to POSIT.

  • -in - converts input to 3D conformers (if 3D structures are input, these initial structures are retained)
  • -dbase - takes the input conformations as is (these are normally generated with OMEGA)

For usage of -dbase, see the POSIT MPI tutorial.

Given a set of input SMILES strings:

> posit -receptor renin/receptors/*.oeb.gz -in renin/all.smi

Note

On Microsoft Windows systems, you need to expand the wildcard:

> posit -receptor renin\receptors\2IL2_b.rec.oeb.gz renin\receptors\2IL2_a.rec.oeb.gz \
      renin\receptors\2IL2_c.rec.oeb.gz renin\receptors\2IKO.rec.oeb.gz \
      renin\receptors\2IKU_a.rec.oeb.gz renin\receptors\2IKU_b.rec.oeb.gz -in renin\all.smi
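
Typing every receptor by hand is error-prone, so one alternative is to expand the wildcard in a script and pass the result to POSIT. The sketch below uses Python's glob module to build the argument list; the helper name posit_command is hypothetical, and only the -receptor and -in flags shown above are used.

```python
import glob
from typing import List


def posit_command(receptor_pattern: str, ligands: str) -> List[str]:
    """Build a posit argv with the receptor wildcard expanded in Python."""
    # Sorting makes the receptor order reproducible between runs.
    receptors = sorted(glob.glob(receptor_pattern))
    if not receptors:
        raise FileNotFoundError(f"no receptors match {receptor_pattern}")
    return ["posit", "-receptor", *receptors, "-in", ligands]
```

With POSIT installed, the command could then be run with subprocess.run(posit_command(r"renin\receptors\*.oeb.gz", r"renin\all.smi"), check=True).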

The following files are output:

  • posit_docked.oeb.gz - contains all successful poses
  • posit_score.txt - contains the scores of all successful poses
  • posit_report.txt - contains the report of the run
  • posit_status.txt - a periodic status file generated during a run
  • posit_settings.param - parameters used in the run

The following files are output only if non-empty:

  • posit_clashed.oeb.gz - contains all poses that have a good enough probability but clash with the protein
  • posit_undocked.oeb.gz - contains all unsuccessful poses

There is more than one reason a pose may be unsuccessful. The most common is that the probability of the predicted binding mode is too low.

Use the -prefix option to add a prefix to all files output by POSIT, or use the -docked_molecule_file option to output a pose file with a particular name.

When POSIT is finished, it prints the final status and indicates what new data was added to the results that are output:

> posit -receptor renin/receptors/*.oeb.gz -in renin/all.smi \
   -prefix renin
Sorting by input order
--------Finished docking--------
Run time : 10m 40s (640.6seconds total)
Time per molecule 58.24sec

Molecules read : 11
Molecules processed : 11
Molecules successfully docked : 6
Unsuccessful dockings : 5

  Dock Statistics                                     Count
  ----------------------                              -----
  Successfully Docked                                 6
  Clashed with protein                                5

Docked molecules outputted to renin_docked.oeb.gz
Docked (but clashing) molecules outputted to renin_clashed.oeb.gz
Failed molecules written to: renin_undocked.oeb.gz
Failed molecules log written to: renin_rejected.txt

The following data is attached to SD data of each ligand
  "POSIT::Probability" : docked score (probability of correct pose)
  "POSIT receptor filename" : filename of the receptor the ligand was docked into
  "POSIT receptor title" : title of the receptor the ligand was docked into
  "POSIT::Method" : docking method selected by POSIT
  "Result" : description of the expected result quality (GREAT/GOOD/MEDIOCRE/POOR)

Scores were also outputted to text file : renin_score.txt
POSIT report was saved to file : renin_report.txt
Finished

The following files are output by the command above:

  • renin_docked.oeb.gz - the successfully docked structures
  • renin_clashed.oeb.gz - clashing poses with good probability
  • renin_undocked.oeb.gz - all undocked structures
  • renin_score.txt - scores of docked structures
  • renin_rejected.txt - list of rejected structures and status of rejection
  • renin_report.txt - report as seen above
  • renin_status.txt - current status of run, number of molecules processed and so on.
  • renin_settings.param - parameter file used for run
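
When many runs are compared, it can be convenient to pull the molecule counts out of the report text. The sketch below assumes that the report keeps the "label : integer" lines shown in the run summary; the function name parse_counts is hypothetical, and lines whose value is not a plain integer are ignored.

```python
import re
from typing import Dict

# Matches lines such as "Molecules read : 11" and nothing with a
# non-integer value (run time, file names, SD tag descriptions).
_COUNT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)\s*$")


def parse_counts(report_text: str) -> Dict[str, int]:
    """Collect the integer counters from a POSIT run summary."""
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        match = _COUNT_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


summary = """Run time : 10m 40s (640.6seconds total)
Molecules read : 11
Molecules processed : 11
Molecules successfully docked : 6
Unsuccessful dockings : 5
"""
counts = parse_counts(summary)
print(counts["Molecules successfully docked"], "of", counts["Molecules read"])
```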

The score file contains the scores and ranking of docked structures (some columns have been removed for brevity):

Title  POSIT::Probability  POSIT receptor filename  POSIT::Method  Result
ren1   0.950000            2IKO.rec.oeb.gz          SHAPEFIT       GREAT
ren2   0.850000            2IKU_b.rec.oeb.gz        SHAPEFIT       GREAT
ren3   0.890000            2IKU_b.rec.oeb.gz        SHAPEFIT       GREAT
ren5   0.790000            2IKU_b.rec.oeb.gz        SHAPEFIT       GREAT
ren7a  0.790000            2IKU_b.rec.oeb.gz        SHAPEFIT       GREAT
ren10  0.850000            2IL2_c.rec.oeb.gz        SHAPEFIT       GREAT
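
A score file like this can be loaded and ranked with a few lines of Python. The sketch below assumes the file is tab separated with a header row, as the FRED score file is described; that is an assumption for POSIT, and the function name read_scores is hypothetical. The sample rows are copied from the listing above.

```python
import csv
import io
from typing import Dict, List


def read_scores(text: str) -> List[Dict[str, str]]:
    """Parse score-file text and sort rows by descending probability."""
    # Assumption: columns are tab separated and the first row is a header.
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    rows.sort(key=lambda row: float(row["POSIT::Probability"]), reverse=True)
    return rows


sample = (
    "Title\tPOSIT::Probability\tPOSIT receptor filename\tPOSIT::Method\tResult\n"
    "ren1\t0.950000\t2IKO.rec.oeb.gz\tSHAPEFIT\tGREAT\n"
    "ren2\t0.850000\t2IKU_b.rec.oeb.gz\tSHAPEFIT\tGREAT\n"
    "ren3\t0.890000\t2IKU_b.rec.oeb.gz\tSHAPEFIT\tGREAT\n"
)
ranked = read_scores(sample)
for row in ranked:
    print(row["Title"], row["POSIT::Probability"], row["POSIT receptor filename"])
```

To read a real file, pass its contents, for example read_scores(open("renin_score.txt").read()).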

The rejected file gives the status of each rejected molecule. For instance, “All conformers clashed with protein” indicates that, although the probability was good, the protein could not accept the desired pose:

Ligand #  Title  Status
7         ren8b  All conformers clashed with protein
3         ren4   All conformers clashed with protein
10        ren11  All conformers clashed with protein
5         ren6   All conformers clashed with protein
8         ren9   All conformers clashed with protein
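
To see at a glance why molecules were rejected, the statuses can be tallied. The sketch below assumes each data line is a ligand number, a title without whitespace, and a free-text status, as in the listing above; the function name parse_rejected is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def parse_rejected(text: str) -> List[Tuple[int, str, str]]:
    """Return (ligand number, title, status) for each rejected molecule."""
    records = []
    for line in text.splitlines():
        # Split into at most three fields so the status keeps its spaces.
        parts = line.split(None, 2)
        # Skip blank lines and the header, whose first field is not a number.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        records.append((int(parts[0]), parts[1], parts[2].strip()))
    return records


sample = """Ligand # Title Status
7 ren8b All conformers clashed with protein
3 ren4 All conformers clashed with protein
10 ren11 All conformers clashed with protein
5 ren6 All conformers clashed with protein
8 ren9 All conformers clashed with protein
"""
records = parse_rejected(sample)
status_counts = Counter(status for _, _, status in records)
for status, count in status_counts.items():
    print(count, status)
```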

Note

While POSIT can take most molecule formats as input, with large datasets it is fastest to use a pre-generated database of OMEGA [Hawkins-2010] conformers. For molecules with more than two rotatable bonds, it is recommended to generate 100 conformers per rotatable bond when running OMEGA:

> omega2 -in renin/all.smi -out all.oeb.gz -rangeIncrement 1 \
   -maxConfRange 200,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600
> posit -receptor renin/receptors/*.oeb.gz -dbase all.oeb.gz \
   -prefix renin

See the POSIT usage section for more details.
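
The long -maxConfRange value in the OMEGA command above follows a simple pattern: a floor of 200 conformers, then steps of 100. The sketch below regenerates that exact list so it does not have to be typed by hand. The function name max_conf_range and its parameters are hypothetical, and the sketch makes no claim about how OMEGA maps each entry to a rotatable-bond count beyond what the note above says.

```python
def max_conf_range(n_ranges: int = 16, per_step: int = 100,
                   floor: int = 200) -> str:
    """Return a comma-separated value for OMEGA's -maxConfRange option."""
    # Entry i is 100 * (i + 1), but never fewer than 200 conformers.
    values = [max(floor, per_step * (i + 1)) for i in range(n_ranges)]
    return ",".join(str(v) for v in values)


print(max_conf_range())
```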

POSIT MPI tutorial

Running POSIT on multiple cores is a simple matter of adding the -mpi_np argument and specifying the number of cores desired. When POSIT is run on a small job as shown above (with 11 molecules and 6 receptors), using a large number of cores is overkill.

> posit -mpi_np 3 -receptor renin/receptors/*.oeb.gz -dbase all.oeb.gz \
     -prefix renin

POSIT performance varying the number of cores against a small lead-optimization example.

As seen in figure Posit Performance, running with 3 cores gives a large improvement in run time, and adding a fourth is only marginally faster. Note that running on two cores is not recommended: because one core is always the master, this is, in effect, the slowest way to run POSIT.
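
The reasoning about the master core can be made concrete with a little arithmetic. The sketch below is a simplified model, not a description of POSIT's scheduler: it assumes that the master process does no docking itself, so the number of docking processes is the -mpi_np value minus one. The function name docking_workers is hypothetical.

```python
def docking_workers(mpi_np: int) -> int:
    """Number of processes left for docking under the simplified model."""
    if mpi_np < 2:
        # The model only covers MPI runs with a master and at least one worker.
        raise ValueError("model requires at least 2 MPI processes")
    # Assumption: one process is always the master and does not dock.
    return mpi_np - 1


for cores in (2, 3, 4):
    print(cores, "cores ->", docking_workers(cores), "docking process(es)")
```

Under this model, two cores leave a single docking process that also pays the communication overhead, while three cores double the docking capacity.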

Also note that using OMEGA conformations as input is the fastest way to run POSIT.