Tutorial

Introduction

This is a guided tour of each program used in OEDocking’s POSIT. At the end of this tutorial, the user should be comfortable running:

  • make_pose_receptor - to generate OEDocking receptors with bound ligands.
  • combine_receptors - to combine the complexes of distinct but sequence-related proteins into new receptors with more complete binding information.
  • posit - to generate potential poses of ligands against OEDocking receptor targets.

POSIT Application Suite Workflow

POSIT workflow from receptor creation to posing

The basic workflow for running POSIT is shown in figure POSIT workflow from receptor creation to posing.

This tutorial shows how to use POSIT in several common situations. Other cases are described in the application usage sections (posit usage, make_pose_receptor usage, combine_receptors usage).

Data files for these tutorials are located in the directory OPENEYE_DIR/data/posit, where OPENEYE_DIR refers to the top-level OpenEye installation directory. On Windows, the default location is a versioned POSIT directory in C:\Program Files (x86)\ for the 32-bit installation or C:\Program Files\ for the 64-bit installation. In OSX distributions, the data and documentation directories are available as standalone folders in the package.

Note

For the following tutorials, the OpenEye application banner and run settings have been omitted for brevity.

make_pose_receptor tutorial

The first step in using POSIT is the creation of an OEDocking receptor, a collection of docking-related information connected to a protein. A typical receptor (as viewed in VIDA) is shown in figure OEDocking receptor and typical docking information.

OEDocking receptor

OEDocking receptor and typical docking information

An OEDocking receptor includes the protein structure and a description of the binding site. This description includes a so-called outer contour that indicates where heavy atoms are to be placed during the HYBRID and FRED search procedures.

A pose receptor traditionally includes a bound ligand, used to help identify existing binding modes, and may include so-called extra molecules: items of interest, such as waters and solvents, that have been stripped for the purposes of docking.

OEDocking receptors may also include constraints. However, POSIT (unlike HYBRID or FRED) currently ignores them when the ShapeFit method is chosen.

While there are several ways to make receptors, POSIT has a simplified, easy to use command line utility, make_pose_receptor.

The most complicated portion of making pose receptors is identifying the bound ligands. If make_pose_receptor is run with only an input protein file, it outputs a list of potential ligands and halts:

> make_pose_receptor -prot 2IKO.pdb.gz
Found ligands:
1. CCc1c(c(nc(n1)N)N)c2ccc(cc2)NCc3cc(cc(c3)F)F
  To extract ligand, add command line switch:
    -residue 7IG601A or -residue 1

make_pose_receptor indicates the command line switches needed to extract each ligand it finds. For example, the switch suggested above creates the desired receptor:

> make_pose_receptor -prot renin/2IKO.pdb.gz -residue 1 -prefix 2IKO
Protein (RENIN7IG601A): Writing to receptor 2IKO_receptor.oeb.gz

This command creates the following files:

  • 2IKO_receptor.oeb.gz - the OEDocking receptor
  • 2IKO_settings.param - parameters used to make the receptor
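When several ligands are listed, it can be convenient to pull the suggested residue identifiers out of a saved copy of the listing. The following is a minimal sketch, assuming the listing has exactly the layout shown above (a line of the form "-residue ID or -residue N" per ligand); the function name suggested_residues is hypothetical and not part of POSIT.

```shell
# Print the residue identifier suggested for each ligand in a
# make_pose_receptor ligand listing read from standard input.
# Assumption: each suggestion line looks like
#     -residue 7IG601A or -residue 1
# as in the listing shown in this tutorial.
suggested_residues() {
    # awk strips leading blanks, so the switch name is field 1
    # and the residue identifier is field 2.
    awk '$1 == "-residue" { print $2 }'
}

# Example (ligands.txt is a hypothetical file holding a saved listing):
#   suggested_residues < ligands.txt
```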

Alternatively, if the protein and ligand have already been split, both -prot and -ligand can be specified on the command line:

> make_pose_receptor -prot renin/2IKO_prot.pdb.gz \
  -ligand renin/2IKO_lig.pdb.gz -prefix 2IKO
Protein (RENIN7IG601A): Writing ligand 1 to receptor 2IKO_receptor.oeb.gz

Finally, the -auto switch can be used to output every ligand-like, non-covalent molecule detected in the input file. This detection is heuristic, but works in a pinch. Each output filename is annotated with the residue of the automatically detected bound ligand, as seen below:

> make_pose_receptor -prot renin/2IKO.pdb.gz -auto -prefix 2IKO
Protein (RENIN7IG601A): Writing to receptor 2IKO_receptor_7IG601A.oeb.gz
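For a directory of complexes, the -auto form lends itself to a simple loop. The sketch below only prints the commands rather than running them, so they can be reviewed first. It uses the -prot, -auto and -prefix switches shown above; the function name receptor_commands, and the assumption that every *.pdb.gz file in the directory is a whole protein-ligand complex, are illustrative only.

```shell
# Print (do not run) one make_pose_receptor command per *.pdb.gz file
# in the given directory. The prefix is the file name without .pdb.gz.
# Assumption: the directory holds only whole-complex files, and the
# paths contain no spaces (the printed commands are not quoted).
receptor_commands() {
    for f in "$1"/*.pdb.gz; do
        [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
        prefix=$(basename "$f" .pdb.gz)
        printf 'make_pose_receptor -prot %s -auto -prefix %s\n' "$f" "$prefix"
    done
}

# Example: review the commands, then pipe them to sh to run them.
#   receptor_commands complexes
#   receptor_commands complexes | sh
```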

See the make_pose_receptor usage section for more details, including how to more explicitly control output file names.

combine_receptors tutorial

Combining receptors is an optional step that can be used to take information found in multiple receptors to create a new, merged, receptor.

combine_receptors is intended to be used in an automated fashion, where the aligned output receptors are included alongside the original receptors when running posit. As such, the output receptor files are named automatically from the input filenames, and only receptors that can be merged produce output.

This is particularly helpful when using small bound fragments to predict binding poses for larger ligands. combine_receptors takes a list of candidate receptors and outputs only those deemed worthy of merging. To be worthy of merging:

  1. It must be possible to align the protein sequences.
  2. The ligands must overlap, but be different enough to indicate that the combined receptor has more information content than each independently.

Merged receptors

Merged Receptors: Merging the receptors 2IKO and 2IKU captures more potential interaction constraints than either does alone

Merged receptors

Ligand Posed to Merged Receptors: REN9 is correctly predicted by the merged receptor

For example, consider the merged receptor shown in figure Merged Receptors. In this case, one of the ligands binds to a pocket that is not present in the other. The expectation is that the combination of both ligands has more information content than either alone.

For each valid merge, two output files are generated, one in each reference frame.

> combine_receptors -receptors *.rec.oeb.gz

Note

Any time a collection of receptors is used on the command line, it can be replaced with a .lst file containing the filenames. For example:

> combine_receptors -receptors receptors.lst

This is particularly helpful when analyzing a large number of receptors, whose filenames would otherwise make the command line too long. On UNIX-based systems, an easy way to generate this file from a long list of receptors is to use the find command as follows:

> find . -name \*rec\*oeb.gz -exec printf '%s\n' {} \; > receptors.lst
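Before handing a list file to combine_receptors, it is worth checking that it contains what is expected. The following sketch writes the matching receptor paths in sorted order and prints how many were found. It assumes that a .lst file holds one filename per line, which this tutorial does not state explicitly; the function name make_receptor_list is hypothetical.

```shell
# Write the receptor files found under a directory to a list file,
# one path per line in sorted order, and print the number of entries.
# Assumption: the .lst format is one filename per line.
make_receptor_list() {
    dir=$1
    list=$2
    find "$dir" -type f -name '*rec*oeb.gz' | sort > "$list"
    wc -l < "$list" | tr -d ' '   # tr removes padding some wc versions add
}

# Example:
#   make_receptor_list . receptors.lst
#   combine_receptors -receptors receptors.lst
```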

See the combine_receptors usage section for more details.

POSIT tutorial

Given receptors, using posit is very straightforward. There are two basic ways to input molecules to posit.

  • -in - converts input to 3D conformers (if 3D structures are input, these initial structures are retained)
  • -dbase - takes the input conformations as is (these are normally generated with OMEGA)

For usage of -dbase see POSIT MPI Tutorial.

Given a set of input SMILES strings:

> posit -receptor renin/receptors/*.oeb.gz -in renin/all.smi

The following files are output:

  • posit_docked.oeb.gz - contains all successful poses
  • posit_score.txt - contains the scores of all successful poses
  • posit_report.txt - contains the report of the run
  • posit_status.txt - a periodic status file generated during a run
  • posit_settings.param - parameters used in the run

The following files are output only if non-empty:

  • posit_clashed.oeb.gz - contains all poses with a good enough probability that clash with the protein
  • posit_undocked.oeb.gz - contains all unsuccessful poses

There is more than one reason a pose may be unsuccessful. The most common is that the probability of the predicted binding mode is too low.

Use the -prefix option to add a prefix to all files output by posit, or use the -docked_molecule_file option to output a pose file with a particular name.
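The output names listed above follow a prefix_suffix pattern, with posit as the default prefix. A small check such as the sketch below can confirm which files a finished run left in the current directory. The function name check_posit_outputs is hypothetical; note that the clashed and undocked files are written only when non-empty, so "missing" is a normal result for them.

```shell
# Report which of the posit output files listed above exist in the
# current directory for a given prefix (default: posit).
check_posit_outputs() {
    prefix=${1:-posit}
    for suffix in docked.oeb.gz score.txt report.txt status.txt \
                  settings.param clashed.oeb.gz undocked.oeb.gz; do
        file="${prefix}_${suffix}"
        if [ -e "$file" ]; then
            printf 'present  %s\n' "$file"
        else
            printf 'missing  %s\n' "$file"
        fi
    done
}

# Example, after a run made with -prefix renin:
#   check_posit_outputs renin
```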

When POSIT is finished, it prints the final status and indicates what new data was added to the results that are output:

> posit -receptor renin/receptors/*.oeb.gz -in renin/all.smi \
   -prefix renin
Sorting by input order
--------Finished docking--------
Run time : 10m 40s (640.6seconds total)
Time per molecule 58.24sec

Molecules read : 11
Molecules processed : 11
Molecules successfully docked : 6
Unsuccessful dockings : 5

  Dock Statistics                                     Count
  ----------------------                              -----
  Successfully Docked                                 6
  Clashed with protein                                5

Docked molecules outputted to renin_docked.oeb.gz
Docked (but clashing) molecules outputted to renin_clashed.oeb.gz
Failed molecules written to: renin_undocked.oeb.gz
Failed molecules log written to: renin_rejected.txt

The following data is attached to SD data of each ligand
  "POSIT::Probability" : docked score (probability of correct pose)
  "POSIT receptor filename" : filename of the receptor the ligand was docked into
  "POSIT receptor title" : title of the receptor the ligand was docked into
  "POSIT::Method" : docking method selected by POSIT
  "Result" : description of the expected result quality (GREAT/GOOD/MEDIOCRE/POOR)

Scores were also outputted to text file : renin_score.txt
POSIT report was saved to file : renin_report.txt
Finished

The following files are output by the command above:

  • renin_docked.oeb.gz - the successfully docked structures
  • renin_clashed.oeb.gz - clashing poses with good probability
  • renin_undocked.oeb.gz - all non docked structures
  • renin_score.txt - scores of docked structures
  • renin_rejected.txt - list of rejected structures and status of rejection
  • renin_report.txt - report as seen above
  • renin_status.txt - current status of run, number of molecules processed and so on.
  • renin_settings.param - parameter file used for run

The score file contains the scores and ranking of docked structures (some columns have been removed for brevity):

Title   POSIT::Probability   POSIT receptor filename   POSIT::Method   Result
ren1    0.950000             2IKO.rec.oeb.gz           SHAPEFIT        GREAT
ren2    0.850000             2IKU_b.rec.oeb.gz         SHAPEFIT        GREAT
ren3    0.890000             2IKU_b.rec.oeb.gz         SHAPEFIT        GREAT
ren5    0.790000             2IKU_b.rec.oeb.gz         SHAPEFIT        GREAT
ren7a   0.790000             2IKU_b.rec.oeb.gz         SHAPEFIT        GREAT
ren10   0.850000             2IL2_c.rec.oeb.gz         SHAPEFIT        GREAT
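A score listing like this one can be filtered on the command line, for example to keep only poses above a chosen probability. The sketch below assumes the abbreviated layout shown above: one header line, whitespace-separated columns, the title in column 1 and POSIT::Probability in column 2. The real score file contains additional columns, so the column numbers may need adjusting; the function name poses_above is hypothetical.

```shell
# Print title and probability for every pose whose probability is
# strictly greater than the given cutoff.
# Assumption: header on line 1, title in column 1, probability in
# column 2, as in the abbreviated listing in this tutorial.
poses_above() {
    awk -v cutoff="$1" 'NR > 1 && $2 + 0 > cutoff + 0 { print $1, $2 }'
}

# Example:
#   poses_above 0.8 < renin_score.txt
```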

The rejected file can be used to identify the status of rejected molecules. For instance, “All conformers clashed with protein” indicates that, while the probability was good, the protein could not accept the desired pose:

Ligand #   Title   Status
7          ren8b   All conformers clashed with protein
3          ren4    All conformers clashed with protein
10         ren11   All conformers clashed with protein
5          ren6    All conformers clashed with protein
8          ren9    All conformers clashed with protein
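With larger runs, a tally of rejection reasons is often more useful than the full list. The following sketch counts how many molecules share each status. It assumes the layout shown above (one header line, then ligand number, title, and a free-text status) and that titles contain no spaces; the function name rejection_summary is hypothetical.

```shell
# Count rejected molecules per status message, most frequent first.
# Assumption: header on line 1; column 1 is the ligand number,
# column 2 the title, and everything from column 3 on is the status.
rejection_summary() {
    awk 'NR > 1 && NF >= 3 {
             status = $3
             for (i = 4; i <= NF; i++) status = status " " $i
             count[status]++
         }
         END { for (s in count) printf "%d\t%s\n", count[s], s }' | sort -rn
}

# Example:
#   rejection_summary < renin_rejected.txt
```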

Note

While posit can take most molecule formats as input, with large datasets it is fastest to use a pre-generated database of conformers generated by OMEGA [Hawkins-2010]. For molecules with more than two rotatable bonds, it is recommended to generate 100 conformers per rotatable bond when running OMEGA:

> omega2 -in renin/all.smi -out all.oeb.gz -rangeIncrement 1 \
   -maxConfRange 200,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600
> posit -receptor renin/receptors/*.oeb.gz -dbase all.oeb.gz \
   -prefix renin
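The long -maxConfRange value above follows a simple pattern: 200 for the first two entries, then an increase of 100 per entry. Rather than typing it by hand, it can be generated. The sketch below reproduces the list used above when asked for 16 entries; the function name max_conf_range is hypothetical, and the pattern is inferred from the example rather than taken from the OMEGA documentation.

```shell
# Print a comma-separated list of N conformer limits following the
# pattern of the example above: 200, 200, 300, 400, ...
# Entry k (counting from 0) is (k + 1) * 100, with a floor of 200.
max_conf_range() {
    awk -v n="$1" 'BEGIN {
        for (k = 0; k < n; k++) {
            c = (k + 1) * 100
            if (c < 200) c = 200
            printf "%s%d", (k ? "," : ""), c
        }
        printf "\n"
    }'
}

# Example:
#   omega2 -in renin/all.smi -out all.oeb.gz -rangeIncrement 1 \
#      -maxConfRange "$(max_conf_range 16)"
```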

See the posit usage section for more details.

POSIT MPI tutorial

Running posit on multiple cores is a simple matter of adding the -mpi_np argument and specifying the number of cores desired. When posit is run on a small job such as the one shown above (with 11 molecules and 6 receptors), using a large number of cores is overkill.

> posit -mpi_np 3 -receptor renin/receptors/*.oeb.gz -dbase all.oeb.gz \
     -prefix renin

POSIT running under MPI for a small number of molecules.

POSIT performance varying the number of cores against a small lead-optimization example.

As seen in figure Posit Performance, running with 3 cores substantially reduces the run time, and adding a fourth is only marginally faster. Note that running with two cores is not recommended: one core always acts as the master, so in effect this is the slowest way to run posit.

Also note that using OMEGA conformations as input is the fastest way to run POSIT.
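The advice about core counts can be captured in a small wrapper. The sketch below prints (rather than runs) a posit command line: it adds -mpi_np only when three or more cores are requested and otherwise falls back to a plain serial command. The fallback is a design choice made for this example, not documented POSIT behavior, and the function name posit_command is hypothetical.

```shell
# Print a posit command line for the requested number of cores.
# With fewer than 3 cores the -mpi_np switch is left out, because one
# MPI process acts as master and two cores are described above as the
# slowest way to run posit. Arguments after the core count are passed
# through unchanged (and unquoted, so avoid paths with spaces).
posit_command() {
    cores=$1
    shift
    if [ "$cores" -ge 3 ]; then
        printf 'posit -mpi_np %s %s\n' "$cores" "$*"
    else
        printf 'posit %s\n' "$*"
    fi
}

# Example:
#   posit_command 3 -receptor receptors.lst -dbase all.oeb.gz -prefix renin
```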
