HYBRID

Overview

HYBRID is a docking program that also uses elements of ligand-based design to enhance performance. Typically, the protein structure is determined with X-ray crystallography in the presence of a known binding ligand (or bound ligand). The HYBRID program uses the information present in both the structure of the protein and the bound ligand to enhance docking performance. HYBRID requires that the structure of a bound ligand be known; if it is not, FRED can be used instead to do traditional docking.

HYBRID also allows multiple structures/conformations of the target protein to be used. In this case HYBRID will determine the best structure/conformation to use for each ligand in the docking database.

See also

hybrid theory section.

Input Preparation

Ligand Preparation

The most common use of HYBRID is to dock a large collection of molecules into the active site of a target protein. For the purposes of this document, we’ll call the file(s) of molecules the database file(s), or dbase file(s). The most common format for database file(s) is a multi-conformer OEBinary file created by OpenEye’s OMEGA program, but several other 3D formats are also accepted, including SDF, MOL2 and PDB. HYBRID determines the database file format from the file extension: .sdf or .mol for SDF, .mol2 for MOL2, and .pdb or .ent for PDB. Gzip-compressed files of these same formats are allowed as well; for example, HYBRID will interpret infile.sdf.gz as a gzipped SDF file.
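The extension rules can be sketched as a small shell function. This is only an illustration of the mapping described in this document (including the .oeb and .mmod extensions listed under the -dbase parameter); the function name dbase_format is hypothetical and is not part of HYBRID.

```shell
# Hypothetical helper: report the format HYBRID would infer from a
# database filename, following the extension rules in this document.
dbase_format() {
    name=${1%.gz}            # a trailing .gz does not change the format
    case "$name" in
        *.oeb)         echo "OEBinary" ;;
        *.sdf|*.mol)   echo "SDF" ;;
        *.mol2)        echo "MOL2" ;;
        *.pdb|*.ent)   echo "PDB" ;;
        *.mmod)        echo "MacroModel" ;;
        *)             echo "unknown" ;;
    esac
}

dbase_format infile.sdf.gz    # prints: SDF
dbase_format ligands.oeb.gz   # prints: OEBinary
```

A check like this can be useful in a wrapper script to catch a mistyped extension before a long docking run is started.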

Note

Note that even though all these formats are supported, using SDF, PDB or MOL2 can result in a loss of speed due to the I/O penalty of these formats.

HYBRID has no provision for conversion of 1D/2D molecules to 3D. The database file(s) must be in a conformationally expanded 3D format. Within the OpenEye tool chain the program OMEGA can be used to convert 1D/2D to 3D and generate conformers.

By default HYBRID will interpret conformers in the database file(s) as part of a single multi-conformer molecule as long as they:

  • Are contiguous in the input file.
  • Have the same numbers of atoms and bonds in the same order
  • Have identical atom and bond properties with their order correspondent in the subsequent connection table
  • Have the same atom and bond stereochemistry

While this may appear to be a restrictive list, many programs write multi-conformer molecules into SDF or MOL2 files such that the above rules will be satisfied. If the conformers are named differently, (i.e. they have a conformer number appended to the base name like acetsali_1, acetsali_2), HYBRID will still consider them part of a single multi-conformer molecule if the criteria above are met. For file formats that are not inherently multi-conformer, this behavior can be turned off or modified with the -conftest command-line switch.
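As an illustration, the command below would treat every record of an SDF file as a separate single-conformer molecule by setting -conftest to none (the test types are listed under Parameter Details). The file names are placeholders, and the command is only printed so that the sketch runs without HYBRID installed.

```shell
# Placeholder file names; substitute your own receptor and database.
# The command is only printed here; run the string directly to use HYBRID.
cmd="hybrid -receptor receptor.oeb.gz -dbase ligands.sdf.gz -conftest none"
echo "$cmd"
```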

Receptor Preparation

HYBRID can use either a single receptor or multiple receptors each of which contains a different structure/conformation of the target protein. Each receptor must also have a bound ligand. Receptors with bound ligands can be created with the following programs.

Program         Type          Description
make_receptor   GUI           Interactive GUI for creating a receptor.
pdb2receptor    Command Line  Creates a receptor from a PDB file with a protein-ligand complex.
receptor_setup  Command Line  Creates a receptor from a molecule file with a protein and a separate file with either the structure of a bound ligand or a box enclosing the active site.

Note

Receptors can also be created using the Docking Toolkit (see the Docking Toolkit documentation).

Parameter List

Input

-receptor : Receptor file(s).

-dbase : Multiconformer molecules to dock.

-conftest : Conformer test.

-molnames : Molecule names file.

-param : Parameter file.

Dock Options

-dock_resolution : Docking resolution.

Output Files

-docked_molecule_file : Docked molecule file.

-undocked_molecule_file : Undocked molecule file.

-score_file : Text score file.

-report_file : Text docking report file.

-settings_file : Text parameter settings file.

-status_file : Text status file.

Output Options

-hitlist_size : Number of docked molecules to keep.

-num_poses : Number of poses per molecule to retain.

-annotate_scores : Add VIDA score annotations to docked molecules.

-save_component_scores : Save score components.

-no_extra_output_files : Suppress default extra output files.

-no_dots : Suppress default writing of dots.

-prefix : Prefixes default output files with this string.

Note

Values in parentheses are defaults.

Parameter Details

Input

-receptor <receptor file1> [<receptor file2> ...] [No Default: Required Parameter]

Receptor file(s) to dock to. Each receptor must have a bound ligand.

If multiple receptors are specified each docking ligand will be docked into the single receptor with the bound ligand most similar to it, as measured by 3D shape and chemical similarity.

[ Aliases = -rec ]

-dbase <input filename1> [<input filename2> ...] [No Default: Required Parameter]

File(s) containing conformationally expanded ligands to dock (see section Ligand Preparation).

The following file formats are supported.

File type Extension
OEBinary .oeb .oeb.gz
SDF .sdf .mol .sdf.gz .mol.gz
MOL2 .mol2 .mol2.gz
PDB .pdb .ent .pdb.gz .ent.gz
MacroModel .mmod .mmod.gz

More than one file can be specified.

[ Aliases = -database, -in ]

-param <parameter filename> [No Default]

A parameter file is a text file that lists parameter settings to be used during a run. If a parameter is specified both on the command line and in the parameter file, the value specified on the command line is used.

The format of the parameter file is as follows:

  • One parameter per line
  • For non-list parameters one key-value pair per line. (e.g., -receptor rec.oeb.gz).
  • For list parameters a key followed by all the values (e.g., -dbase lig1.oeb.gz ligs2.oeb.gz)
  • Boolean parameters must be listed as a key followed by true or false (e.g., -annotate_scores true).
  • The parameter file may not contain the -param parameter.
  • Lines beginning with # are considered comments.
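The format rules above can be sketched with a short script that writes a parameter file and checks it. The file names and the chosen values are placeholders for illustration; only flags documented in this section are used, and the final command is printed rather than executed.

```shell
# Write a small parameter file (file names are placeholders).
cat > example.param <<'EOF'
# Lines beginning with # are comments
-receptor rec.oeb.gz
-dbase lig1.oeb.gz ligs2.oeb.gz
-annotate_scores true
-num_poses 5
EOF

# Sanity check: a parameter file may not contain -param itself.
if grep -q '^-param[[:space:]]' example.param; then
    echo "error: parameter file must not contain -param" >&2
else
    echo "example.param looks well formed"
fi

# Command-line values take precedence over the parameter file, so this
# run would keep 1 pose per molecule. Printed only; drop "echo" to run.
echo hybrid -param example.param -num_poses 1
```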

-conftest <test type> [Default: isomeric]

Note

This flag has no effect when the database format is OEBinary

When non-OEBinary database file(s) (see parameter -dbase) are read a test is applied to determine if subsequent molecules in the database file(s) are conformers of the same molecule. This flag controls how that conformer test is applied.

The following test types are recognized

Test Type  Subsequent molecules are conformers if they
isomeric   Have the same numbers of atoms and bonds in the same order. Each atom and bond has identical properties with its order correspondent in the subsequent connection table. Have the same atom and bond stereochemistry.
absolute   Have the same numbers of atoms and bonds in the same order. Each atom and bond has identical properties with its order correspondent in the subsequent connection table.
canonical  Have the same absolute (non-isomeric) graph.
none       Subsequent molecules are never treated as conformers. The database is effectively single conformer.

-molnames <input filename> [No Default]

This parameter specifies a text file containing a list of molecule names (one name per line in the file). If this parameter is set then only molecules in the database file(s) (see parameter -dbase) with names that match those in the text files will be read in.

The general purpose of this flag is to provide an easy mechanism for reading a few specific molecule(s) that are contained in a large database, without having to extract those molecules by hand from the database.
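A minimal sketch of this workflow follows. The molecule names and file names are made up for illustration, and the HYBRID command is only printed so the snippet runs without the program installed.

```shell
# One molecule name per line; these names are made up for illustration.
printf '%s\n' acetsali ibuprofen caffeine > names.txt

# Only molecules in the database with these names would be read in.
# Printed only; drop "echo" to run HYBRID.
echo hybrid -receptor receptor.oeb.gz \
            -dbase multiconformer_ligands.oeb.gz \
            -molnames names.txt
```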

Dock Options

-dock_resolution <setting> [Default: Standard]

This parameter controls the resolution of the docking during both the exhaustive search and the optimization. The resolution of the exhaustive search at each setting is as follows.

Setting   Translational Stepsize  Rotational Stepsize
High      1.0 Ångström            1.0 Ångström
Standard  1.0 Ångström            1.5 Ångströms
Low       1.5 Ångströms           2.0 Ångströms

During the optimization step the resolution is half that of the exhaustive search.

Output Files

-docked_molecule_file <filename> [Default: docked.oeb.gz]

File docked molecules will be written to. The file format is controlled by the extension of the filename. The following output formats are supported.

Format Extension
OEBinary .oeb
SDF .sdf
Gzipped OEBinary .oeb.gz
Gzipped SDF .sdf.gz

Scores will be attached as SD data to each pose with the tag HYBRID Chemgauss4 Score, unless the -score_tag option is used to specify another tag.

When -hitlist_size is zero, docked molecules are written in the order in which they were docked. Otherwise they are written in sorted order at the end of the run (see the -hitlist_size option).

Note

If this flag is not set by the user the default filename (i.e., docked.oeb.gz) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -docked_mol_file, -docked_file, -docked, -out ]

-undocked_molecule_file <filename> [Default: undocked.oeb.gz]

Specifies an output file in which to place molecules that could not be docked into the active site (this generally occurs when a molecule is too large to fit in the site, or unable to match user specified docking constraints). The format of this file is determined by the filename extension. The following output formats are supported.

Format Extension
OEBinary .oeb
SDF .sdf
Isomeric SMILES .ism
Gzipped OEBinary .oeb.gz
Gzipped SDF .sdf.gz
Gzipped Isomeric SMILES .ism.gz

Note

If this flag is not set by the user the default filename (i.e., undocked.oeb.gz) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -undocked_mol_file, -undocked_file, -undocked ]

-score_file <filename> [Default: score.txt]

Specifies a tab separated text file with the name and scores of the molecules.

Note

If this flag is not set by the user the default filename (i.e., score.txt) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -score ]

-report_file <filename> [Default: report.txt]

Specifies a file that a text report of the run will be written to.

Note

If this flag is not set by the user the default filename (i.e., report.txt) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -report ]

-settings_file <filename> [Default: settings.param]

Writes the settings of all parameters of the run to the specified output file. The settings will be listed in plain text, one parameter name per line followed by its value(s). This format is compatible with the format of parameter files, and therefore a settings file from a previous run can be passed to the -param flag to re-run the program with the same settings.

Note

If this flag is not set by the user the default filename (i.e., settings.param) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -settings ]

-status_file <filename> [Default: status.txt]

If this parameter is set then the status of the run will be written to the given output file every few seconds (the previous contents of the file will be overwritten) during the run.

Note

If this flag is not set by the user the default filename (i.e., status.txt) will be automatically prefixed with the setting of the -prefix flag.

[ Aliases = -status ]

Output Options

-hitlist_size <num> [Default: 500]

This parameter controls the number of top scoring molecules that will be outputted at the end of the run (sorted by score), or can be used to specify that all molecules should be outputted as they are processed (unsorted).

If -hitlist_size is non-zero a sorted hitlist of the best scoring molecules is produced that will be maintained and output at the end of the run. The maximum size of the hitlist is -hitlist_size. If more than this number of molecules are in the input database only the top scoring molecules will be outputted and the rest will be discarded.

If -hitlist_size is zero the run will be in serial mode, i.e. each molecule will be outputted as it is processed (unsorted). For single processor runs this will be the order the molecules appear in the database file(s). For MPI runs the order will not be strictly the order the molecules appear in the database file(s).

There is no formal limit on the number of molecules that can be sorted and outputted at the end of the run. However, retaining a large number of molecules significantly increases the memory requirements. A good rule of thumb is that the setting -hitlist_size times the setting of -num_poses should not be larger than 10,000.

[ Aliases = -hitlist_size, -hitlist ]

-num_poses <num> [Default: 1]

Specifies the maximum number of docked poses to output for each docked molecule.

There is no formal limit on the number of poses per molecule that can be outputted, however, retaining a large number of alternate poses significantly increases the size of the molecules in memory and when outputted to disk. A good rule of thumb is that the setting -hitlist_size times the setting of -num_poses should not be larger than 10,000.

[ Aliases = -numposes ]
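The rule of thumb given for -hitlist_size and -num_poses can be checked before a run with a few lines of shell. This is a sketch only; the function name check_pose_budget is hypothetical and the 10,000 threshold is the guideline stated above, not a hard limit enforced by HYBRID.

```shell
# Hypothetical pre-flight check for the rule of thumb above:
# hitlist_size * num_poses should not exceed 10,000.
check_pose_budget() {
    hitlist_size=$1
    num_poses=$2
    total=$((hitlist_size * num_poses))
    if [ "$total" -gt 10000 ]; then
        echo "warning: $total retained poses exceeds 10000"
        return 1
    fi
    echo "ok: $total retained poses"
}

check_pose_budget 500 1             # prints: ok: 500 retained poses
check_pose_budget 500 50 || true    # prints: warning: 25000 retained poses exceeds 10000
```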

-score_tag <tag> [No Default]

This parameter overrides the default SD Data Tag used to store molecule scores (the default is HYBRID Chemgauss4 Score).

[ Aliases = -scoretag ]

-annotate_scores [Default: false]

If the value of this flag is set to true VIDA score annotations will be added to the processed molecules. These annotations are visible in VIDA (OpenEye’s molecular visualization program) and show a per atom breakdown of the score.

Note

The docked molecule output file format (see -docked_molecule_file) must be OEBinary when using score annotations.

[ Aliases = -annotate ]

-save_component_scores [Default: false]

If the value of this flag is set to true individual components of the total score will be saved to SD data on each pose and appear in the score file (see -score_file).

[ Aliases = -component_scores, -component ]

-no_extra_output_files [Default: false]

When this flag is set to true the only default output of the program will be the docked structure file (see -docked_molecule_file).

Using this flag suppresses the default output of the following

Output                  Default filename  Parameter
Undocked molecule file  undocked.oeb.gz   -undocked_molecule_file
Text score file         score.txt         -score_file
Report file             report.txt        -report_file
Settings file           settings.param    -settings_file
Status file             status.txt        -status_file

Only default output is suppressed. If any of these output parameters is explicitly set by the user, the relevant output file will still be written even if this switch is turned on.

[ Aliases = -no_extra, -noextra, -noextraoutputfiles, -no_extra_output, -noextraoutput ]

-no_dots [Default: false]

By default, a dot is written to standard error for each molecule docked (or an x in the case of a failure). Setting this flag to true suppresses the dot/x writing.

[ Aliases = -nodots ]

-prefix <value> [Default: hybrid]

This flag prefixes all default output filenames with the specified value.

Note

This flag does not affect output filenames explicitly set by the user.
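The effect of -prefix on the default filenames can be sketched as follows. The helper name default_outputs is hypothetical, and the underscore separator is an assumption inferred from the example output names in this document (e.g., hybrid_docked.oeb.gz).

```shell
# Hypothetical helper: list default output filenames for a given prefix,
# assuming the prefix is joined with an underscore as in the example
# outputs (e.g. hybrid_docked.oeb.gz).
default_outputs() {
    prefix=${1:-hybrid}
    for base in docked.oeb.gz undocked.oeb.gz score.txt \
                report.txt settings.param status.txt; do
        echo "${prefix}_${base}"
    done
}

default_outputs run42
```

Giving each run its own prefix keeps the default output files of successive runs from overwriting one another in the same directory.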

Example Commands

Basic Hybrid Docking Example

In this example HYBRID docks molecules using a single processor.

Input files

  • receptor.oeb.gz :

    A receptor file containing the structure of the target protein and a bound ligand. (see Receptor Preparation section).

  • multiconformer_ligands.oeb.gz :

    Conformationally expanded 3D ligands to dock. (see Ligand Preparation section).

Command line

prompt> hybrid -receptor receptor.oeb.gz -dbase multiconformer_ligands.oeb.gz

Output files

  • hybrid_docked.oeb.gz : Top 500 scoring molecules of multiconformer_ligands.oeb.gz docked into receptor.oeb.gz.
  • hybrid_undocked.oeb.gz : Molecules of multiconformer_ligands.oeb.gz that could not be docked into the active site (generally occurs if the molecules are too big for the site). This file will not be present if all molecules were successfully docked to the active site.
  • hybrid_score.txt : A tab separated text file containing the name and score of each of the top 500 ligands.
  • hybrid_report.txt : A text report of the docking process.
  • hybrid_settings.param : A text file containing the parameters used for this run.
  • hybrid_status.txt : A text file that is overwritten periodically during the run with the status of the run.
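Once the run finishes, the score file can be inspected with standard text tools. The sketch below uses a made-up sample rather than real HYBRID output and assumes two tab-separated columns (name, score) with no header line; if the real file has a header, it would need to be skipped first. It also assumes that lower (more negative) Chemgauss4 scores indicate better poses.

```shell
# Made-up sample in the assumed layout: name <TAB> score, no header line.
printf 'ligA\t-8.25\nligB\t-11.40\nligC\t-9.73\n' > sample_score.txt

# Sort numerically on column 2 (most negative first) and keep the top 2.
TAB=$(printf '\t')
sort -t "$TAB" -k2,2n sample_score.txt | head -n 2
```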

Hybrid Docking with Multiple Crystal Structures

In this example HYBRID docks molecules using multiple structures of the target protein.

Input files

  • receptor1.oeb.gz :

    A receptor file containing the structure of the target protein and a bound ligand. (see Receptor Preparation section).

  • receptor2.oeb.gz :

    A receptor file containing a second structure of the target protein and a bound ligand. This receptor file should contain a different structure of the same target protein as receptor1.oeb.gz, generally with a different bound ligand. (see Receptor Preparation section).

  • multiconformer_ligands.oeb.gz :

    Conformationally expanded 3D ligands to dock. (see Ligand Preparation section).

Command line

prompt> hybrid -receptor receptor1.oeb.gz \
               -receptor receptor2.oeb.gz \
               -dbase multiconformer_ligands.oeb.gz

Output files

  • hybrid_docked.oeb.gz : Top 500 scoring molecules of multiconformer_ligands.oeb.gz docked into either receptor1.oeb.gz or receptor2.oeb.gz. The title and filename of the receptor docked to will be tagged to the SD data of each docked ligand (see the report.txt file).
  • hybrid_undocked.oeb.gz : Molecules of multiconformer_ligands.oeb.gz that could not be docked into the active site (generally occurs if the molecules are too big for the site). This file will not be present if all molecules were successfully docked to the active site.
  • hybrid_score.txt : A tab separated text file containing the following information for each of the top 500 ligands.
    • Name of the ligand
    • Score of the ligand
    • Title of the receptor site the ligand docked to.
    • Filename of the receptor site the ligand docked to.
  • hybrid_report.txt : A text report of the docking process.
  • hybrid_settings.param : A text file containing the parameters used for this run.
  • hybrid_status.txt : A text file that is written periodically during the run with the status of the run.
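With multiple receptors it can be informative to see how the ligands were distributed among them. The sketch below counts ligands per receptor file using a made-up sample in the column order listed above (name, score, receptor title, receptor filename); the sample values and the absence of a header line are assumptions.

```shell
# Made-up sample in the column order listed above:
# ligand name, score, receptor title, receptor filename.
printf 'ligA\t-8.25\trec1\treceptor1.oeb.gz\n'  >  sample_multi_score.txt
printf 'ligB\t-11.40\trec2\treceptor2.oeb.gz\n' >> sample_multi_score.txt
printf 'ligC\t-9.73\trec1\treceptor1.oeb.gz\n'  >> sample_multi_score.txt

# Count how many ligands were docked into each receptor file (column 4).
awk -F '\t' '{ count[$4]++ } END { for (f in count) print f "\t" count[f] }' \
    sample_multi_score.txt | sort
```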

MPI Docking Example

In this example HYBRID docks molecules to a single receptor on 4 processors of the host machine.

Input files

  • receptor.oeb.gz :

    A receptor file containing the structure of the target protein and a bound ligand. (see Receptor Preparation section).

  • multiconformer_ligands.oeb.gz :

    Conformationally expanded ligands to dock. (see Ligand Preparation section).

Command line

prompt> hybrid -mpi_np 4 -receptor receptor.oeb.gz \
                              -dbase multiconformer_ligands.oeb.gz

Output files

  • hybrid_docked.oeb.gz : Top 500 scoring molecules of multiconformer_ligands.oeb.gz docked into receptor.oeb.gz.
  • hybrid_undocked.oeb.gz : Molecules of multiconformer_ligands.oeb.gz that could not be docked into the active site (generally occurs if the molecules are too big for the site). This file will not be present if all molecules were successfully docked to the active site.
  • hybrid_score.txt : A tab separated text file containing the name and score of each of the top 500 ligands.
  • hybrid_report.txt : A text report of the docking process.
  • hybrid_settings.param : A text file containing the parameters used for this run.
  • hybrid_status.txt : A text file that is overwritten periodically during the run with the status of the run.