OpenEye Glossary of Terms

AMI

An Amazon Machine Image (AMI) is a supported and maintained image provided by AWS that provides the information required to launch an instance.

apo protein
apo proteins

An apo protein is the structure of a protein with no ligand bound. By contrast, “holo” refers to the protein-ligand complex.

asymmetric unit

The contents of a PDB or mmCIF file from an X-ray crystallography experiment contain an asymmetric unit (ASU) as the output of the experiment. The ASU is sometimes equivalent to the biological unit (BU), but often requires manipulation to create a correct BU.

AUC

The area under the curve (AUC) of the ROC curve is an aggregate measure of performance across all possible classification thresholds. The AUC value ranges from 0.0 to 1.0. A model with perfect predictions has an AUC of 1.0, while a model that always gets the predictions wrong has an AUC of 0.0. A value of 0.5 represents random prediction. The AUC can be interpreted as the probability that the model ranks a random positive example more highly than a random negative example. See also Area under the curve in Wikipedia and ROC curve.
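
The ranking interpretation can be illustrated with a minimal Python sketch. The function name, the example scores, and the convention of counting ties as half a win are assumptions made for illustration; this is not an OpenEye API.

```python
from itertools import product


def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win
    (an assumed convention).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and true labels (1 = active, 0 = inactive).
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
auc = auc_by_ranking(scores, labels)  # 5 of the 6 pairs are ranked correctly
```

Comparing every pair scales quadratically with the data set size, so this form is only meant to make the definition concrete.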

B-factor

B-factor (temperature factor) describes the displacement of atomic positions from an average value. The more flexible an atom is, the larger its displacement from the mean position (mean-square displacement). B-factor values are normally between 15 and 30 Å² (square Ångströms), but can be much higher for more flexible regions. B-factors can indicate the mobility of atoms and can also indicate where there are errors in model building.

biological unit

In short, the biological unit (BU) is an object that contains the biologically relevant parts of an ASU, which have not been split into various molecular components and are not yet prepped for modeling. For a more detailed explanation of BUs, refer to Introduction to Biological Assemblies and the PDB Archive hosted at the RCSB.

bug-fix release

Toolkit and Applications bug-fix releases happen between feature releases, if necessary. These releases provide crucial bug fixes but no new features or platform updates.

canonical SMILES

In OpenEye documentation, the term canonical SMILES is used for a unique SMILES string that encodes the connection table of a molecule, but no chiral or isotopic information. Consequently, two stereoisomers always share the same canonical SMILES, since their stereo information is ignored during the canonicalization process. For generating a canonical SMILES, use the OECreateCanSmiString function. OpenEye’s canonical SMILES term corresponds to Daylight’s unique SMILES definition.

canonical isomeric SMILES

In OpenEye documentation, the name canonical isomeric SMILES is used for a unique SMILES string that also encodes isotopic and stereo information. Due to the unambiguity of canonical isomeric SMILES, they can be used as a universal identifier for a specific chemical structure. For generating a canonical isomeric SMILES, use the OECreateIsoSmiString or the preferred high-level OEMolToSmiles function. OpenEye’s canonical isomeric SMILES terminology corresponds to Daylight’s absolute SMILES definition.

chiral atom

In OpenEye terminology, an atom is considered chiral if it is connected to four different substituent groups, that is, its mirror image is not superimposable. In OEChem TK, an easily invertible nitrogen, that is, a nonplanar nitrogen with one attached hydrogen, is not considered to be chiral. This is because trivalent nitrogen compounds undergo rapid inversion that interconverts enantiomers. See also the stereo atom definition, the SetChiral and IsChiral methods of OEAtomBase, and the Atom Chirality section.

chiral bond

In OEChem TK, a double bond is considered chiral if the cis and trans forms of this bond represent two distinct isomers. A chiral bond can be either a chain bond or a ring bond that does not belong to any ring smaller than 8-membered. See also the stereo bond definition, the SetChiral and IsChiral methods of OEBondBase, and the Bond Chirality section.

CIP

The CIP (Cahn-Ingold-Prelog) system is a rule-based method used in organic chemistry to name the stereoisomers of a molecule. The CIP system assigns an R or S descriptor to each stereocenter and an E or Z descriptor to each double bond so that the configuration of the entire molecule can be specified uniquely.

conformer
conformers

Conformational isomers are commonly called conformers. They are stereoisomers that can be converted into one another by rotations about formally single bonds, without making or breaking any covalent bonds.

CPK

CPK (Corey, Pauling, Koltun) is a popular color convention for distinguishing atoms of different chemical elements in molecular models. See also the Appendix: Element coloring (CPK) chapter and the OEAtomColorStyle namespace.

CSV

Comma-separated values (CSV) file format. See also the CSV standard at RFC 4180 and the CSV File Format section.
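
As a small illustration of the quoting rule in RFC 4180, the following sketch uses Python's standard csv module to read and write a field that itself contains a comma. The column names and values are hypothetical.

```python
import csv
import io

# A field containing a comma must be enclosed in double quotes.
text = 'name,smiles,comment\r\nethanol,CCO,"solvent, polar"\r\n'

# newline="" leaves line endings untouched so the csv module can handle them.
rows = list(csv.reader(io.StringIO(text, newline="")))

# Writing the record back: the writer quotes only the field that needs it.
out = io.StringIO(newline="")
csv.writer(out).writerow(["ethanol", "CCO", "solvent, polar"])
written = out.getvalue()
```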

cube
cubes

Basic computational components used to construct a programmed workflow for execution in Orion. Cubes communicate with one another via messages and transmit data using defined input and output ports. Cubes can be written in Python, from scratch, using OpenEye APIs. These APIs, as well as ready-made cubes for general computation tasks, are provided as part of the Orion Platform package. Ready-made cubes for scientific computations are available in the OpenEye Snowball package.

CUDA

GPU-enabled calculation. CUDA mode involves preloading all fingerprints into GPU memory prior to performing similarity calculations. While this represents the fastest way to perform similarity searches once the fingerprints are loaded, searches are limited by GPU memory availability and will fall back to the memory-mapped CPU mode if the entire set of fingerprints cannot be preloaded into the GPU memory.

CXSMILES

The Chemaxon Extended SMILES (CXSMILES) format adds an optional appendix to the SMILES string to encode a wide variety of additional features that are not part of the SMILES representation proper. Only the canonical SMILES portion of the CXSMILES information is canonically ordered. The CXSMILES format does not specify that any ordering of information within the appendix is either expected or required. However, a deterministic sort order is applied to the appendix entries so that the CXSMILES string can be used for structure duplicate checking, in which different groupings of enhanced stereo groups are considered structure differentiating. See also the CXSMILES documentation.

design unit

The design unit (DU) is an object that contains the extracted and prepared parts of a single biological unit (BU), ready for modeling. The parts include:

  1. protein;

  2. ligand (not always: an apo DU will not contain a ligand);

  3. site residues;

  4. packing residues (if any exist near the site); and

  5. excipients (if any exist near the site).

One can interact with each part of the DU through APIs, which are listed under OEDesignUnit.

dock
docking

Docking is a molecular modeling technique used to predict the optimal binding of a protein and a ligand. In the drug discovery process, structure-based methods such as docking—the lock-and-key concept where the protein is the lock and the ligand is the key—use the structure of a target protein to discover hits and optimize leads. Docking is used to identify possible active molecules in a large molecule database, by examining their shapes and chemical complementarity to the active site.

excipient

An excipient is an inert substance that is formulated with an active drug to produce its pharmaceutical dosage form. Excipients serve various roles, such as long-term stabilization, aiding the manufacturing process, and enhancing the therapeutic properties of the active drug in its dosage form, such as facilitating drug absorption or enhancing solubility.

FASTA

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
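
A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. A minimal parsing sketch in Python follows; the function name and the example records are hypothetical, and the sketch ignores less common conventions such as comment lines.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token of the header.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = """>seq1 hypothetical peptide
MKTAYIAK
QRQISFVK
>seq2
ACGT
"""
sequences = parse_fasta(example)
```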

feature release

Toolkit and Application feature releases take place in spring and fall. These releases provide support for new platforms and major and minor code fixes as well as new features.

fingerprint

Fingerprints do not use a predefined pattern dictionary; the encoded fragments are enumerated exhaustively. Since the number of possible patterns present in molecular structures is extremely large, it is impractical to assign a particular bit to each unique pattern, as in the case of the structural key method. Instead, each pattern is passed through a hash function, and the resulting bits are logically ORed into the fingerprint. Because of hashing, different structural patterns can map to the same bits.
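
The hashing step can be sketched in Python as follows. This is an illustration only: the fragment strings are hypothetical, the use of SHA-256 and of one bit per pattern are assumptions, and none of this reflects how OpenEye fingerprints are implemented.

```python
import hashlib


def hashed_fingerprint(patterns, num_bits=64):
    """Hash each pattern to one bit position and OR it into an integer bitstring."""
    fingerprint = 0
    for pattern in patterns:
        digest = hashlib.sha256(pattern.encode("utf-8")).digest()
        bit = int.from_bytes(digest[:8], "big") % num_bits
        fingerprint |= 1 << bit  # logical OR: setting an already-set bit changes nothing
    return fingerprint


# Hypothetical fragment strings enumerated from two molecules.
fp_a = hashed_fingerprint(["C-C", "C-O", "C-C-O"])
fp_b = hashed_fingerprint(["C-C", "C-N", "C-C-N"])

# With very few bits, distinct patterns are forced to share bits.
crowded = hashed_fingerprint([f"pattern{i}" for i in range(10)], num_bits=4)
```

A cryptographic hash is used here only because it is deterministic across Python sessions, unlike the built-in hash() for strings.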

floe
floes

Programmed workflows for execution by the web-based Orion workflow engine. Many ready-made floes are developed and delivered by OpenEye, for execution in the Orion environment. For documentation of ready-made floes and their use, see the Orion Suites and Modules Guide. Floes can be developed using Orion programming interfaces (APIs) and can use either ready-made cubes delivered with OpenEye software or cubes of your own, developed with the guidance of the Orion Programming Guide.

GRASP

GRASP is a molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties. (online:fx.php/Software:GRASP)

InChI

From About the InChI Standard: Originally developed by the International Union of Pure and Applied Chemistry (IUPAC), the IUPAC International Chemical Identifier (InChI) is a character string generated by computer algorithm. It is a tool to be used in software applications designed and developed by those who choose to use it. The InChI algorithm turns chemical structures into machine-readable strings of information. InChIs are unique to the compound they describe and can encode absolute stereochemistry, making chemicals and chemistry machine-readable and discoverable. The InChI format and algorithm are non-proprietary and the software is open source, with ongoing development done by the community. A number of IUPAC working groups are currently creating standards for those areas of chemistry that are not yet handled by the InChI algorithm.

InChIKey

From About the InChI Standard: The InChIKey has been designed so that Internet search engines can search and find the links to a given InChI. To make the InChIKey, the InChI string is subjected to a compression algorithm to create a fixed-length string of upper-case characters. While the InChI to InChIKey hash compression is irreversible, there are a number of InChI resolvers available to look up an InChI given an InChIKey.

in-memory

The in-memory mode involves preloading all fingerprints into memory and performing the search in memory. While this represents the fastest way to perform similarity searches once the fingerprints are loaded, searches are limited by memory availability.

Iridium

Iridium [Warren-2012] is a metric used to estimate the model quality of a structure resulting from an X-ray crystallography experiment. For more information about Iridium see the discussion in the product documentation.

JSON

Short for JavaScript Object Notation, this is a language-independent data format that works with most common languages, such as Python, Java, and others in the C-family of languages.

LINGO

LINGO is a very fast text-based molecular similarity search method. It is based on fragmentation of canonical isomeric SMILES strings into overlapping substrings. See also the discussion in the OEChem documentation.
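
The fragmentation idea can be sketched in plain Python. This is a simplified illustration, not the OpenEye implementation: the substring length of 4, the multiset Tanimoto scoring, and the function names are assumptions, and the sketch skips any preprocessing of the SMILES string.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Count every overlapping substring of length q in a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(smiles_a, smiles_b, q=4):
    """Multiset Tanimoto over the substring counts of two SMILES strings."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())  # element-wise minimum of the counts
    total = sum((a | b).values())   # element-wise maximum of the counts
    return shared / total if total else 0.0


# "CCCCO" yields CCCC and CCCO; "CCCCN" yields CCCC and CCCN.
similarity = lingo_similarity("CCCCO", "CCCCN")  # one shared substring out of three
```

Because the comparison works directly on text, the two strings should be written in the same canonical form before they are compared.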

MACCS

MACCS is a 166-bit structural key descriptor in which each bit is associated with a specific structural pattern.

marching cubes

(online: Marching cubes)

MDL

MDL refers to the original “Molecular Design Limited Information Systems” company, which developed the MOL format. Later, “Molecular Design Limited” was dropped in favor of the MDL acronym. For more information, see https://en.wikipedia.org/wiki/MDL_Information_Systems.

memory-mapped

The memory-mapped mode has no load time penalty or memory limitation, but the search itself takes more time.

module
modules

In Python, modules are files with the file extension .py. A Python module can, and usually does, define multiple Python classes, functions, and other objects. For more information, see the official Python documentation, such as https://docs.python.org/3/tutorial/modules.html.

MOL

A MOL type file uses a format originally defined by Molecular Design Limited that encodes information about the atoms, bonds, connectivity, and coordinates of a molecule, in plain text. For more information, see the Wikipedia definition.

Monte Carlo

Monte Carlo simulations rely on repeated random sampling to model the probability of possible outcomes in a process that cannot easily be predicted. They are therefore used to estimate model uncertainty. In machine learning, dropout randomly removes nodes from a neural network during training to regularize learning and avoid overfitting. Monte Carlo dropout keeps dropout active at prediction time and repeats the prediction many times, using the spread of the outputs as an estimate of model uncertainty.
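
A classic toy illustration of repeated random sampling is estimating π from random points in the unit square. The sketch below is a generic example chosen for illustration; it is unrelated to any OpenEye model, and the function name, sample sizes, and seeds are arbitrary.

```python
import random
import statistics


def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of random points that fall inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


estimate = estimate_pi(100_000)

# Repeating the experiment with different seeds gives a spread that serves
# as an empirical measure of the uncertainty of a single estimate.
repeats = [estimate_pi(10_000, seed=s) for s in range(20)]
spread = statistics.stdev(repeats)
```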

nonterminal atom

An atom is considered nonterminal if it is connected to two or more nonhydrogen atoms, that is, the method GetHvyDegree() of OEAtomBase returns a value greater than or equal to 2.

OEB

This is a binary OpenEye format used as a compact way to represent molecules with multiple conformers.

Orion back end

The Orion back end is the set of programs and services that make up the “server” part of the client/server Orion product.

Orion Platform

The Orion Platform is the set of OpenEye-supported interfaces to the Orion back end. Some interfaces are meant to be called by Python programs and some are command-line programs.

package
packages

In Python, packages are sets of Python scripts (programs) organized in a directory that also includes an __init__.py script. The __init__.py script runs when the package is first imported; it may import objects used by scripts in the package and perform other useful initial tasks for the package’s scripts. Floe packages for use in Orion are specific examples of Python packages.

popcount

Popcount refers to the procedure of counting the number of 1s set in a bit string. It is available as a hardware instruction on many modern processors and can be used, as an alternative to software-based counting methods, to speed up fingerprint operations.
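
For comparison with the hardware instruction, a software-based count can be sketched in Python. The function name is hypothetical; the loop clears the lowest set bit on each pass, so it runs once per set bit.

```python
def popcount(bits):
    """Count the 1 bits in a non-negative integer."""
    if bits < 0:
        raise ValueError("popcount is defined here for non-negative integers only")
    count = 0
    while bits:
        bits &= bits - 1  # clears the lowest set bit
        count += 1
    return count


fingerprint = 0b101101
num_set = popcount(fingerprint)  # four bits are set
```

In Python 3.10 and later, the built-in int.bit_count() method returns the same value.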

preliminary API

APIs that are still under development might be delivered and marked as preliminary. Providing early access means that the APIs can be improved based on user feedback before committing to a final API design. Preliminary APIs will sometimes be made available for a limited set of platforms and languages, usually Python and C++.

project

In Orion, a project is a container for data related to the pursuit of an objective. A project serves as a location to store data and allows for efficient sharing and collaboration. Each project has its own Project Summary page.

raster image

Raster graphics represent images as an array of pixels or points of color.

receptor
receptors

Receptors are a class of proteins that selectively bind to specific ligand molecules to change the behavior of a cell. This binding may activate or inactivate a receptor. A receptor is an integral part of a Design Unit and contains information about the location and characteristics of the binding pocket.

ROC curve

The receiver operating characteristic (ROC) curve is a two-dimensional graph in which the false positive rate is plotted on the X-axis and the true positive rate is plotted on the Y-axis. ROC curves are useful for visualizing and comparing the performance of classifier methods. See also Receiver operating characteristic (ROC) in Wikipedia, An introduction to ROC analysis by Tom Fawcett, and the definition of AUC.
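
The points of a ROC curve can be generated by lowering the classification threshold through each distinct score, as in the following Python sketch. The function name and the example scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, starting at (0, 0)."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / num_neg, true_pos / num_pos))
    return points


scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]  # 1 = positive, 0 = negative
curve = roc_points(scores, labels)
```

The curve always ends at (1.0, 1.0), where the threshold is low enough to classify every example as positive.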

rotatable bond

In OpenEye documentation, a bond is considered rotatable only if it is a single nonring bond between two nonterminal, non-triple-bonded atoms. For example, the following structures have no rotatable bonds: CCC, CCC#CCC, and C1CCCCC1. Note: Since the “rotatable” property depends on the “in ring” property, the OEFindRingAtomsAndBonds function must be called before accessing the rotatable bond property via the IsRotor method of OEQBondBase.
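
The rule can be checked on small hand-built graphs with a Python sketch. This is an illustration of the definition only, not OEChem TK code: the bond tuples, the hand-supplied ring flags, and the function name are assumptions, and only heavy atoms are listed.

```python
from collections import defaultdict


def count_rotatable_bonds(bonds):
    """Count rotatable bonds in a list of (atom_i, atom_j, order, in_ring) tuples."""
    heavy_degree = defaultdict(int)
    triple_bonded = set()
    for i, j, order, _ in bonds:
        heavy_degree[i] += 1
        heavy_degree[j] += 1
        if order == 3:
            triple_bonded.update((i, j))
    count = 0
    for i, j, order, in_ring in bonds:
        if order != 1 or in_ring:
            continue  # must be a single, nonring bond
        if heavy_degree[i] < 2 or heavy_degree[j] < 2:
            continue  # both atoms must be nonterminal
        if i in triple_bonded or j in triple_bonded:
            continue  # neither atom may take part in a triple bond
        count += 1
    return count


propane = [(0, 1, 1, False), (1, 2, 1, False)]                      # CCC
hex_3_yne = [(0, 1, 1, False), (1, 2, 1, False), (2, 3, 3, False),
             (3, 4, 1, False), (4, 5, 1, False)]                    # CCC#CCC
cyclohexane = [(i, (i + 1) % 6, 1, True) for i in range(6)]         # C1CCCCC1
butane = [(0, 1, 1, False), (1, 2, 1, False), (2, 3, 1, False)]     # CCCC
```

The three structures named in the definition give a count of zero, while butane has one rotatable bond, the central C-C bond.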

script
scripts

Python programs are often called scripts because they are usually interpreted and not precompiled. For most purposes, script and program are the same in Python, and this documentation may use either term to refer to a file of Python source code.

SDF

A Structure-Data File (SDF) encodes information about chemical compounds in plain text. It is based on the MOL format developed originally by Molecular Design Limited (MDL). An SDF file can encode multiple molecules in a single file. Practical information on the format is widely available online, for example at https://lifechemicals.com/order-and-supply/how-to-work-with-sd-files.

shard

Shards are thin wrappers around files (not related to Orion Files). Each shard can store up to 5 GB of arbitrary binary data as well as 256 kB of user-defined metadata in the form of JSON.

SMARTS

SMILES Arbitrary Target Specification (SMARTS)

SMARTS is a language for specifying substructures using a set of primitive symbols that describe atomic and bond properties. Atom and bond primitive specifications may be combined to form expressions by using logical operators. An introduction to SMARTS syntax is provided in SMARTS Pattern Matching. For more information, go to the documentation of SMARTS on the Daylight Chemical Information Systems site.

SMILES

Simplified Molecular Input Line Entry System (SMILES)

A SMILES string represents a molecule by describing only its molecular graph (i.e., atoms and bonds in the connection table, but no chiral or isotopic information). There are usually a large number of valid SMILES that represent a given structure. For example, CCO, OCC, and C(O)C all specify the structure of ethanol. For generating an arbitrary SMILES string, use the OECreateAbsSmiString function. An introduction to SMILES syntax is provided in the chapter SMILES Line Notation. For more information, go to the documentation of SMILES on the Daylight Chemical Information Systems site.

SMIRKS

SMILES Reaction Kernel Specification (SMIRKS)

SMIRKS is a reaction transform language. A reaction is considered valid according to the strict SMIRKS semantics if 1) all mapped product atoms have corresponding mapped reactant atoms, and 2) all atom maps are pairwise (i.e., every map class has exactly one reactant and one product atom). The strict semantics also require that unmapped reactant atoms are destroyed in the reaction. Strict semantics means full compliance with SMIRKS as defined by its originator, Daylight Inc. An introduction to SMIRKS syntax is provided in Reactions. For more information, go to the documentation of SMIRKS on the Daylight Chemical Information Systems site. Additional information on the semantics of the SMIRKS language can be found in the SMIRKS Tutorial on the Daylight Chemical Information Systems site.

stable API

Toolkits that have been fully developed, documented, and tested are considered stable. The API definitions for these toolkits will be consistent across languages and platforms and will not change between toolkit releases. Code written with earlier versions of the toolkit will compile and run with newer versions. This allows users to easily upgrade to newer versions without worrying about breaking legacy code.

stereo atom

In OEChem TK, the atom stereo information is stored as relative positions of neighboring atoms around a tetrahedral center. If an atom has specified stereochemistry, then the HasStereoSpecified method of OEAtomBase returns True. In OEChem TK, atom stereochemistry is internally represented by the two properties stereo atom and chiral atom. These properties are completely independent and allow OEChem TK to retain configuration information about atoms that are not chiral atoms, or to identify chiral atoms whose configuration is not specified. In the current version of OEChem TK, the only class of stereochemistry supported for atoms is Tetrahedral, which corresponds to \(sp^3\) tetrahedral chirality. Valid return values for the OEAtomStereo Tetrahedral stereochemistry class are Left and Right. See also the chiral atom definition, the OEAtomBase SetStereo method, the OEAtomBase GetStereo method, the OEAtomBase HasStereoSpecified method, and the Atom Stereochemistry section.

stereo bond

In OEChem TK, the bond stereo information is stored as relative positions of neighboring atoms around a bond. If a bond has specified stereochemistry, then the HasStereoSpecified method of OEBondBase returns True. In OEChem TK, bond stereochemistry is internally represented by the two properties stereo bond and chiral bond. These properties are completely independent and allow OEChem TK to retain configuration information about bonds that are not chiral bonds, or to identify chiral bonds whose configuration is not specified. In the current version of OEChem TK, the only class of stereochemistry supported for bonds is CisTrans, which corresponds to the cis or trans arrangement around a double bond. Valid return values for the OEBondStereo CisTrans stereochemistry class are Cis and Trans. See also the chiral bond definition, the OEBondBase SetStereo method, the OEBondBase GetStereo method, the OEBondBase HasStereoSpecified method, and the Bond Stereochemistry section.

structural key

A structural key is a fixed-length bitstring in which each bit is associated with a specific molecular pattern. When a structural key is generated for a molecule, the bitstring encodes whether each of these patterns is present or absent in the molecule. The performance of such keys depends on the choice of the fragments used for constructing the keys and the probability of their presence in the searched molecule databases.

SVG

Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for describing two-dimensional vector graphics.

Tanimoto

The Tanimoto coefficient (T) is a similarity measure for comparing chemical structures by means of molecular fingerprints. It is the ratio of the number of features common to both compounds to the total number of distinct features present in either compound. A commonly cited rule of thumb treats two structures as similar if T > 0.85, although the appropriate threshold depends on the fingerprint type.
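
For bitstring fingerprints, the coefficient can be computed with bitwise operations, as in the following Python sketch. Representing fingerprints as Python integers, the function name, and returning 0.0 when both fingerprints are empty are assumptions made for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as non-negative integers."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one fingerprint
    # Two empty fingerprints have no defined ratio; 0.0 is an assumed convention.
    return common / total if total else 0.0


fp_a = 0b11110000
fp_b = 0b00111100
similarity = tanimoto(fp_a, fp_b)  # 2 common bits out of 6 distinct bits
```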

vector image

Vector graphics represent images using geometrical primitives such as points, lines, curves, and shapes or polygons, which are all based on mathematical equations.

zwitterion
zwitterionic

A zwitterion contains an equal number of positively and negatively charged functional groups. For more information, see the Wikipedia definition.