Glossary

canonical SMILES

In OEChem TK, the term canonical SMILES is used for a unique SMILES string that encodes the connection table of a molecule, but no chiral or isotopic information. Consequently, two stereoisomers always share the same canonical SMILES, because their stereo information is ignored during canonicalization. To generate a canonical SMILES, use the OECreateCanSmiString function.

Note

OEChem TK’s canonical SMILES terminology corresponds to Daylight’s ‘unique’ SMILES definition.

canonical isomeric SMILES

In OEChem TK, the term canonical isomeric SMILES is used for a unique SMILES string that also encodes isotopic and stereo information. Because a canonical isomeric SMILES is unambiguous, it can be used as a universal identifier for a specific chemical structure. To generate a canonical isomeric SMILES, use the OECreateIsoSmiString function.

Note

OEChem TK’s canonical isomeric SMILES terminology corresponds to Daylight’s ‘absolute’ SMILES definition.

fingerprint

Unlike structural keys, fingerprints do not use a predefined pattern dictionary; instead, the encoded fragments are enumerated exhaustively. Since the number of possible patterns present in molecular structures is extremely large, it is impractical to assign a particular bit to each unique pattern, as is done in the structural key method. Instead, each pattern is passed to a hash function, and the resulting bit or bits are logically ORed into the fingerprint. Hashing inherently means that some distinct structural patterns overlap, that is, they set the same bit.
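The hashing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any OpenEye toolkit implements fingerprints: the fragments are assumed to have been enumerated already and are passed in as plain strings, and the names `bit_for_pattern`, `hashed_fingerprint`, and `NUM_BITS`, the use of SHA-1, and the 64-bit length are all choices made for this example.

```python
import hashlib

NUM_BITS = 64  # assumed fingerprint length, chosen only for this sketch


def bit_for_pattern(pattern: str, num_bits: int = NUM_BITS) -> int:
    """Map a fragment string to a bit position using a stable hash."""
    digest = hashlib.sha1(pattern.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_bits


def hashed_fingerprint(patterns, num_bits: int = NUM_BITS) -> int:
    """OR the bit of every enumerated pattern into one integer bit string."""
    fingerprint = 0
    for pattern in patterns:
        fingerprint |= 1 << bit_for_pattern(pattern, num_bits)
    return fingerprint


# Hypothetical fragments, as if enumerated from a small molecule.
fragments = ["C", "O", "CC", "CO", "CCO"]
fp = hashed_fingerprint(fragments)
print(f"{fp:0{NUM_BITS}b}")
```

Because several fragments may hash to the same position, the number of set bits can be smaller than the number of fragments; this is the overlap described above.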

LINGO

LINGO is a very fast text-based molecular similarity search method. It is based on fragmentation of canonical isomeric SMILES strings into overlapping substrings.
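The fragmentation idea can be illustrated with a short Python sketch. It makes several assumptions that are not taken from this glossary: the substring length of 4, the function names `lingos` and `lingo_similarity`, and the count-based Tanimoto score are choices for the example, and any preprocessing of the SMILES string (such as normalizing ring-closure digits) is omitted. The input strings are assumed to be canonical isomeric SMILES already.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count all overlapping substrings of length q in a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Tanimoto-style similarity on substring counts (an assumed scoring)."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())  # per-substring minimum of the counts
    total = sum((a | b).values())   # per-substring maximum of the counts
    return shared / total if total else 0.0


print(sorted(lingos("CCCCO")))
print(lingo_similarity("CCCCO", "CCCCN"))
```

Since only string operations are involved, no molecular graph has to be built at search time, which is what makes this kind of comparison fast.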

MACCS

MACCS is a 166-bit structural key descriptor in which each bit is associated with a specific structural pattern.

popcount

Popcount refers to counting the number of bits that are set to one in a bit string. It is available as a hardware instruction on many modern processors and can be used as an alternative to software-based counting methods to speed up fingerprint operations.
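As a small illustration of why popcount matters for fingerprints, the following Python sketch computes a Tanimoto coefficient from three popcounts. The function names `popcount` and `tanimoto` are chosen for this example, fingerprints are assumed to be stored as non-negative Python integers, and the counting here is done in software rather than with the hardware instruction.

```python
def popcount(bits: int) -> int:
    """Count the one-bits of a non-negative integer in software."""
    return bin(bits).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: bits in common divided by bits set in either."""
    either = popcount(fp_a | fp_b)
    return popcount(fp_a & fp_b) / either if either else 0.0


print(popcount(0b101101))        # four bits are set
print(tanimoto(0b1100, 0b1010))  # one shared bit out of three set in total
```

On Python 3.10 and later, `int.bit_count()` provides the same count as a built-in method.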

SMARTS

SMARTS is a language for specifying substructures using a set of primitive symbols that describe atomic and bond properties. Atom and bond primitives may be combined into expressions by using logical operators. An introduction to SMARTS syntax is provided in SMARTS Pattern Matching. For more information, see http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

SMILES

A SMILES string represents a molecule by describing only its molecular graph (i.e., atoms and bonds in the connection table, but no chiral or isotopic information). There are usually many valid SMILES strings that represent a given structure. For example, CCO, OCC and C(O)C all specify the structure of ethanol. To generate an arbitrary SMILES string, use the OECreateAbsSmiString function. For more information, see http://www.daylight.com/smiles/.
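The ethanol example can be checked with a deliberately tiny Python sketch. It is not a SMILES parser in any general sense: it assumes single-letter uppercase atoms, implicit single bonds, and parenthesized branches only, and the names `parse_simple_smiles` and `graph_signature` are made up for this illustration. The signature it compares (each atom with its sorted neighbor elements) is enough to tell these small examples apart but is not a full graph isomorphism test.

```python
def parse_simple_smiles(smiles: str):
    """Parse a toy SMILES subset into a list of atoms and a list of bonds."""
    atoms, bonds, branch_points = [], [], []
    previous = None
    for ch in smiles:
        if ch == "(":
            branch_points.append(previous)   # remember where the branch starts
        elif ch == ")":
            previous = branch_points.pop()   # return to the branch point
        elif ch.isupper():
            atoms.append(ch)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current))
            previous = current
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds


def graph_signature(smiles: str):
    """Sorted list of (element, neighbor elements), independent of atom order."""
    atoms, bonds = parse_simple_smiles(smiles)
    neighbors = [[] for _ in atoms]
    for a, b in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])
    return sorted((atoms[i], tuple(sorted(neighbors[i]))) for i in range(len(atoms)))


for text in ("CCO", "OCC", "C(O)C"):
    print(text, graph_signature(text))
```

All three strings yield the same signature, whereas a different molecule such as dimethyl ether (COC) does not.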

structural key

A structural key is a fixed-length bitstring in which each bit is associated with a specific molecular pattern. When a structural key is generated for a molecule, the bitstring encodes whether each of these patterns is present in the molecule. The performance of such keys depends on the choice of the fragments used to construct them and on how likely those fragments are to occur in the molecule databases being searched.
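The one-bit-per-pattern idea can be sketched in Python as follows. The pattern dictionary `KEY_PATTERNS` and the function `structural_key` are hypothetical and exist only for this example; the sketch also assumes that a separate substructure matcher has already determined which named patterns occur in the molecule, so the input is simply a set of pattern names.

```python
# Hypothetical predefined pattern dictionary: position in the tuple = bit index.
KEY_PATTERNS = ("hydroxyl", "carbonyl", "aromatic ring", "primary amine")


def structural_key(patterns_present) -> int:
    """Set bit i when the i-th dictionary pattern occurs in the molecule."""
    key = 0
    for bit, pattern in enumerate(KEY_PATTERNS):
        if pattern in patterns_present:
            key |= 1 << bit
    return key


# Patterns outside the dictionary (here "ether") cannot be encoded at all.
key = structural_key({"hydroxyl", "carbonyl", "ether"})
print(f"{key:0{len(KEY_PATTERNS)}b}")
```

In contrast to a hashed fingerprint, every bit has a fixed meaning, but any pattern missing from the dictionary leaves no trace in the key.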