Glossary

AUC

The area under the curve (AUC) of the ROC curve is an aggregate measure of performance across all possible classification thresholds. The AUC value lies in the range \([0.0, 1.0]\). A model with perfect predictions has an AUC of 1.0, while a model whose predictions are always wrong has an AUC of 0.0. A value of 0.5 corresponds to random prediction. The AUC can be interpreted as the probability that the model ranks a random positive example more highly than a random negative example.
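The ranking interpretation can be illustrated with a minimal sketch in plain Python. The function name `auc_by_ranking` and the convention that labels are 1 (positive) or 0 (negative) are assumptions made for this example, and ties are counted as half a correct ranking, which is a common but not universal choice.

```python
def auc_by_ranking(scores, labels):
    """Estimate AUC as the fraction of (positive, negative) pairs
    in which the positive example receives the higher score.

    Hypothetical helper for illustration; labels are assumed to be 1 or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: counted as half (an assumption)
    return wins / (len(pos) * len(neg))
```

With this sketch, scores that rank every positive above every negative give 1.0, the reversed ranking gives 0.0, and identical scores for all examples give 0.5. The pairwise loop is quadratic and is meant only to mirror the definition, not to be efficient.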

B-factor

The B-factor (temperature factor) describes the displacement of atomic positions from an average value. The more flexible an atom is, the larger its mean-square displacement from the mean position will be. B-factor values are normally between 15 and 30 Å², but can be much higher for more flexible regions. B-factors can indicate the mobility of atoms, and they can also indicate where there are errors in model building.
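
For reference, the conventional isotropic relation between the B-factor and the mean-square displacement \(\langle u^2 \rangle\) (standard in crystallography, though not stated in this glossary) is:

\[ B = 8\pi^2 \langle u^2 \rangle \]

Under this relation, a B-factor of 20 Å² corresponds to \(\langle u^2 \rangle = 20 / (8\pi^2) \approx 0.25\) Å², that is, a root-mean-square displacement of about 0.5 Å.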

canonical isomeric SMILES

In OEChem TK, the name canonical isomeric SMILES is used for a unique SMILES string that also encodes isotopic and stereo information. Because canonical isomeric SMILES are unambiguous, they can be used as a universal identifier for a specific chemical structure. To generate a canonical isomeric SMILES, use the OEMolToSmiles function.

Note

OEChem TK’s canonical isomeric SMILES terminology corresponds to Daylight’s ‘absolute’ SMILES definition.

chiral atom

In OEChem TK, an atom is considered chiral if it is connected to four different substituent groups, i.e., it cannot be superimposed on its mirror image.

Note

In OEChem TK, an easily invertible nitrogen, i.e., a non-planar nitrogen with one attached hydrogen, is not considered to be chiral. This is because trivalent nitrogen compounds undergo rapid inversion that interconverts the enantiomers.

CSV

Comma-separated-values file format.

popcount

Popcount stands for population count: the procedure of counting the number of ones in a bit string. It is available as a hardware instruction on most modern processors (both Intel and AMD), so its availability depends on the hardware rather than on the platform.
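As an illustration of what the operation computes, here is a small software sketch in Python. It does not use the hardware instruction; it uses the well-known trick of repeatedly clearing the lowest set bit. The function name `popcount` and the representation of the bit string as a non-negative integer are assumptions made for this example.

```python
def popcount(bits: int) -> int:
    """Count the ones in the binary representation of a non-negative integer.

    Software sketch only: each loop iteration clears the lowest set bit,
    so the loop runs once per set bit.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    count = 0
    while bits:
        bits &= bits - 1  # clear the lowest set bit
        count += 1
    return count
```

For example, `popcount(0b101101)` counts the four set bits in `101101`. The result agrees with the simpler, if slower, `bin(bits).count("1")`.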

ROC curve

The receiver operating characteristic (ROC) curve is a two-dimensional graph in which the false positive rate is plotted on the X axis and the true positive rate is plotted on the Y axis. ROC curves are useful for visualizing and comparing the performance of classification methods.
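A minimal sketch of how the points of such a curve could be generated is shown below. The function name `roc_points` is hypothetical, labels are assumed to be 1 (positive) or 0 (negative), and higher scores are assumed to mean "more likely positive". The threshold is swept from the highest score to the lowest, and examples with equal scores are processed together so that ties produce a single point.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points of a ROC curve.

    Hypothetical helper for illustration; labels are assumed to be 1 or 0.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by decreasing score so that lowering the threshold walks the list.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

The first point is always (0, 0) and the last is always (1, 1); plotting the points in order with X as the false positive rate and Y as the true positive rate gives the curve described above.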

SMARTS

SMARTS is a language that allows specifying substructures by providing a number of primitive symbols describing atomic and bond properties. Atom and bond primitive specifications may be combined to form expressions by using logical operators.

SMILES

A SMILES string represents a molecule by describing only its molecular graph (i.e., the atoms and bonds in the connection table, with no chiral or isotopic information). There are usually many valid SMILES strings that represent a given structure. For example, CCO, OCC, and C(O)C all specify the structure of ethanol.

SMIRKS

SMIRKS is a reaction transform language.
