# Glossary

AUC

The area under the ROC curve (AUC) is an aggregate measure of classifier performance across all possible classification thresholds. The AUC ranges from 0.0 to 1.0: a model with perfect predictions has an AUC of 1.0, while a model that always gets the predictions wrong (ranking every negative example above every positive one) has an AUC of 0.0. An AUC of 0.5 corresponds to random prediction. The AUC can be interpreted as the probability that the model ranks a randomly chosen positive example higher than a randomly chosen negative example.
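The probabilistic interpretation of the AUC can be computed directly by comparing every positive/negative pair. The following is a minimal pure-Python sketch, assuming labels are encoded as 1 (positive) and 0 (negative) and that tied scores earn half credit (the usual convention). The function name `auc_by_ranking` is hypothetical and not part of any OpenEye API.

```python
from itertools import product


def auc_by_ranking(scores, labels):
    """Estimate the ROC AUC as P(score of random positive > score of random negative).

    labels: 1 for positive, 0 for negative (an assumed encoding).
    Tied scores count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            correct += 1.0  # positive ranked above negative
        elif p == n:
            correct += 0.5  # tie: half credit
    return correct / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_by_ranking(scores, labels))  # 8 of the 9 pairs are ranked correctly
```

This pairwise loop costs O(P·N) comparisons, which is fine for small sets; for large sets a sort-based formulation (equivalent to the Mann-Whitney U statistic) is the usual choice.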

B-factor

The B-factor (temperature factor) describes the displacement of an atom from its average position. The more flexible an atom is, the larger its mean-square displacement from the mean position. B-factor values typically range from 15 to 30 Å², but can be much higher in flexible regions. B-factors indicate the mobility of atoms and can also point to errors in model building.
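To make the link between a B-factor and a physical displacement concrete, the sketch below uses the standard isotropic relation $B = 8\pi^2 U_{iso}$, where $U_{iso}$ is the mean-square displacement along a single direction. The helper names are hypothetical; anisotropic refinement (per-atom displacement tensors) is not covered.

```python
import math


def mean_square_displacement(b_factor: float) -> float:
    """Isotropic mean-square displacement U_iso (A^2) from a B-factor (A^2).

    Uses B = 8 * pi^2 * U_iso, where U_iso is the mean-square
    displacement along a single direction.
    """
    if b_factor < 0:
        raise ValueError("B-factors are non-negative")
    return b_factor / (8.0 * math.pi ** 2)


def rms_displacement(b_factor: float) -> float:
    """Root-mean-square displacement (A) along a single direction."""
    return math.sqrt(mean_square_displacement(b_factor))


# The typical 15-30 A^2 range versus a flexible region with a much higher B-factor.
for b in (15.0, 30.0, 80.0):
    print(f"B = {b:5.1f} A^2 -> RMS displacement {rms_displacement(b):.2f} A")
```

For the typical range of 15 to 30 Å², this gives RMS displacements of roughly 0.44 Å to 0.62 Å, which shows why B-factors well above that range flag mobile (or poorly modeled) regions.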

canonical isomeric SMILES

In OEChem TK, the term canonical isomeric SMILES refers to a unique SMILES string that also encodes isotopic and stereo information. Because a canonical isomeric SMILES is unambiguous, it can be used as a universal identifier for a specific chemical structure. To generate a canonical isomeric SMILES, use the OEMolToSmiles function.

Note

OEChem TK’s canonical isomeric SMILES terminology corresponds to Daylight’s ‘absolute’ SMILES definition.

chiral atom

In OEChem TK, an atom is considered chiral if it is connected to four different substituent groups, i.e., if its mirror image cannot be superimposed on the original.

Note

In OEChem TK, an easily invertible nitrogen, i.e., a non-planar nitrogen with one attached hydrogen, is not considered chiral. This is because trivalent nitrogen compounds undergo rapid inversion, which interconverts their enantiomers.

CSV

Comma-separated values file format.

CUDA

A GPU-enabled calculation mode that is 200x faster than the two CPU modes (in-memory and memory-mapped). CUDA mode pre-loads all fingerprints into GPU memory before performing similarity calculations. While this is the fastest way to perform similarity searches once the fingerprints are loaded, searches are limited by available GPU memory and fall back to the memory-mapped CPU mode if the entire set of fingerprints cannot be preloaded into GPU memory.

in-memory

The in-memory mode pre-loads all fingerprints into memory and then performs the search in memory. While this is the fastest CPU-based way to perform similarity searches once the fingerprints are loaded, searches are limited by available memory.

memory-mapped

The memory-mapped mode has no load-time penalty and no memory limitation, but the search itself takes more time than in the in-memory mode.
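The trade-off between the in-memory and memory-mapped modes can be illustrated with Python's standard `mmap` module. This is a sketch under stated assumptions, not the toolkit's implementation: it assumes a hypothetical file of fixed-size 64-bit fingerprint records and scores hits with Tanimoto similarity, which the glossary does not name. The in-memory path reads every record once up front and reuses the loaded list; the memory-mapped path skips the load and lets the operating system page records in as the scan touches them.

```python
import mmap
import os
import tempfile

FP_BYTES = 8  # hypothetical fixed record size: one 64-bit fingerprint per record


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def top_k(scored, k):
    """Highest similarity first; ties broken by lower record index."""
    return sorted(scored, key=lambda t: (-t[0], t[1]))[:k]


def write_fingerprints(path, fps):
    with open(path, "wb") as fh:
        for fp in fps:
            fh.write(fp.to_bytes(FP_BYTES, "little"))


def load_fingerprints(path):
    """In-memory mode: pay the load time and memory cost once, up front."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [int.from_bytes(data[i:i + FP_BYTES], "little")
            for i in range(0, len(data), FP_BYTES)]


def search_loaded(fps, query, k):
    """Search the already-loaded list; repeated queries reuse it."""
    return top_k([(tanimoto(query, fp), i) for i, fp in enumerate(fps)], k)


def search_memory_mapped(path, query, k):
    """Memory-mapped mode: no upfront load; records are paged in as they are read."""
    scored = []
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for i, off in enumerate(range(0, len(mm), FP_BYTES)):
                fp = int.from_bytes(mm[off:off + FP_BYTES], "little")
                scored.append((tanimoto(query, fp), i))
    return top_k(scored, k)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fingerprints.bin")
    write_fingerprints(path, [0b1011, 0b1111, 0b0001, 0b1010])
    loaded_result = search_loaded(load_fingerprints(path), 0b1011, 2)
    mapped_result = search_memory_mapped(path, 0b1011, 2)
    print(loaded_result)  # [(1.0, 0), (0.75, 1)]
    print(mapped_result)  # same hits, without loading the file first
```

Both modes return the same hits; they differ only in when the data is brought into memory, which is why the in-memory mode wins when many queries are run against one loaded set.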

popcount

Popcount (population count) is the operation of counting the number of ones (set bits) in a bit string. It is available as a hardware instruction on most modern processors (both Intel and AMD), so its availability depends on the hardware rather than on the platform.
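Python does not expose the processor's popcount instruction directly, so the sketch below computes the same quantity in software and shows a typical use: fingerprint similarity built from popcounts. Using Tanimoto similarity here is an assumption for illustration, and the helper names are hypothetical.

```python
def popcount(x: int) -> int:
    """Count the set bits (ones) in a non-negative integer.

    Kernighan's method: x & (x - 1) clears the lowest set bit, so the loop
    runs once per set bit rather than once per bit position.
    """
    if x < 0:
        raise ValueError("expected a non-negative bit string")
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


def tanimoto(a: int, b: int) -> float:
    """Fingerprint similarity from popcounts: |A and B| / |A or B|."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


print(popcount(0b1011_0110))          # 5
print(bin(0b1011_0110).count("1"))    # 5 (string-based cross-check)
print(tanimoto(0b1100, 0b1010))       # 1 shared bit / 3 bits in the union
```

On Python 3.10 and later, `int.bit_count()` returns the same count; native code typically reaches the hardware instruction through a compiler builtin instead.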

ROC curve

The receiver operating characteristic (ROC) curve is a two-dimensional graph in which the false positive rate is plotted on the X axis and the true positive rate on the Y axis; each point corresponds to one classification threshold. ROC curves are useful for visualizing and comparing the performance of classifiers.
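The ROC points can be produced by sweeping the classification threshold from the highest score to the lowest and recording the false and true positive rates at each step. This minimal sketch assumes 1/0 labels and processes tied scores together, so each distinct threshold yields one point; the trapezoidal area under the resulting curve equals the AUC. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    labels: 1 for positive, 0 for negative (an assumed encoding).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(pts)
print(trapezoid_auc(pts))  # approximately 8/9
```

Grouping tied scores matters: processing them one at a time would make the curve, and its area, depend on the arbitrary order of the tied examples.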

SMARTS

SMARTS is a language for specifying substructures using primitive symbols that describe atom and bond properties. Atom and bond primitives can be combined into expressions using logical operators; for example, `[C,N]` matches an aliphatic carbon or nitrogen atom.