Glossary
- AUC
The area under the curve (AUC) of the ROC curve is an aggregate measure of performance across all possible classification thresholds. The AUC value lies in the range \([0.0, 1.0]\). A model with perfect predictions has an AUC of 1.0, while a model whose predictions are always wrong has an AUC of 0.0. A value of 0.5 corresponds to random prediction. The AUC can be interpreted as the probability that the model ranks a random positive example higher than a random negative example.
See also
Area under the curve in Wikipedia
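The ranking interpretation of the AUC can be illustrated with a minimal Python sketch. The function name `auc_by_ranking`, the convention of counting ties as half a win, and the example scores and labels are assumptions made for illustration; they are not part of any toolkit API.

```python
from itertools import product


def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_by_ranking(scores, labels)  # 8 of the 9 pairs are ranked correctly
```

A perfect ranking gives 1.0 and a fully inverted ranking gives 0.0, matching the extremes described in the definition.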
- B-factor
The B-factor (temperature factor) describes the displacement of atomic positions from their average value. The more flexible an atom is, the larger its displacement from the mean position will be (mean-square displacement). B-factor values are normally between 15 and 30 Å² (square Ångströms), but can be much higher for more flexible regions. B-factors can indicate the mobility of atoms, and they can also indicate where there are errors in model building.
- canonical isomeric SMILES
In OEChem TK, the name canonical isomeric SMILES is used for a unique SMILES string that also encodes isotopic and stereo information. Because canonical isomeric SMILES are unambiguous, they can be used as universal identifiers for specific chemical structures. To generate a canonical isomeric SMILES, use the OEMolToSmiles function.
Note
OEChem TK’s canonical isomeric SMILES terminology corresponds to Daylight’s ‘absolute’ SMILES definition.
- chiral atom
In OEChem TK, an atom is considered chiral if it is connected to four different substituent groups, i.e. it is not superimposable on its mirror image.
Note
In OEChem TK, an easily invertible nitrogen, i.e. a non-planar nitrogen with one attached hydrogen, is not considered to be chiral. This is because trivalent nitrogen compounds undergo rapid inversion that interconverts their enantiomers.
- CSV
Comma-separated-values file format.
See also
CSV standard at RFC 4180
CSV File Format section of the OEChem TK documentation about the layout of the CSV file format.
- popcount
Popcount stands for population count: the procedure of counting the number of ones in a bit string. A popcount instruction is available on most modern processors (both Intel and AMD), so its availability depends on the hardware rather than on the platform.
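As an illustration of what popcount computes, here is a minimal software sketch in Python. It uses the well-known trick of repeatedly clearing the lowest set bit; the function name `popcount` is chosen for this example, and the loop is a portable fallback, not the hardware instruction itself.

```python
def popcount(x: int) -> int:
    """Count the one bits in a non-negative integer.

    Each iteration of x &= x - 1 clears the lowest set bit,
    so the loop runs once per one bit.
    """
    if x < 0:
        raise ValueError("popcount is defined here for non-negative integers only")
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


bits = 0b10110010       # bit string with four ones
ones = popcount(bits)   # 4
```

On Python 3.10 and later, the built-in `int.bit_count()` method returns the same value.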
- ROC curve
The receiver operating characteristic (ROC) curve is a two-dimensional graph in which the false positive rate is plotted on the X axis and the true positive rate is plotted on the Y axis. ROC curves are useful for visualizing and comparing the performance of classifier methods.
See also
Receiver operating characteristic (ROC) in Wikipedia
An introduction to ROC analysis by Tom Fawcett
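The construction of a ROC curve can be sketched in a few lines of Python: sort the examples by decreasing score, lower the threshold through each distinct score, and record the (false positive rate, true positive rate) pair at each step. The function name `roc_points` and the example data are hypothetical and serve only to illustrate the definition.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of a ROC curve, starting at (0, 0).

    Labels are 1 for positive and 0 for negative examples. Examples
    with equal scores are processed together so that ties yield a
    single point.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Everything scoring at or above the threshold is predicted positive.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

The curve always starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is.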
- SMARTS
SMARTS is a language that allows specifying substructures by providing a number of primitive symbols describing atomic and bond properties. Atom and bond primitive specifications may be combined to form expressions by using logical operators.
See also
SMARTS Pattern Matching section of the OEChem TK manual that provides an introduction to SMARTS syntax
- SMILES
A SMILES string represents a molecule by describing only its molecular graph (i.e. atoms and bonds in the connection table, but no chiral or isotopic information). There are usually a large number of valid SMILES which represent a given structure. For example, CCO, OCC and C(O)C all specify the structure of ethanol.
See also
SMARTS Pattern Matching section of the OEChem TK manual that provides an introduction to SMILES syntax
- SMIRKS
SMIRKS is a reaction transform language.
See also