OECreateIsoSmiString
void OECreateIsoSmiString(std::string &str, const OEMolBase &mol)
Creates a canonical isomeric SMILES string representing the given molecule.
This function is a special case of the OECreateSmiString function, called with the flavor OESMILESFlag.ISOMERIC.
Note
This function produces SMILES that correspond to what Daylight Chemical Information Systems term an ‘absolute’ SMILES.
Note that the canonical SMILES generated by this function remains dependent on the state of the molecule, especially its aromaticity state (see the examples in the Aromaticity Models in OEChem TK section). Thus, to generate a canonical SMILES suitable for purposes such as a database key, the user must ensure that the state of the molecule has been standardized. In particular, aromaticity should be perceived according to the preferred model.
In contrast, the high-level output function OEWriteMolecule (and OEMolToSmiles), when writing the canonical SMILES format (OEFormat.ISM), does invoke OEFindRingAtomsAndBonds and OEAssignAromaticFlags.
Furthermore, whether OEWriteMolecule or OECreateIsoSmiString is used, the canonical SMILES generated depends on the current stereo specifications for the molecule.
If the goal is a canonical isomeric SMILES that is identical for every representation of the same stereoisomer, for example for use as a database key, it is the user’s responsibility to ensure that the stereochemical state of the molecule has been rationalized and standardized, using methods such as:
See also
OEMolToSmiles for a high-level way to create SMILES equivalent to what is written by OEWriteMolecule.
OECreateAbsSmiString to create an arbitrary SMILES.
OECreateCanSmiString to create a canonical SMILES.
Example program UniqMol.cs
Validation:
OECreateIsoSmiString is rigorously tested to ensure that it generates a unique string representation for each molecule. The test randomly reorders the atoms and bonds of each molecule and confirms that the reordering has no effect on the SMILES generated by OECreateIsoSmiString.
Database | Size (molecules) | Success Rate
---|---|---
Wombat | 53K | 100.0%
Maybridge | 64K | 100.0%
MDDR | 111K | 100.0%
NCI | 250K | 100.0%
Running the validation test on 24M unique molecules produces only 129 failure cases, a failure rate of roughly 5 per million.