OEChem’s PDB residue perception code now follows the changes required by the PDB version 3.0 standard. This includes disambiguation of the DNA residues “DA”, “DC” and “DG” from the RNA residues “A”, “C” and “G”. Nucleic acid backbone atom names now end in an apostrophe instead of a star/asterisk. The default ligand name in OEChem is now “UNL” instead of “MOL” or “LIG”.
There have been significant improvements to OEChem’s bond order perception code including phosphates, thiophosphates, dithioic acids, oximes, aldoximes, sulfur oxides, sulfites and iron-sulfur clusters.
The cis/trans detection logic in OE3DToBondStereo has been made more robust for 2D depictions containing bonds that are co-linear with a chiral double bond or have zero length. The code now continues searching for additional incident bonds that have non-zero length and aren’t co-linear.
Significant improvements have been made to OEChem’s DNA/RNA perception code. The code now recognizes truncated RNA biopolymers, as well as bases including “1MA”, “2MC”, “5MC”, “5MU”, “7MG”, “M2G”, “OMG”, “OMC” and “PSU”.
OEChem’s residue perception code now recognizes a much larger set of common co-factors and ligands, including “ADP”, “ATP”, “DMS”, “EDO”, “FAD”, “HEM”, “NAD”, “NAG” and “PEO”.
OEChem’s residue perception code now handles the non-standard amino acid ornithine (ORN).
An OEAssignFormalCharges function variant has been added to the OEChem API to allow assignment of formal charges on a specific OEAtomBase pointer. The existing OEAssignFormalCharges function continues to operate on the whole molecule.
The OpenEye formal charge model has been extended such that a four-valent aluminum (aluminium) now has an implicit negative charge, and aluminum ions have a +3 formal charge. The model has also been tweaked to consider the sulfurs in iron-sulfur clusters as neutral.
The OpenEye hydrogen count model has been tweaked to prefer five-valent phosphorus, such as
The values returned by OEGetAverageWeight have been updated and revised to follow the latest (2007) recommendations of the IUPAC Commission on Isotopic Abundances and Atomic Weights.
OEParseSmirks functions have been enhanced to allow a TAB character \t to be treated as a separator after a SMARTS pattern. This matches the behavior of the SMILES parser, OEParseSmiles, and simplifies the task of writing “patty”-like applications.
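To illustrate why a TAB separator is convenient, a “patty”-like rules file can keep a pattern and its label on a single line. The sketch below is plain Python and does not call OEChem; the helper name `split_rule_line` and the sample rules are hypothetical, and treating the first space or TAB as the end of the pattern is an assumption made for the example.

```python
def split_rule_line(line):
    """Split 'PATTERN<space or TAB>label' into (pattern, label).

    Hypothetical helper: the pattern is assumed to end at the first
    space or TAB character; everything after it is the label.
    """
    line = line.rstrip("\n")
    for i, ch in enumerate(line):
        if ch in (" ", "\t"):
            return line[:i], line[i + 1:].strip()
    # No separator found: the whole line is the pattern.
    return line, ""


# Sample rules: one TAB-separated, one space-separated, one bare pattern.
rules = ["[OX2H]\thydroxyl", "[CX3]=[OX1] carbonyl", "c"]
parsed = [split_rule_line(r) for r in rules]
```

With a parser that accepts the TAB itself, the manual split above becomes unnecessary and the whole line can be handed to the parser unchanged.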
The OEChem SMILES and SMARTS parsers have been tweaked to allow the backslash used in specifying cis/trans stereochemistry to be duplicated in the input string, i.e. C\\C=C\\C is now interpreted as C\C=C\C. This is convenient when working with programming languages such as C and C++ where the backslash is used as an escape character. Embedding SMILES in C/C++ source files requires that the strings look like C\\C=C\\C, which previously could not be cut and pasted like regular SMILES strings.
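The equivalence between the doubled and single forms can be shown with a short, standalone Python sketch. It is illustrative only: `collapse_doubled_backslashes` is a hypothetical helper, not part of OEChem, and it says nothing about how the OEChem parsers handle the doubled form internally.

```python
def collapse_doubled_backslashes(smiles):
    """Replace each doubled backslash with a single one (illustrative only)."""
    return smiles.replace("\\\\", "\\")


# A raw string keeps both backslashes, as if pasted from C/C++ source code.
pasted = r"C\\C=C\\C"
normalized = collapse_doubled_backslashes(pasted)
```

A string that already uses single backslashes passes through unchanged, so both spellings end up as the same SMILES.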
The interpretation of acyclic aromatic elements by OEChem’s SMILES reader now more accurately follows the Daylight toolkit. For example,
n is interpreted as
Fixed an obscure corner case in the OEChem SMILES writer, when not performing aromaticity perception and using the low-level SMILES writer. We need to preserve the explicit single bond (hyphen/minus) in
[cH2][cH2] would get interpreted like cc and result in
The MDL file format reader has been enhanced to allow TABs in addition to spaces as separators in
The .mol2 file format reader has been enhanced to recognize pyrylium-like ring systems containing charged oxygen atoms. A minor bug that could assign inappropriate formal charges to substituted nitrates has also been fixed.
In .mol2 format files, the atom types t are now treated identically to T and interpreted as Deuterium and Tritium respectively. Previously, they were interpreted as hydrogen atoms, but the isotope specification was not set.
The Tripos bond types in Sybyl .mol2 format files are now treated as case-insensitive. We now treat
Ar as identical to
Am as identical to
The CambridgeSoft CDX file format reader has been significantly rewritten to address bugs in the reading/writing of 3D coordinates.
The OpenEye OEB file format reader is now more robust to invalid, corrupt and/or truncated input files.
Significant work was done on SD data robustness.
If SD data was attached to an OEConfBase, then written to OEB and read back into an OEMolBase, the data from the OEConfBase would appear to disappear. This would result in losing the data if the molecule was then written to SDF.
Significant speed improvements were made to SD data (e.g. csv2sdf saw a 4-fold speed improvement).
See “Dude, where’s my SD data?” in the programming theory manual for more details.
OEWritePDBFile functions are able to read and write ANISOU records, respectively. ANISOU records, which are atom properties representing anisotropic temperature factors in PDB files, are scaled by a factor of \(10^4\) and represented as integers.
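The scaling convention can be sketched in a few lines of standalone Python. The helper names below are hypothetical and are not OEChem functions; rounding to the nearest integer is an assumption of this sketch.

```python
ANISOU_SCALE = 10 ** 4  # scale factor stated in the note above


def u_to_anisou_int(u):
    """Convert an anisotropic temperature factor to its scaled integer form.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(u * ANISOU_SCALE))


def anisou_int_to_u(value):
    """Convert a scaled integer from an ANISOU record back to a float."""
    return value / ANISOU_SCALE


stored = [u_to_anisou_int(u) for u in (0.0688, 0.1234, -0.0019)]
```

Values are therefore preserved to four decimal places when written out and read back.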
OEGetCenterOfMass function, which computes the center of mass of a molecule (with or without atomic weights), was added to the
Major bug fixes
The algorithm that generates canonical SMILES did not ignore cis/trans stereo hydrogens and produced
[H]N=CC, rather than the correct
Even though this bug fix has affected only a small percentage of canonical SMILES, we highly recommend the regeneration of all canonical SMILES.
Small improvements have been made to the generation of canonical isomeric SMILES.
A problem has been fixed in OEChem’s Kekulization algorithms for large molecules (with between 250 and 1000 atoms) that can’t be assigned a valid Kekulé form. The changes in OEChem 1.5.0 that attempted to assign as much of a Kekulé form as possible upon failure could occasionally lead to
true for an invalid molecule.
A performance problem in OEChem’s aromaticity perception has been resolved. Previously pathological substituted fullerenes and PAHs could cause OEChem’s aromaticity routines to take over a minute to perceive all of the conjugated cycles. Algorithmic improvements to OEChem’s aromaticity perception now allow all of the reported cases to be processed in a fraction of a second.
A rare problem interpreting the stereo from wedge/hash bonds around atoms of degree three has been resolved. When we have two bonds in the plane, and the third marked as a wedge or a hash, we need to determine whether the raised/lowered bond is in the larger or smaller sector subtended by the two in-plane bonds. A bug in this code failed to handle the case when all three bonds lay in the same half-circle. This problem is extremely rare: for example, no cases were found in the 250,251 MDL connection tables distributed by the NCI as the NCI August 2000 database.
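The geometric test described here can be sketched with 2D cross products. This is a minimal standalone illustration, not the OEChem implementation: the function names are hypothetical, bonds are represented as direction vectors from the central atom, and a wedge bond lying exactly on an in-plane bond is treated as outside the sector.

```python
def cross2d(u, v):
    """Z component of the cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]


def in_smaller_sector(a, b, w):
    """Return True if direction w lies in the smaller sector between a and b.

    a and b are the in-plane bond directions, w is the wedge/hash bond
    direction, all measured from the central atom.
    """
    ab = cross2d(a, b)
    if ab == 0:
        raise ValueError("in-plane bonds are co-linear; sectors are ambiguous")
    if ab < 0:
        # Swap so that b is counterclockwise from a by less than 180 degrees.
        a, b = b, a
    return cross2d(a, w) > 0 and cross2d(w, b) > 0


# All three bonds in the same half-circle (the upper half-plane here).
inside = in_smaller_sector((1, 0), (-1, 1), (0, 1))
outside = in_smaller_sector((1, 0), (-1, 1), (-1, 0.1))
```

In the second call the wedge bond shares a half-circle with both in-plane bonds yet falls in the larger sector, which is the kind of arrangement the fix addresses.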
A memory leak was fixed in
Minor bug fixes
The OEChem MDL mol file reader has been improved to allow the dimension field in the connection table header line to be omitted, and still correctly decide whether to process wedge/hash bonds or determine chirality from 3D coordinates. Previously, the molecule’s stereochemistry would be set incorrectly if the optional header line was missing.
The MDL file reader now perceives aromatic cycles using the MDL aromaticity model prior to calling
OEPerceiveChiral. This ensures that alternate Kekulé forms of substituted phenyl rings (for example) don’t inappropriately split symmetry groups, causing achiral double bonds to acquire specified cis/trans stereochemistry.
An aesthetic improvement has been made to the rules used in the OEMDLPerceiveBondStereo function that assigns wedge and hash bonds to depictions. For acyclic bonds, we now prefer to place the wedge or hash on bonds to non-ring atoms. A typo in the previous rules reversed this priority.
The OEChem SMILES writer was being miscompiled on IBM AIX 5.x resulting in canonical SMILES that differed from those on other platforms. The code has been rewritten to avoid the issue in IBM’s xlC compiler, so the SMILES are once again identical to the other platforms.
The OEChem PDB file parser has been updated to reflect the latest atom name exceptions in the RCSB/wwPDB database. These changes should eliminate the spurious Holmium and Helium atoms perceived in recently added ligand residues.
Numerous small performance improvements have been made to OEChem.
The torsion cutoff values for perceiving cis/trans bond stereo from 3D coordinates have been relaxed in the OE3DToBondStereo function. The cis cutoff has been increased from 15 to 30 degrees, and the trans cutoff has been lowered from 165 to 150 degrees.
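The effect of the relaxed cutoffs can be sketched in standalone Python. This is an illustration under stated assumptions, not the OE3DToBondStereo implementation: the cutoffs are taken to be in degrees, the comparisons are assumed to be inclusive, and the function names are hypothetical.

```python
import math

CIS_CUTOFF = 30.0     # new cis cutoff from the note above (assumed degrees)
TRANS_CUTOFF = 150.0  # new trans cutoff from the note above (assumed degrees)


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion_degrees(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, from 3D coordinates."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))


def classify_torsion(angle):
    """Label a torsion as cis, trans, or undefined (inclusive cutoffs assumed)."""
    magnitude = abs(angle)
    if magnitude <= CIS_CUTOFF:
        return "cis"
    if magnitude >= TRANS_CUTOFF:
        return "trans"
    return "undefined"


# Substituents on the same side, opposite sides, and perpendicular.
same_side = classify_torsion(
    torsion_degrees((-0.5, 1, 0), (0, 0, 0), (1, 0, 0), (1.5, 1, 0)))
opposite = classify_torsion(
    torsion_degrees((-0.5, 1, 0), (0, 0, 0), (1, 0, 0), (1.5, -1, 0)))
twisted = classify_torsion(
    torsion_degrees((-0.5, 1, 0), (0, 0, 0), (1, 0, 0), (1.5, 0, 1)))
```

Under these assumptions a torsion of 20 degrees is labeled cis, whereas it would have fallen outside the earlier cis cutoff of 15.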
Even when the maximum number of matches is set, the MCS search cannot be terminated upon reaching this limit, since there is no guarantee that the maximum common substructure has been detected. Instead, the search continues, then the best N matches are returned, where N is set by
The exhaustive and the approximate MCS algorithms no longer use different functions to determine whether a match is unique or not. Several other small problems were fixed in order to ensure that all matches located by the approximate method are also detected by the exhaustive one.
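The “search everything, then keep the best N” behavior described for the MCS match limit can be sketched with a generic top-N selection. The code below is standalone Python with made-up scores and match labels; it does not use the OEChem MCS API, and `best_n_matches` is a hypothetical helper.

```python
import heapq


def best_n_matches(scored_matches, n):
    """Consume every candidate, then return the n highest-scoring ones.

    scored_matches is any iterable of (score, match) pairs. The iterable
    is always exhausted, because a better match may appear at any point.
    """
    return heapq.nlargest(n, scored_matches, key=lambda pair: pair[0])


# Made-up candidates: the best match is not among the first two seen.
candidates = iter([(3, "m1"), (7, "m2"), (5, "m3"), (6, "m4"), (2, "m5")])
top_two = best_n_matches(candidates, 2)
```

Stopping after the first two candidates would have returned the scores 3 and 7 rather than 7 and 6, which is why the limit caps what is returned rather than how far the search runs.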
After changing the atomic number of an atom with
OEAtomBase.SetAtomicNum, the aromaticity and chirality of the molecule have to be re-perceived with the
Improved the stability of OEPerceiveResidues so that it does not reorder atoms.
An API point was added to OELibraryGen to change the character used to separate product molecule titles when concatenating reaction molecule titles together. For more information see OELibraryGen.GetTitleSeparator in the API manual.
A rare problem occurred in the substructure search when hydrogen atoms were matched first. This problem has been solved by rearranging the order in which atoms are taken into consideration, moving hydrogens to the end of the match order. Other small modifications have been made to improve the performance of substructure search.
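The reordering idea can be sketched as a stable sort that moves hydrogens to the end while leaving the relative order of all other atoms untouched. This is a standalone Python illustration with a hypothetical helper name; it is not the OEChem substructure search code.

```python
def match_order(atomic_numbers):
    """Return atom indices with hydrogens (atomic number 1) moved to the end.

    Python's sort is stable, so heavy atoms keep their original relative
    order, and so do the hydrogens.
    """
    indices = range(len(atomic_numbers))
    return sorted(indices, key=lambda i: atomic_numbers[i] == 1)


# Atoms listed as H, C, H, O, C.
order = match_order([1, 6, 1, 8, 6])
```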
Changes in documentation
OEDeleteSDData was improperly documented in the theory manual. The old documentation stated that only the first instance of a tag was deleted when all instances of the tag were actually deleted. The documentation has been corrected to state that all instances of a tag are deleted.
Maximum Common Substructure Search has been revised: new examples have been added, the difference between the exhaustive and the approximate methods is explained, and more details are given about the built-in MCS scoring functions.
Figures have been added to OEExprOpts Namespace in order to demonstrate the effect of various atom and bond expression options on pattern matching.
The C++, Python, and Java manuals have been brought into closer alignment with each other.
OEAnnotation class provides the ability to attach various graphical objects (sphere, box, surface, etc.) to classes derived from OEBase by using the generic data functions
Minor bug fixes
The memory allocation performance of multi-threaded OEChem applications on both Windows and recent Linux/UNIX distributions (that use pthreads) has been dramatically improved. A new thread hashing algorithm is now used in OESystem’s memory pooling code, which should substantially reduce contention in allocation-heavy multi-threaded code.
A number of minor performance and numerical stability improvements have been made to OEMath’s geometry routines.
When parsing the command line
--help -foo is no longer sensitive to the case of