- OEChem’s PDB residue perception code now follows the changes required by the PDB version 3.0 standard. This includes disambiguation of the DNA residues “DA”, “DC” and “DG” from the RNA residues “A”, “C” and “G”. Nucleic acid backbone atom names now end in an apostrophe instead of a star/asterisk. The default ligand name in OEChem is now “UNL” instead of “MOL” or “LIG”.
- There have been significant improvements to OEChem’s bond order perception code, including the handling of phosphates, thiophosphates, dithioic acids, oximes, aldoximes, sulfur oxides, sulfites and iron-sulfur clusters.
- The cis/trans detection logic in OE3DToBondStereo has been made more robust for 2D depictions containing bonds that are co-linear with a chiral double bond or have zero length. The code now continues searching for additional incident bonds that have non-zero length and aren’t co-linear.
- Significant improvements have been made to OEChem’s DNA/RNA perception code. The code now handles truncated RNA biopolymers, and recognizes the bases “1MA”, “2MC”, “5MC”, “5MU”, “7MG”, “M2G”, “OMG”, “OMC”, “PSU”, etc.
- OEChem’s residue perception code now recognizes a much larger set of common co-factors and ligands, including “ADP”, “ATP”, “DMS”, “EDO”, “FAD”, “HEM”, “NAD”, “NAG”, “PEO”, etc.
- OEChem’s residue perception code now handles the non-standard amino acid ornithine (ORN).
- A new OEAssignFormalCharges function variant has been added to the OEChem API to allow assignment of formal charges on a specific OEAtomBase pointer. The existing OEAssignFormalCharges function continues to operate on the whole molecule.
- A new OEPreserveResInfo_AtomName flavor has been added to the OEPreserveResInfo namespace to allow the OEPerceiveResidues function to preserve the original PDB atom name.
- The OpenEye formal charge model has been extended such that a four-valent aluminum (aluminium) now has an implicit negative charge, and aluminum ions have a +3 formal charge. The model has also been tweaked to consider the sulfurs in iron-sulfur clusters as neutral [S] radicals.
- The OpenEye hydrogen count model has been tweaked to prefer five-valent phosphorus, such as O=[PH2]O over three-valent O=PO.
- The values returned by OEGetAverageWeight have been updated and revised to follow the latest (2007) recommendations of the IUPAC Commission on Isotopic Abundances and Atomic Weights.
- The OEParseSmarts and OEParseSmirks functions have been enhanced to allow a TAB character \t to be treated as a separator after a SMARTS pattern. This matches the behavior of the SMILES parser, OEParseSmiles, and simplifies the task of writing “patty”-like applications.
- The OEChem SMILES and SMARTS parsers have been tweaked to allow the backslash used in specifying cis/trans stereochemistry to be duplicated in the input string, i.e. C\\C=C\\C is now interpreted as C\C=C\C. This is convenient when working with programming languages such as C and C++ where the backslash is used as an escape character. SMILES embedded in C/C++ source files must be written as C\\C=C\\C, so previously they could not be cut and pasted as regular SMILES strings.
- The interpretation of acyclic aromatic elements by OEChem’s SMILES reader now more accurately follows the Daylight toolkit. For example, n is interpreted as [NH2] and not [N], and nn now means N=N instead of [N]=[N].
- Fixed an obscure corner case in the OEChem SMILES writer, when not performing aromaticity perception and using the low-level SMILES writer. We need to preserve the explicit single bond (hyphen/minus) in [cH2]-[cH2] otherwise [cH2][cH2] would get interpreted like cc and result in c=c and [cH2]=[cH2].
- The MDL file format reader has been enhanced to allow TABs in addition to spaces as separators in M CHG, M RAD and M ISO lines.
- The Sybyl .mol2 file format reader has been enhanced to recognize pyrylium-like ring systems, containing charged oxygen atoms. A minor bug has also been fixed that could assign inappropriate formal charges to substituted nitrates.
- In Sybyl .mol2 format files, the atom types d and t are now treated identically to D and T and interpreted as Deuterium and Tritium respectively. Previously, they’d be interpreted as hydrogen atoms, but the isotope specification wasn’t getting set.
- The Tripos bond types in Sybyl .mol2 format files are now treated as case-insensitive. We now treat AR and Ar as identical to ar, and AM and Am as identical to am, etc.
- The CambridgeSoft CDX file format reader has been significantly rewritten to address bugs in the reading/writing of 3D coordinates.
- The OpenEye OEB file format reader is now more robust to invalid, corrupt and/or truncated input files.
- Added versions of OEIsReadable and OEIsWriteable that can directly take a filename or extension.
- Significant work was done on SD data robustness:
  - Previously, if SD data was attached to both an OEMCMolBase and an OEConfBase, then written to OEB and read back into an OEMolBase, the data from the OEConfBase would appear to disappear, and would then be lost if the molecule was written to SDF.
  - SD data attached to an OEMolBase, written to OEB, then read into an OEMCMolBase will now be attached to the OEConfBase instead of the OEMCMolBase.
  - Significant speed improvements were made to SD data handling (e.g. csv2sdf saw a 4-fold speed improvement).
  - See “Dude, where’s my SD data?” in the programming theory manual for more details.
- The OEReadPDBFile and OEWritePDBFile functions are able to read and write ANISOU records, respectively. ANISOU records hold per-atom anisotropic temperature factors in the PDB format; the values are scaled by a factor of \(10^4\) and represented as integers.
- The OEGetCenterOfMass function, which computes the center of mass of a molecule (with or without atomic weights), has been added to the OEChem namespace.
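The ANISOU scaling mentioned in the list above can be illustrated with a short sketch. This is plain Python, not OEChem code: the helper names `anisou_to_ints` and `ints_to_anisou` are hypothetical, and the example values are made up for illustration. It only shows the round trip between floating-point temperature factors and the integers stored in a PDB ANISOU record.

```python
# Hypothetical helpers (not part of the OEChem API) illustrating the
# 10^4 scaling applied to the values stored in PDB ANISOU records.
ANISOU_SCALE = 10 ** 4


def anisou_to_ints(u_values):
    """Scale floating-point U(i,j) values to the integers stored in the record."""
    return [int(round(u * ANISOU_SCALE)) for u in u_values]


def ints_to_anisou(fields):
    """Recover floating-point U(i,j) values from the stored integers."""
    return [n / ANISOU_SCALE for n in fields]


# Six illustrative (made-up) components: U11, U22, U33, U12, U13, U23.
u = [0.2406, 0.1892, 0.1614, 0.0198, 0.0519, -0.0328]
stored = anisou_to_ints(u)
restored = ints_to_anisou(stored)
```

Rounding to the nearest integer, rather than truncating, keeps the round trip accurate to the four decimal places the scaled integers can carry.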
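To make the "with or without atomic weights" distinction for the center of mass concrete, here is a minimal sketch in plain Python. It does not call OEGetCenterOfMass and makes no claim about that function's signature; `center_of_mass` and its arguments are hypothetical names chosen for illustration.

```python
def center_of_mass(coords, weights=None):
    """Return the (x, y, z) center of a set of atom positions.

    coords  -- list of (x, y, z) tuples, one per atom
    weights -- optional list of atomic weights; when omitted, every atom
               counts equally and the result is the geometric center
    """
    if not coords:
        raise ValueError("need at least one atom")
    if weights is None:
        weights = [1.0] * len(coords)
    if len(weights) != len(coords):
        raise ValueError("one weight per atom is required")
    total = sum(weights)
    # Weighted mean of each coordinate axis in turn.
    return tuple(
        sum(w * c[axis] for w, c in zip(weights, coords)) / total
        for axis in range(3)
    )


# Two atoms on the x axis: the weighted center shifts toward the heavier one.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
unweighted = center_of_mass(positions)
weighted = center_of_mass(positions, weights=[1.0, 3.0])
```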
Major bug fixes
- The algorithm that generates canonical SMILES did not ignore cis/trans stereo hydrogens and produced [H]N=CC, rather than the correct N=CC canonical SMILES.
Even though this bug fix has affected only a small percentage of canonical SMILES, we highly recommend the regeneration of all canonical SMILES.
- Small improvements have been made to the generation of canonical isomeric SMILES.
- A problem has been fixed in OEChem’s Kekulization algorithms for large molecules (with between 250 and 1000 atoms) that can’t be assigned a valid Kekulé form. The changes in OEChem 1.5.0 that attempted to assign as much of a Kekulé form as possible upon failure could occasionally lead to OEKekulize returning true for an invalid molecule.
- A performance problem in OEChem’s aromaticity perception has been resolved. Previously, pathological substituted fullerenes and PAHs could cause OEChem’s aromaticity routines to take over a minute to perceive all of the conjugated cycles. Algorithmic improvements to OEChem’s aromaticity perception now allow all of the reported cases to be processed in a fraction of a second.
- A rare problem interpreting the stereo from wedge/hash bonds around atoms of degree three has been resolved. When we have two bonds in the plane, and the third marked as a wedge or a hash, we need to determine whether the raised/lowered bond is in the larger or smaller sector subtended by the two in-plane bonds. A bug in this code failed to handle the case when all three bonds lay in the same half-circle. This problem is extremely rare; for example, no cases were found in the 250,251 MDL connection tables distributed by the NCI as the NCI August 2000 database.
- A memory leak was fixed in OELibraryGen.
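The sector test described for degree-three wedge/hash atoms in the list above can be sketched with 2D cross products. This is an illustrative reconstruction in plain Python, not OEChem's actual code: the function names are hypothetical, and the vectors are assumed to be bond directions drawn from the central atom. The last case shows the situation the notes single out, where all three bonds lie in the same half-circle.

```python
def cross(u, v):
    """Z component of the 2D cross product; its sign gives the turn direction."""
    return u[0] * v[1] - u[1] * v[0]


def in_smaller_sector(a, b, w):
    """True if direction w lies strictly inside the smaller sector between a and b.

    a, b -- directions of the two in-plane bonds from the central atom
    w    -- direction of the wedge or hash bond from the central atom
    """
    ab = cross(a, b)
    if ab == 0:
        raise ValueError("in-plane bonds are co-linear; the sectors are undefined")
    # w is inside when it turns the same way from a as b does,
    # and b turns the same way from w as it does from a.
    return cross(a, w) * ab > 0 and cross(w, b) * ab > 0


a, b = (1.0, 0.0), (0.0, 1.0)                     # in-plane bonds at 0 and 90 degrees
between = in_smaller_sector(a, b, (1.0, 1.0))     # 45 degrees: smaller sector
opposite = in_smaller_sector(a, b, (-1.0, -1.0))  # 225 degrees: larger sector
# All three bonds within one half-circle, yet w is still in the larger sector.
same_half = in_smaller_sector(a, b, (1.0, -0.1))
```

Testing both cross products, instead of only checking that all three bonds fit in a half-plane, is what keeps the half-circle case from being misclassified in this sketch.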
Minor bug fixes
- The OEChem MDL mol file reader has been improved to allow the dimension field in the connection table header line to be omitted, and still correctly decide whether to process wedge/hash bonds or determine chirality from 3D coordinates. Previously, the molecule’s stereochemistry would be set incorrectly if the optional header line was missing.
- The MDL file reader now perceives aromatic cycles using the MDL aromaticity model prior to calling OEPerceiveChiral. This ensures that alternate Kekulé forms of substituted phenyl rings (for example) don’t inappropriately split symmetry groups, causing achiral double bonds to acquire specified cis/trans stereochemistry.
- An aesthetic improvement has been made to the rules used in the OEMDLPerceiveBondStereo function that assigns wedge and hash bonds to depictions. For acyclic bonds, we now prefer to place the wedge or hash on bonds to non-ring atoms. A typo in the previous rules reversed this priority.
- The OEChem SMILES writer was being miscompiled on IBM AIX 5.x resulting in canonical SMILES that differed from those on other platforms. The code has been rewritten to avoid the issue in IBM’s xlC compiler, so the SMILES are once again identical to the other platforms.
- The OEChem PDB file parser has been updated to reflect the latest atom name exceptions in the RCSB/wwPDB database. These changes should eliminate the spurious Holmium and Helium atoms perceived in recently added ligand residues.
- Numerous small performance improvements have been made to OEChem.
- The torsion cutoff values for perceiving cis/trans bond stereo from 3D have been relaxed in the OE3DToBondStereo function. The cis cutoff has been increased from 15 to 30, and the trans cutoff has been lowered from 165 to 150.
- Even when the maximum number of matches is set, the MCS search cannot be terminated upon reaching this limit, since there is no guarantee that the maximum common substructure has been detected. Instead, the search runs to completion and the best N matches are returned, where N is set by OEMCSSearch.SetMaxMatches.
- The exhaustive and the approximate MCS algorithms no longer use different functions to determine whether a match is unique or not. Several other small problems were fixed in order to ensure that all matches located by the approximate method are also detected by the exhaustive one.
- After changing the atomic number of an atom with OEAtomBase.SetAtomicNum, the aromaticity and chirality of the molecule have to be re-perceived with the OEAssignAromaticFlags and OEPerceiveChiral functions, respectively.
- Improved the stability of OEPerceiveResidues so that it does not reorder atoms.
- An API point was added to OELibraryGen to change the separator character used when reactant molecule titles are concatenated to form product molecule titles. For more information see OELibraryGen.SetTitleSeparator and OELibraryGen.GetTitleSeparator in the API manual.
- A rare problem occurred in the substructure search when hydrogen atoms were matched first. This problem has been solved by rearranging the order in which atoms are taken into consideration, moving hydrogens to the end of the match order. Other small modifications have been made to improve the performance of substructure search.
- OEFindRingAtomsAndBonds is now called automatically to perceive rings in structures returned by OEUniMolecularRxn or OELibraryGen.
- A bug was fixed that caused OEGraphMol and OEMol parameters to always fail to load in OERegisterMolParameters.
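The relaxed torsion cutoffs for cis/trans perception listed above can be pictured with a small classifier. This is a sketch in plain Python, not the OE3DToBondStereo implementation: the function name is hypothetical, the cutoffs are assumed to be in degrees, and treating the boundary values as inclusive is also an assumption.

```python
CIS_CUTOFF = 30.0     # assumed degrees; the previous value was 15
TRANS_CUTOFF = 150.0  # assumed degrees; the previous value was 165


def classify_torsion(torsion_degrees):
    """Classify a torsion angle as 'cis', 'trans' or 'undefined'."""
    # Fold the angle into the range [0, 180] so the sign does not matter.
    t = abs(torsion_degrees) % 360.0
    if t > 180.0:
        t = 360.0 - t
    if t <= CIS_CUTOFF:
        return "cis"
    if t >= TRANS_CUTOFF:
        return "trans"
    return "undefined"


# A 20 degree torsion falls inside the relaxed cis window of 30,
# but would have been outside the previous window of 15.
labels = [classify_torsion(t) for t in (20.0, -160.0, 90.0)]
```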
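The MCS behavior noted in the list above, where the search runs to completion and only then returns the best N matches, can be sketched as follows. This is plain Python and does not use OEMCSSearch: the `(score, match)` tuples, the scores and the function name are all hypothetical stand-ins for whatever the search actually enumerates.

```python
import heapq


def best_matches(scored_matches, max_matches):
    """Consume every (score, match) pair, then keep the max_matches best.

    The whole iterable is exhausted before anything is returned, because a
    better-scoring match may still appear after max_matches have been seen.
    """
    return heapq.nlargest(max_matches, scored_matches, key=lambda pair: pair[0])


# The best match ("b") is only found second; stopping after the first
# match would have returned "a" instead.
found = [(3, "a"), (7, "b"), (5, "c")]
top_one = best_matches(found, 1)
top_two = best_matches(found, 2)
```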
Changes in documentation
- OEDeleteSDData was improperly documented in the theory manual. The old documentation stated that only the first instance of a tag was deleted when all instances of the tag were actually deleted. The documentation has been corrected to state that all instances of a tag are deleted.
- The Maximum Common Substructure Search documentation has been revised to add new examples, explain the difference between the exhaustive and the approximate methods, and provide more details about the built-in MCS scoring functions.
- Figures have been added to the OEExprOpts Namespace documentation to demonstrate the effect of various atom and bond expression options on pattern matching.
- The C++, Python, and Java manuals have been brought into closer alignment with each other.
- OEInterface allows multiple values per parameter. The values can be accessed by calling the OEInterface.GetList template member function.
- The OEAnnotation class provides the ability to attach various graphical objects (sphere, box, surface, etc.) to classes derived from OEBase by using the generic data functions OEBase.SetData and OEBase.GetData.
Minor bug fixes
- The memory allocation performance of multi-threaded OEChem applications on both Windows and recent Linux/UNIX distributions (that use pthreads) has been dramatically improved. A new thread hashing algorithm is now used in OESystem’s memory pooling code, which should greatly reduce contention in allocation-heavy multi-threaded code.
- A number of minor performance and numerical stability improvements have been made to OEMath’s geometry routines.
- When parsing the command line, --help -foo is no longer sensitive to the case of -foo.