Version 1.7.0

OEChem 1.7.0

New features

  • Canonical isomeric SMILES generation has been significantly improved (OECreateIsoSmiString). On a test set of 9,962,003 compounds (4,025,817 with atom or bond stereo), OEChem 1.6.1 generated different canonical isomeric SMILES for 135,985 of the compounds when the atoms were randomly reordered. This failure count has been reduced to just 78 compounds, a 99.94% reduction. Furthermore, the generation has been optimized so that it is roughly 10-30% faster than the OEChem 1.6.1 algorithm.

  • OEReadMDLQueryFile has been added to read MDL query files into the OEQMol object. This allows for easy integration of MDL query files with the swath of OEChem tools based upon query molecules. The MDL query based substructure search was tested on a set of 655 query files.

    See also

    The Substructure Search with MDL Queries chapter in the OEChem theory manual.

  • OEChem also supports MDL reaction based library generation. A reaction file can be imported into an OEQMol object by calling the OEReadMDLReactionQueryFile function. The OELibraryGen object can then be initialized with the imported reaction. The library generation was extensively tested on a set of 160 diverse reactions.

    See also

    The MDL Reaction Query File section in the OEChem theory manual.

  • The MiniMol implementation of OEGraphMol has been made more robust and optimized significantly for both speed and size. The previous arbitrary limit of 1000 atoms and bonds has been raised to \(2^{15}\) (32,768). This implementation requires Compress be called on the molecule after construction to maximize space efficiency, but does not require UnCompress be called on it before it can be used. This makes it an ideal molecule implementation for in-memory substructure searching.

  • Added oemolithread and oemolothread for threaded molecule I/O. OEReadMolecule and OEWriteMolecule are thread-safe on oemolithread and oemolothread respectively.

    See also

    The Input and Output Threads section in the OEChem theory manual.

  • Added implementation of Zap 9 Radii from [Nicholls-2008] through the OEAssignZap9Radii function.

  • Added OEShortestPath function since it is an often-asked-for algorithm.

  • Added OEIdxSelected predicate for easy sub-setting of a molecule using an array of bool indexed by indices.

  • Added OECount function for easily counting atoms or bonds in a molecule based on arbitrary predicates.
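
The last three entries (shortest path, index-based selection, predicate counting) compose naturally. The following is a minimal, library-independent sketch of that idea, not the OEChem implementation: the Graph type and the ShortestPath, SelectionMask and CountAtoms functions are hypothetical names introduced only for illustration, and the molecule is reduced to a plain adjacency list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a molecular graph (not an OEChem type):
// graph[i] lists the indices of the atoms bonded to atom i.
typedef std::vector<std::vector<std::size_t> > Graph;

// Breadth-first search. Returns the atom indices on one shortest path from
// src to dst (both included), or an empty vector if dst is unreachable.
std::vector<std::size_t> ShortestPath(const Graph& graph,
                                      std::size_t src, std::size_t dst)
{
    assert(src < graph.size() && dst < graph.size());
    const std::size_t none = graph.size();        // sentinel: no predecessor
    std::vector<std::size_t> prev(graph.size(), none);
    std::vector<bool> seen(graph.size(), false);
    std::queue<std::size_t> todo;
    seen[src] = true;
    todo.push(src);
    while (!todo.empty()) {
        const std::size_t cur = todo.front();
        todo.pop();
        if (cur == dst) break;
        for (std::size_t nbr : graph[cur]) {
            if (!seen[nbr]) {
                seen[nbr] = true;
                prev[nbr] = cur;
                todo.push(nbr);
            }
        }
    }
    std::vector<std::size_t> path;
    if (!seen[dst]) return path;
    // Walk the predecessor chain back from dst, then reverse it.
    for (std::size_t at = dst; at != none; at = prev[at]) path.push_back(at);
    std::reverse(path.begin(), path.end());
    return path;
}

// Turns a list of atom indices into an array of bool indexed by atom index.
std::vector<bool> SelectionMask(std::size_t natoms,
                                const std::vector<std::size_t>& idxs)
{
    std::vector<bool> mask(natoms, false);
    for (std::size_t idx : idxs) {
        assert(idx < natoms);
        mask[idx] = true;
    }
    return mask;
}

// Counts the atom indices in [0, natoms) for which pred(index) is true.
template <class Pred>
std::size_t CountAtoms(std::size_t natoms, Pred pred)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < natoms; ++i)
        if (pred(i)) ++n;
    return n;
}
```

For a six-membered ring (atoms 0 to 5) with one substituent (atom 6) on atom 0, ShortestPath(graph, 6, 3) yields a five-atom path; passing it through SelectionMask and counting the selected indices with CountAtoms gives the same number.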

Major bug fixes

  • Fixed a segmentation fault in SMARTS parsing when the SMARTS contained the [<atomic mass>H<charge>] combination.

  • Fixed invalid address alignment crash in the sketch file reader on Sparc.

  • SD tag names were limited to 75 characters. This release raises the limit to 4096.

  • Several major changes were made to the library generation process to ensure that products are generated with a valid Kekulé form. If there is an explicit hydrogen on the product side of the input reaction, then OELibraryGen will add hydrogens to the generated products accordingly and the first Kekulization will be based on this reaction specification. If it is unsuccessful, i.e. OEKekulize returns false, then alternatives are tried by adding and removing implicit hydrogens from specific atoms until a valid (but arbitrary) Kekulé form is identified.

    See also

    The Product Kekulization section in the OEChem theory manual.

  • A significant number of improvements were made to OEChem’s PDB file parser (OEReadPDBFile) in the area of atomic number determination. By default, the PDB atomic symbol field is used to determine the atom type. However, for a subset of “known” residues, including all of the amino and nucleic acids, we continue to use the atom name heuristics, which for this subset are more reliable. Currently, there are no conflicts/discrepancies between the PDB atomic symbol and the atomic number we perceive in the PDB file reader in the entire wwPDB repository.

  • The hydrogen placement method (OESet3DHydrogenGeom) was refined, improving numerical stability and increasing speed for most simple placement operations. The refinement adds special heuristics for carboxylic acids and for toluene-like methyl rotors attached to aromatic rings, fleeing heuristics for alkanes, and perpendicular support for allenic systems. Additionally, hydroxyl rotors are placed using a quick local scan for strong acceptors (alpha acceptors) within a 3.0 Angstrom radius. More hydrogen bond length data, specifically for As, Ge, Se and Te, was also added ([Sutton-1958]).

Minor bug fixes

  • OEGraphMol.operator= now does the same thing as the OEGraphMol copy constructor. This allows OEGraphMol to be used directly inside STL containers without losing the selected OEMolBase implementation. Previous versions of OEChem would change the molecule implementation unexpectedly if the STL container needed to relocate the object in memory. However, the following code changes meaning with the switch from 1.6.1 to 1.7.0:

    OEGraphMol gm1;
    OEGraphMol gm2(OEMolBaseType::OEDBMol);
    gm1 = gm2;
    // gm1 is now an OEDBMol implementation,
    // in 1.6.1 it would be the default implementation
    
  • The OEMolBaseType.OEDBMol implementation can now copy construct in compressed mode.

  • OEReadMolecule would sometimes return true when an empty molecule was present in the input file. The high-level OEReadMolecule function now never returns true when the molecule does not contain any atoms. Low-level molecule routines should be used if empty molecules are desired.

  • Stabilized the atom output order from successive calls to OEWriteMolecule to MOL2 when the molecule contained residues.

  • Deprecated the non-OE prefixed SmartsLexReplace function name. Renamed to OESmartsLexReplace.

  • Fixed the bug in substructure search that occurred when OESubSearch was initialized with a SMARTS string starting with a hydrogen atom (such as [#1]O[C,N,S,P]=O). The atoms were reordered even when the allowReorder parameter was set to false.

  • When initializing an OESubSearch object with a SMARTS pattern, the ‘reorder’ parameter now defaults to false. This parameter is currently ignored, i.e. the atom order in the returned matches is always identical to the atom order in the SMARTS pattern.

  • Fixed a bug in OEGetAromatic and OEGetBondOrder when retrieving aromaticity/bond order from query expressions.

  • A restriction was added to the interpretation of SMIRKS in ‘strict’ mode. All atom maps in the reaction must now be pairwise when OELibraryGen is initialized with a SMIRKS string or OEQMolBase object.

  • DBREF, SEQADV, MTRIX1, MTRIX2 and MTRIX3 PDB data lines are now kept when parsing the file. The data can be accessed by OEGetPDBData.

  • Fixed the bug in OEWritePDBFile that wrote the MODEL number at the wrong position in a PDB file.

  • Fixed the bug in the SMILES canonicalization process for special cases when the input SMILES contains R-Group information (such as [R1]c1ccc(cc1c2cccc(c2)[R3]).

  • Tweaked the OpenEye charge model such that trivalent beryllium has a charge of negative one. This makes F[Be-](F)F equivalent to the charge-separated form [Be+2].[F-].[F-].[F-], fixing 61 ligands in the PDB data set.

  • Improved the bond order perception (OEPerceiveBondOrders) support:

    • for arsenic acids, including the ‘cacodylate’ ion (example in 3DUE pdb).

    • for uric acid and related heterocycles.

    • for azobenzenes and similar compounds. All acyclic ‘nitrogen(2)-nitrogen(2)’ bonds now undergo a strict distance check independent of the (single) bond angle at each end. This corrects 1SRE, 1SRF and 2GBY pdb entries.

    • for benzoquinones and anthraquinones.

  • Addressed the problem in OEAssignAromaticFlags that caused long processing times when reading (OEReadMDLFile) some pathological pseudo-fullerenes.

  • Added PDB support for the following:

    • sidechain recognition for the RNA residues ‘YG’ and ‘H2U’

    • naming of PDB residue ‘BME’

    • the N-terminal modification ‘FOR’

    • the cofactor ‘FMT’ (which is “formic acid” or “formate”)

  • The following problems were fixed in the PDB file parser (OEReadPDBFile):

    • ‘anomalous mercury’ problem for the residue ‘DVA’ (example in 2IZQ pdb)

    • spurious Holmium problem (in residues ‘CEH’ and ‘NGR’)

    • naming of the ‘ P ’, ‘ O1P’, ‘ O2P’ and ‘ O3P’ atoms in the nonstandard PDB residues ‘PTR’, ‘SEP’ and ‘TPO’.

  • Improved the initial partial charge parameterization (OEMMFF94InitialCharges) for selenium (atom type 83).

  • Improved the perception of reactions in ISIS Sketch files. Most importantly, we now support the ‘rxnarrow’ object generated by recent versions of ISIS, such as MDL Draw and Symyx Draw. We also now allow the sketch to contain multiple lines, provided that only one has an arrow, and allow the arrow direction and arrow coordinates to be specified in arbitrary order.

OEGrid 1.3.2

New features

  • The user is no longer required to call OEInitGridHandlers in order to attach grids to molecules and then write them out to OEB. This occurs at library link time now.

    See also

    The Generic Data section in the Grids chapter for details of attaching grids to molecules.

Major bug fixes

  • OEReadGrid would crash on gzipped files. It now properly uncompresses the data before reading it into a grid.

OESystem 1.7.0

New features

Major bug fixes

  • Deprecated OESetThreadSafe as it only gave the illusion of thread safety to the OEChem toolkits. Use OESetMemPoolMode instead as outlined in the Memory Management section. Most users should just ignore this issue altogether, as the defaults are sufficient (and have been optimized in 1.7.0). The user should only consider calling OESetMemPoolMode when passing OEChem objects between threads.

  • Fixed regression where OEBitVector.FromHexString would no longer recognize hex strings in lowercase. It should also be noted that OEBitVector.ToHexString encodes the fractional length of the last 4 bits as the last character.
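
Case-insensitive hex parsing of the kind restored by the FromHexString fix can be sketched as follows. This is an illustrative, library-independent example, not the OEBitVector implementation: HexDigitValue and BitsFromHex are hypothetical names, the bit order within each digit is an arbitrary choice made for the sketch, and the trailing length character that ToHexString emits is deliberately not modeled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the value (0..15) of a hex digit in either case, or -1 if the
// character is not a hex digit.
int HexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes a hex string into bits, four per character. Assumption for this
// sketch only: the least significant bit of each digit comes first.
// Returns false and leaves 'bits' untouched if any character is invalid.
bool BitsFromHex(const std::string& hex, std::vector<bool>& bits)
{
    std::vector<bool> decoded;
    decoded.reserve(hex.size() * 4);
    for (std::size_t i = 0; i < hex.size(); ++i) {
        const int value = HexDigitValue(hex[i]);
        if (value < 0) return false;              // reject non-hex characters
        for (int bit = 0; bit < 4; ++bit)
            decoded.push_back(((value >> bit) & 1) != 0);
    }
    bits.swap(decoded);
    return true;
}
```

With this approach "ff0a" and "FF0A" decode to the same sixteen bits, because the case distinction is resolved once, at the digit level.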

Minor bug fixes

OEPlatform 1.7.0

New features

  • Added OELock class to do scoped locking around OEMutex objects. This ensures mutexes are released in the event of stack unwinding, such as when exceptions are thrown.

  • Added the following cross-platform threading primitives: OEThread, OEThreadLocal, OECondition, and OEOnce.

  • Added OEGetTimeOfDay to wrap gettimeofday on POSIX systems. On Windows, where gettimeofday does not exist, we provide our own implementation.

  • Added OEGetNumProcessors to return the number of cores available on the system.
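
The scoped-locking idea behind the OELock entry above can be illustrated with a small, library-independent sketch. These notes do not show OELock's interface, so ScopedLock, ToyMutex and ThrowWhileLocked below are hypothetical names; ToyMutex only records whether it is held, which makes the release observable without real threads.

```cpp
#include <stdexcept>

// Toy mutex for illustration only: it records whether it is held instead
// of blocking. Any type with lock() and unlock() would work the same way.
struct ToyMutex {
    bool held = false;
    void lock()   { held = true; }
    void unlock() { held = false; }
};

// Hypothetical scoped lock: acquires in the constructor and releases in
// the destructor, so the mutex is released on every exit path.
template <class Mutex>
class ScopedLock {
public:
    explicit ScopedLock(Mutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() { mutex_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    Mutex& mutex_;
};

// Locks m, then leaves the guarded scope by throwing. Returns whether the
// mutex was held inside the scope; the caller can then inspect m.held.
bool ThrowWhileLocked(ToyMutex& m)
{
    bool heldInside = false;
    try {
        ScopedLock<ToyMutex> guard(m);
        heldInside = m.held;
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run guard's destructor at this point.
    }
    return heldInside;
}
```

After ThrowWhileLocked returns, the mutex is no longer held even though no explicit unlock call appears in the guarded scope. The standard library's std::lock_guard follows the same pattern for std::mutex.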

Major bug fixes

  • Fixed a rare off-by-one buffer overflow in oeogzstream initialization.

  • Fixed seek and tell on files greater than 4 gigabytes on Windows.

Minor bug fixes

  • The OEMutex implementation would default to a no-op implementation when compiling with a non-GCC compiler, even when a valid mutex implementation was available through pthreads.

  • When the stream points to stdin, oeifstream now returns false when oestream.seek is called and returns 0 when oestream.size is called.

  • Corrected instances where streams were using oefpos_t for memory operations and oesize_t for file operations. The rule is memory operations should use oesize_t (e.g. oeistream.read), and file operations should use oefpos_t (e.g. oestream.seek).

  • Changed unsigned int arguments to oesize_t so that oeisstream can handle memory larger than 4 gigabytes on 64-bit machines.

  • The following functions were previously at global scope: OEGetIPAddress, OEGetHostIdent, OEGetDomainName, and OEGetHostName. They are now located in the OEPlatform namespace.
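
The 4 gigabyte limits mentioned in the stream fixes above come from passing sizes through a 32-bit type. The sketch below shows the effect under the assumption that unsigned int is 32 bits wide, as on common 64-bit platforms; the width of oesize_t is not stated in these notes, so std::uint64_t stands in for it, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// One gibibyte, expressed in a 64-bit type.
const std::uint64_t kGiB = 1ULL << 30;

// Simulates passing a byte count through a 32-bit unsigned parameter:
// the upper 32 bits are silently discarded.
std::uint64_t ThroughUInt32(std::uint64_t nbytes)
{
    return static_cast<std::uint32_t>(nbytes);
}

// True if the byte count survives a round trip through 32 bits unchanged.
bool FitsInUInt32(std::uint64_t nbytes)
{
    return nbytes <= std::numeric_limits<std::uint32_t>::max();
}
```

A 5 GiB buffer size passed through ThroughUInt32 comes back as 1 GiB, with no error raised, which is why size arguments need a type as wide as the address space.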