Version 1.7.0¶
OEChem 1.7.0¶
New features¶
Canonical isomeric SMILES generation (OECreateIsoSmiString) has been significantly improved. On a test set of 9,962,003 compounds (4,025,817 with atom or bond stereo), OEChem 1.6.1 generated different canonical isomeric SMILES for 135,985 of the compounds when their atoms were randomly reordered. The number of such failures has been reduced to just 78 compounds, a 99.94% improvement. Furthermore, generation has been optimized so that it is roughly 10-30% faster than the OEChem 1.6.1 algorithm.
OEReadMDLQueryFile has been added to read MDL query files into the OEQMol object. This allows for easy integration of MDL query files with the swath of OEChem tools based upon query molecules. The MDL query based substructure search was tested on a set of 655 query files.
See also: The Substructure Search with MDL Queries chapter in the OEChem theory manual.
OEChem also supports MDL reaction based library generation. A reaction file can be imported into an OEQMol object by calling the OEReadMDLReactionQueryFile function. The OELibraryGen object can then be initialized with the imported reaction. Library generation was extensively tested on a set of 160 diverse reactions.
See also: The MDL Reaction Query File section in the OEChem theory manual.
The MiniMol implementation of OEGraphMol has been made more robust and optimized significantly for both speed and size. There used to be an arbitrary limit of 1000 atoms and bonds; the limit is now much higher, \(2^{15}\). This implementation requires that Compress be called on the molecule after construction to maximize space efficiency, but does not require that UnCompress be called on it before it can be used. This makes it an ideal molecule implementation for in-memory substructure searching.
Added oemolithread and oemolothread for threaded molecule I/O. OEReadMolecule and OEWriteMolecule are thread-safe on oemolithread and oemolothread respectively.
See also: The Input and Output Threads section in the OEChem theory manual.
Added an implementation of the Zap 9 radii from [Nicholls-2008] through the OEAssignZap9Radii function.
Added the OEShortestPath function, since it is an often-requested algorithm.
Added the OEIdxSelected predicate for easy subsetting of a molecule using an array of bool values addressed by index.
Added the OECount function for easily counting atoms or bonds in a molecule based on arbitrary predicates.
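To illustrate the kind of computation a shortest-path query performs on a molecular graph, here is a minimal breadth-first search sketch over a plain adjacency list. It does not use OEChem: the ShortestPath name, the adjacency-list representation, and the atom indices are assumptions made for this illustration only, and the actual OEShortestPath signature should be taken from the API documentation.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper, not part of OEChem: returns the atom indices on one
// shortest path from `start` to `goal` (both inclusive), or an empty vector
// if the two atoms are not connected. Every bond counts as one step.
std::vector<std::size_t> ShortestPath(
    const std::vector<std::vector<std::size_t>>& adjacency,
    std::size_t start, std::size_t goal)
{
  const std::size_t none = adjacency.size();  // sentinel: "no predecessor"
  if (start >= none || goal >= none) return {};

  std::vector<std::size_t> previous(adjacency.size(), none);
  std::vector<bool> visited(adjacency.size(), false);
  std::queue<std::size_t> frontier;

  visited[start] = true;
  frontier.push(start);
  while (!frontier.empty()) {
    const std::size_t atom = frontier.front();
    frontier.pop();
    if (atom == goal) break;
    for (std::size_t neighbor : adjacency[atom]) {
      if (!visited[neighbor]) {
        visited[neighbor] = true;
        previous[neighbor] = atom;
        frontier.push(neighbor);
      }
    }
  }
  if (!visited[goal]) return {};

  // Walk the predecessor links back from the goal, then reverse the result.
  std::vector<std::size_t> path;
  for (std::size_t atom = goal; atom != none; atom = previous[atom])
    path.push_back(atom);
  std::reverse(path.begin(), path.end());
  return path;
}
```

For a six-membered ring with one substituent, the path from the substituent to the ring atom opposite its attachment point contains five atoms. Two equally short routes exist around the ring, and breadth-first search returns whichever it reaches first.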
Major bug fixes¶
Fixed a segmentation fault in SMARTS parsing when the SMARTS contained the [<atomic mass>H<charge>] combination.
Fixed an invalid address alignment crash in the sketch file reader on Sparc.
SD tag names were limited to 75 characters. This release raises the limit to 4096.
Several major changes were made to the library generation process to ensure that products are generated with a valid Kekulé form. If there is an explicit hydrogen on the product side of the input reaction, OELibraryGen adds hydrogens to the generated products accordingly, and the first Kekulization attempt is based on this reaction specification. If that attempt is unsuccessful, i.e. OEKekulize returns false, then alternatives are tried by adding and removing implicit hydrogens from specific atoms until a valid (but arbitrary) Kekulé form is identified.
See also: The Product Kekulization section in the OEChem theory manual.
A significant number of improvements were made to OEChem’s PDB file parser (OEReadPDBFile) in the area of atomic number determination. By default, the PDB atomic symbol field is used to determine the atom type. However, for a subset of “known” residues, including all of the amino and nucleic acids, we continue to use the atom name heuristics, which are more reliable for this subset. Currently, there are no conflicts or discrepancies between the PDB atomic symbol and the atomic number perceived by the PDB file reader in the entire wwPDB repository.
The hydrogen placement method (OESet3DHydrogenGeom) has been refined with improved numerical stability and increased speed for most simple placement operations, special heuristics for carboxylic acids and for toluene-like methyl rotors attached to aromatic rings, fleeing heuristics for alkanes, and perpendicular support for allenic systems. Additionally, hydroxyl rotors are placed using a quick local scan for strong acceptors (alpha acceptors) within a 3.0 Angstrom radius. More hydrogen bond length data, specifically for As, Ge, Se and Te, was also added ([Sutton-1958]).
Minor bug fixes¶
OEGraphMol.operator= does the same thing as the OEGraphMol copy constructor. This allows OEGraphMol to be used directly inside STL containers without losing the selected OEMolBase implementation. Previous versions of OEChem would change the molecule implementation unexpectedly if the STL container needed to relocate the object in memory. However, the following code changes meaning with the switch from 1.6.1 to 1.7.0:
    OEGraphMol gm1;
    OEGraphMol gm2(OEMolBaseType::OEDBMol);
    gm1 = gm2; // gm1 is now an OEDBMol implementation,
               // in 1.6.1 it would be the default implementation
The OEMolBaseType.OEDBMol implementation can now copy construct in compressed mode.
OEReadMolecule would sometimes return true when an empty molecule was present in the input file. The high-level OEReadMolecule function will now never return true when the molecule does not contain any atoms. Low-level molecule routines should be used if empty molecules are desired.
Stabilized the atom output order from successive calls to OEWriteMolecule to MOL2 when the molecule contained residues.
Deprecated the non-OE prefixed SmartsLexReplace function name. Renamed to OESmartsLexReplace.
Fixed the bug in substructure search that occurred when OESubSearch was initialized with a SMARTS string starting with a hydrogen atom (such as [#1]O[C,N,S,P]=O). The atoms were reordered even when the allowReorder parameter was set to false.
When initializing an OESubSearch object with a SMARTS pattern, the ‘reorder’ parameter now defaults to false. This parameter is currently ignored, i.e. the atom order in the returned matches is always identical to the atom order in the SMARTS pattern.
Fixed a bug in OEGetAromatic and OEGetBondOrder so that they retrieve aromaticity and bond order from query expressions.
A restriction has been added to the interpretation of SMIRKS in ‘strict’ mode. It requires that all atom maps in the reaction be pairwise when OELibraryGen is initialized with a SMIRKS string or OEQMolBase object.
DBREF, SEQADV, MTRIX1, MTRIX2 and MTRIX3 PDB data lines are now kept when parsing the file. The data can be accessed with OEGetPDBData.
Fixed the bug in OEWritePDBFile that caused the MODEL number to be written at the wrong position in a PDB file.
Fixed the bug in the SMILES canonicalization process for special cases when the input SMILES contains R-Group information (such as [R1]c1ccc(cc1c2cccc(c2)[R3]).
Tweaked the OpenEye charge model such that three-valent beryllium has a negative one charge. This makes F[Be-](F)F equivalent to the charge-separated form [Be+2].[F-].[F-].[F-], fixing 61 ligands in the PDB data set.
Improved the bond order perception (OEPerceiveBondOrders) support:
for arsenic acids, including the ‘cacodylate’ ion (example in 3DUE pdb).
for uric acid and related heterocycles.
for azobenzenes and similar compounds. All acyclic ‘nitrogen(2)-nitrogen(2)’ bonds now undergo a strict distance check independent of the (single) bond angle at each end. This corrects 1SRE, 1SRF and 2GBY pdb entries.
for benzoquinones and anthraquinones.
Addressed the problem in OEAssignAromaticFlags that caused long processing times when reading (OEReadMDLFile) some pathological pseudo-fullerenes.
Added PDB support for the following:
sidechain recognition for the RNA residues ‘YG’ and ‘H2U’
naming of PDB residue ‘BME’
the N-terminal modification ‘FOR’
the cofactor ‘FMT’ (which is “formic acid” or “formate”)
The following problems were fixed in the PDB file parser (OEReadPDBFile):
‘anomalous mercury’ problem for the residue ‘DVA’ (example in 2IZQ pdb)
spurious Holmium problem (in residues ‘CEH’ and ‘NGR’)
naming of the ‘ P ’, ‘ O1P’, ‘ O2P’ and ‘ O3P’ atoms in the nonstandard PDB residues ‘PTR’, ‘SEP’ and ‘TPO’.
Improved the initial partial charge parameterization (OEMMFF94InitialCharges) for selenium (atom type 83).
Improved the perception of reactions in ISIS Sketch files. Most importantly, we now support the ‘rxnarrow’ object generated by recent versions of ISIS, such as MDL Draw and Symyx Draw. We also now allow the sketch to contain multiple lines, provided that only one has an arrow, and allow the arrow direction and arrow coordinates to be specified in arbitrary order.
OEGrid 1.3.2¶
New features¶
The user is no longer required to call OEInitGridHandlers in order to attach grids to molecules and then write them out to OEB. This now occurs at library link time.
See also: The Generic Data section in the Grids chapter for details of attaching grids to molecules.
Major bug fixes¶
OEReadGrid would crash on gzipped files. It now properly uncompresses the data before reading it into a grid.
OESystem 1.7.0¶
New features¶
OERandom can now produce random integers through the OERandom.NextInt method.
Added an OEBitVector constructor which takes an OERandom for generating random bit strings. Added the OEBitVector.operator< method so OEBitVector can be used in STL containers.
Added OEBoundedBuffer and OEProtectedBuffer objects, useful for communicating between threads in multi-threaded applications.
Added OEWallTimer, since OEStopwatch actually reports CPU time, so when multiple threads are being used it reports the summed CPU time of all threads. Also added OECycleTimer for high-precision timing based on clock cycles. For example, on x86 this uses the rdtsc instruction and is thus susceptible to its trade-offs in comparison to using CPU time from OEStopwatch.
Added a convenience constructor to OEInterface for the most common use case, where a command line needs to be parsed relative to a particular interface definition.
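The difference between CPU time and wall-clock time that motivates OEWallTimer can be seen with the standard library alone. The sketch below is not OEChem code; TimingSample and MeasureSleep are hypothetical names chosen for this illustration. It sleeps for a given interval and reads both clocks. On POSIX systems a sleeping thread consumes almost no processor time, so the two readings diverge. With several busy threads the divergence runs the other way, because std::clock reports processor time for the whole process.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type for this illustration.
struct TimingSample {
  double wallSeconds;  // elapsed real time
  double cpuSeconds;   // processor time charged to the process
};

// Sleeps for `milliseconds` and measures the interval with both clocks.
TimingSample MeasureSleep(int milliseconds)
{
  const auto wallStart = std::chrono::steady_clock::now();
  const std::clock_t cpuStart = std::clock();

  std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));

  const std::clock_t cpuEnd = std::clock();
  const auto wallEnd = std::chrono::steady_clock::now();

  TimingSample sample;
  sample.wallSeconds =
      std::chrono::duration<double>(wallEnd - wallStart).count();
  sample.cpuSeconds =
      static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
  return sample;
}
```

Calling MeasureSleep(200) should yield a wall time of at least roughly 0.2 seconds and a much smaller CPU time, which is why a CPU-time stopwatch is a poor fit for measuring elapsed time.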
Major bug fixes¶
Deprecated OESetThreadSafe, as it only gave the illusion of thread safety to the OEChem toolkits. Use OESetMemPoolMode instead, as outlined in the Memory Management section. Most users should just ignore this issue altogether, as the defaults are sufficient (and have been optimized in 1.7.0). The user should only consider calling OESetMemPoolMode when passing OEChem objects between threads.
Fixed a regression where OEBitVector.FromHexString would no longer recognize hex strings in lowercase. It should also be noted that OEBitVector.ToHexString encodes the fractional length of the last 4 bits as the last character.
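The lowercase regression is the kind of bug that a case-insensitive digit lookup avoids. What follows is a minimal sketch, assuming a plain hex string with one character per 4 bits. It deliberately ignores the trailing length character that OEBitVector.ToHexString appends, the bit order within each digit is this sketch's own choice, and HexDigitValue and HexToBits are hypothetical names, not OEChem API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, accepting either case.
// Returns -1 for characters that are not hex digits.
int HexDigitValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical helper: expands a hex string into bits, most significant bit
// of each digit first. Returns false and leaves `bits` empty on bad input.
bool HexToBits(const std::string& hex, std::vector<bool>& bits)
{
  bits.clear();
  for (char c : hex) {
    const int value = HexDigitValue(c);
    if (value < 0) {
      bits.clear();
      return false;
    }
    for (int shift = 3; shift >= 0; --shift)
      bits.push_back(((value >> shift) & 1) != 0);
  }
  return true;
}
```

With this lookup, "a5f0" and "A5F0" decode to the same 16 bits, while a string containing a non-hex character is rejected.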
Minor bug fixes¶
Added OEErrorHandler.Debug as another output option to match OEErrorLevel.Debug.
OEErrorHandler now properly does nothing when OEErrorLevel.Quiet is passed.
Added OEUnaryTrue and OEBinaryTrue to match their inverses that already existed.
OEPlatform 1.7.0¶
New features¶
Added OELock class to do scoped locking around OEMutex objects. This ensures mutexes are released in the event of stack unwinding, such as when exceptions are thrown.
Added the following cross-platform threading primitives: OEThread, OEThreadLocal, OECondition, and OEOnce.
Added OEGetTimeOfDay to wrap gettimeofday on POSIX and to provide our own implementation on Windows, where gettimeofday does not exist.
Added OEGetNumProcessors to return the number of cores available on the system.
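The scoped-locking idea behind OELock is the familiar RAII pattern: acquire in a constructor and release in the destructor, so that stack unwinding cannot leave the mutex held. The sketch below shows the pattern with a toy, single-threaded stand-in for a mutex so the effect is easy to observe. ToyMutex, ScopedLock, RunAndThrow, and ReleasedAfterException are hypothetical names and do not reflect the actual OELock or OEMutex interfaces.

```cpp
#include <stdexcept>

// Toy stand-in for a mutex: it only records whether it is held.
// It provides no real synchronization and exists purely for illustration.
struct ToyMutex {
  bool held = false;
  void Acquire() { held = true; }
  void Release() { held = false; }
};

// RAII guard: acquires on construction, releases on destruction.
class ScopedLock {
public:
  explicit ScopedLock(ToyMutex& mutex) : mutex_(mutex) { mutex_.Acquire(); }
  ~ScopedLock() { mutex_.Release(); }
  ScopedLock(const ScopedLock&) = delete;
  ScopedLock& operator=(const ScopedLock&) = delete;

private:
  ToyMutex& mutex_;
};

// Takes the lock, then throws; the guard's destructor runs during unwinding.
void RunAndThrow(ToyMutex& mutex)
{
  ScopedLock guard(mutex);
  throw std::runtime_error("failure inside the critical section");
}

// Returns true if the mutex is free again after the exception is caught.
bool ReleasedAfterException()
{
  ToyMutex mutex;
  try {
    RunAndThrow(mutex);
  } catch (const std::runtime_error&) {
    // The guard has already been destroyed by the time control gets here.
  }
  return !mutex.held;
}
```

The same shape works with a real mutex type. The design point is that the release lives in a destructor rather than in a call the programmer has to remember on every exit path.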
Major bug fixes¶
Fixed a rare off-by-one buffer overflow in oeogzstream initialization.
Fixed seek and tell on files greater than 4 gigabytes on Windows.
Minor bug fixes¶
The OEMutex implementation would default to a no-op implementation when compiling with a non-GCC compiler, even when a valid mutex implementation was available through pthreads.
oeifstream will return false when oestream.seek is called, and 0 when oestream.size is called, if the stream points to stdin.
Corrected instances where streams were using oefpos_t for memory operations and oesize_t for file operations. The rule is that memory operations should use oesize_t (e.g. oeistream.read), and file operations should use oefpos_t (e.g. oestream.seek).
Changed unsigned int arguments to oesize_t for oeisstream so that it can handle memory larger than 4 gigabytes on 64-bit machines.
The following functions were previously in the global scope: OEGetIPAddress, OEGetHostIdent, OEGetDomainName, and OEGetHostName. They are now located in the OEPlatform namespace.
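Fixed-width integers show why 32-bit arguments break past 4 gigabytes. In this sketch std::uint32_t stands in for a 32-bit unsigned int and std::uint64_t for a 64-bit size type. These are assumptions about typical platforms, not the actual definitions of oesize_t or oefpos_t, and TruncateTo32 is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: what a 64-bit byte count becomes when it is passed
// through a 32-bit unsigned parameter. The value wraps modulo 2^32.
std::uint32_t TruncateTo32(std::uint64_t byteCount)
{
  return static_cast<std::uint32_t>(byteCount);
}

// 5 GiB expressed in bytes; representing it needs 33 bits.
constexpr std::uint64_t kFiveGiB = 5ULL * 1024 * 1024 * 1024;

// What survives the truncation: 5 GiB modulo 4 GiB, which is 1 GiB.
constexpr std::uint64_t kOneGiB = 1ULL * 1024 * 1024 * 1024;
```

A 5 GiB size passed through the 32-bit parameter silently becomes 1 GiB, which is the class of error that moving to a 64-bit size type removes.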