Several enhancements have been made to the protein perception
algorithms used in OEPerceiveResidues. These allow OEChem to
recognize the N-terminal capping group ACE, and the non-standard
amino acid residues ABA, CGU, CME, CSD, MLY, MSE,
PCA, PTR, SEP and TPO. Support for these additional amino
acid types has also been added to OEGetResidueIndex and friends.
The sidechain pattern matching algorithm now has improved fallback
functionality for better handling of modified/substituted residues.
Improved support for aromatic boron and aromatic silicon
in OEKekulize. The OEChem toolkit currently doesn’t perceive
either boron or silicon to be aromatic (with any aromaticity model),
but this enhancement allows us to Kekulize structures so specified.
Added improved support of parsing SMILES containing aromatic boron
and aromatic silicon, allowing the OEChem toolkit to parse b1ccccc1.
A new OEGetDelphiRadius function has been added to OEChem
to return the default radius for a given element used by Accelrys’
Delphi program for electrostatics calculations.
A new function OEGetAminoAcidCode can be used to convert
an index from the OEResidueIndex namespace to an IUBMB single
character code (A for alanine, R for arginine, etc.).
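
To illustrate the kind of lookup described above, here is a minimal sketch in plain Python. It does not use the OEChem API: the names ONE_LETTER, amino_acid_code and sequence are hypothetical, the table is keyed by three-letter residue names rather than OEResidueIndex values, and returning X for unrecognized residues is an assumption.

```python
# Hypothetical stand-in for a residue-to-letter lookup; this is NOT the OEChem API.
# Standard one-letter codes for the 20 common amino acids.
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def amino_acid_code(residue_name, unknown="X"):
    """Return the one-letter code for a three-letter residue name."""
    # Assumption: unrecognized residues map to "X".
    return ONE_LETTER.get(residue_name.strip().upper(), unknown)


def sequence(residue_names):
    """Build a one-letter sequence string from a list of residue names."""
    return "".join(amino_acid_code(name) for name in residue_names)
```

A typical use would be turning the residue names of a chain into a sequence string with sequence(names).
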
A new function OEIsBinary is provided to determine
whether the specified file format is binary or not, for example,
.oeb, .bin and .cdx.
The new function OEGetFormatExtension can be used
to return a comma separated list of lowercase file format extensions
that can be used to aid implementing directory scans and file
format dialog boxes.
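
As a rough sketch of the directory scan use case, the plain Python below filters file names against a comma separated extension list. The list "sdf,mol" and the function names are hypothetical placeholders; in practice the list would come from the toolkit for a given format.

```python
import os


def has_extension(filename, ext_list):
    """True if filename ends with one of the comma separated extensions."""
    exts = tuple("." + e.strip().lower() for e in ext_list.split(",") if e.strip())
    # str.endswith accepts a tuple; an empty tuple never matches.
    return filename.lower().endswith(exts)


def scan_directory(path, ext_list):
    """Return the sorted file names in path that match the extension list."""
    return sorted(f for f in os.listdir(path) if has_extension(f, ext_list))
```

Comparing in lowercase means files such as Ligand.SDF are still found when the list itself is lowercase.
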
A problem in OEChem’s graph canonicalization algorithm was
identified by the NCBI’s PubChem project for the single molecule:
C12C3C4C3C5C4C1C25. This problem has been fixed in OEChem
1.3.3. Unfortunately, this failure didn’t show up in our testing
of 100 random permutations of a 2.5 million compound test set.
Efforts are now ongoing to validate OpenEye’s canonicalization
against all theoretical connection tables with fewer than \(N\) atoms,
for some \(N > 10\).
A bug in the OEB file format readers and writers that could
cause the titles and/or comments attached to molecules or conformers
to be lost has been corrected.
Fixed bug in the OEChem SMARTS parser that failed to follow the
Daylight semantics for patterns such as [H], [2H] and [H+]
where the H specifies the pattern must match a hydrogen, and not
the expected hydrogen count on an atom.
The OEChem SMILES writers have been modified to prevent them
generating atoms such as [C@H2] or [C@@H2] for centers that
have stereo explicitly specified (on non-chiral centers) with
explicit hydrogens, when the hydrogens are being automatically
suppressed by the output SMILES flavor.
The old-style OE binary, .bin, file format reader now
automatically sets the dimension property of molecules and conformers
to 3. Whilst new-style OE binary, .oeb, files explicitly record
the dimensionality of the stored coordinates, the old format didn’t
and its contents should be assumed to be 3-dimensional.
Fixed a problem in the SMILES parser, which would cause a
segmentation fault if ever a SMILES string longer than 4096
characters encountered a syntax or Kekulization error. We no
longer try to report the location of the syntax error for SMILES
strings longer than 2048 characters.
A bug in OEPerceiveBondOrders that assumed/required that
the incoming molecule not have any aromaticity specified has been
fixed by calling OEClearAromaticFlags on the incoming molecule.
This assumption was valid for its existing use by the high-level
file format readers, but meant that calling OEPerceiveBondOrders
twice in a row could sometimes produce different results.
Fixed a potential problem in several file format readers that
caused a run-time abort in Microsoft’s runtime libraries on Windows
when reading corrupt or binary files. The Microsoft implementation of
the standard <ctype.h> functions, such as isdigit and
isupper will abort when passed negative values, such as
when interpreting the bytes of a file as (signed) char.
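
The underlying issue is that any byte above 0x7F becomes negative when viewed as a signed char. The sketch below shows this in plain Python (it is not the toolkit's C++ code, and the function names are hypothetical): a byte is reinterpreted as signed, then masked back into the 0..255 range before being classified.

```python
import struct


def as_signed_char(byte_value):
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit value."""
    return struct.unpack("b", bytes([byte_value]))[0]


def as_unsigned_char(signed_value):
    """Map a signed 8-bit value back into 0..255, the range <ctype.h> expects."""
    return signed_value & 0xFF


def is_ascii_digit(signed_value):
    """Classify a possibly negative char value after masking it to 0..255."""
    c = as_unsigned_char(signed_value)
    return 0x30 <= c <= 0x39
```

In C or C++ the equivalent precaution is to convert the value to unsigned char before passing it to isdigit or isupper.
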
Fixed a bug in OEMDLCorrectBondStereo that could cause
that routine to crash if the chiral atom on which the
stereochemistry needed to be corrected was degree three instead of
degree four. This routine has been made more robust, and can now
correct wedges and hashes around degree three atoms that conflict
with the specified MDL parity bit.
The OEChem MDL mol file reader has been made more robust by
checking for negative values in the atom count, bond count and list
count fields. These are now interpreted as being zero. Corrupted
SD files could previously cause OEChem to crash.
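
The defensive parsing described above might look roughly like the following plain Python sketch, which reads the first three 3-column fields of an MDL counts line. The function names are hypothetical, and treating blank or malformed fields as zero is an assumption; the text above only states that negative values are interpreted as zero.

```python
def parse_count(field):
    """Parse a fixed-width integer field, returning 0 for unusable values."""
    try:
        value = int(field)
    except ValueError:
        # Assumption: blank or malformed fields are also treated as zero.
        return 0
    # Negative counts are interpreted as zero.
    return max(value, 0)


def parse_counts_line(line):
    """Return (atom count, bond count, list count) from a counts line."""
    padded = line.ljust(9)  # tolerate truncated lines
    return tuple(parse_count(padded[i:i + 3]) for i in (0, 3, 6))
```

Clamping at the point of parsing means later loops over atoms, bonds and lists never see a negative bound.
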
The OEChem SMILES parser, OEParseSmiles function, has
been fixed to set the default bond order of unspecified external
bonds, i.e. C&1, to be single. Previously these were left
initialized as bond order zero, although C&=1 and C&#1 were
correctly handled as double and triple bonds respectively.
The function OEPDBOrderAtoms has been improved to only
compare atom names for recognized residues when sorting. This
prevents atoms being needlessly reordered.
OEPerceiveResidues has been improved to assign unique
atom names to every atom within an unknown or unrecognized residue.
Previously, all six atoms in benzene would be given the same atom
name C, which confuses software that assumes PDB atom names are
unique within a residue. OEChem now assigns C1, C2, etc.
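
One simple way to produce names of this form is to append a running counter to each element symbol. The plain Python sketch below does this with a separate counter per element. That numbering scheme is an assumption, as is the function name; the text above only shows the C1, C2 pattern and does not say how OEChem numbers mixed-element residues.

```python
from collections import defaultdict


def unique_atom_names(element_symbols):
    """Name atoms by element symbol plus a per-element running counter."""
    counters = defaultdict(int)
    names = []
    for symbol in element_symbols:
        counters[symbol] += 1
        names.append(f"{symbol}{counters[symbol]}")
    return names
```

For the six carbons of benzene this yields C1 through C6, so every name is unique within the residue.
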
Added goof-proofing to OEInvertCenter, which now returns early from calls
where the specified atom is not trivially invertible (i.e. a
center with 3 or more ring bonds).
Improved handling of the hydrogen isotopes D and T when
reading MDL connection tables. These symbols now automatically
set the isotope field appropriately. Previous versions of OEChem
interpreted these symbols as forms of hydrogen, but relied on the
MDL’s mass field or MISO line being correctly set to specify the isotope.
A very minor bug in OEPerceiveResidues has been fixed
that prevented residue information from being assigned to lone protons.
The algorithm previously assumed all hydrogens were bonded to a
heavy atom parent.
In OESubsetMol the dummy atoms used to represent attachment
points are now assigned map indices starting from one, i.e. R1, R2, R3,
instead of from zero, i.e. R0, R1, R2.
OESubsetMol now attempts to preserve or undefine the
specified stereochemistry at atoms and bonds affected by attachment points.
The performance of OEDetermineConnectivity has been
dramatically improved for very large molecules. This greatly speeds
up the reading of proteins like pdb1jj2.ent (which contains 98,543 atoms).
Replaced an inefficient \(O(n^2)\) algorithm in the
OEChem::OEMolBaseImpl::OrderAtoms method that checked that the
input vector was a valid permutation of a subset of the atoms
in the molecule. This dramatically improves the performance
of writing large PDB files.
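
A linear-time version of such a validity check can be sketched with a "seen" array, as in the plain Python below. This is an illustration of the general technique, not the actual OEChem implementation, and it assumes atoms are identified by integer indices from 0 to atom_count - 1.

```python
def is_valid_ordering(order, atom_count):
    """True if order lists distinct atom indices in the range [0, atom_count)."""
    seen = [False] * atom_count
    for idx in order:
        # Reject out-of-range indices first, then duplicates.
        if not 0 <= idx < atom_count or seen[idx]:
            return False
        seen[idx] = True
    return True
```

Each index is examined once, so the cost is O(n) in the length of the input rather than the O(n^2) of comparing every pair of entries.
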
The OEInterface class and associated machinery for creating
and parsing command lines is now available in Python. While Python has
native command line argument support, this provides an alternative that
is functionally similar to the C++ OEChem version. The example program
molextract.py has been updated to demonstrate this new feature.
Fixed a bug in PyAtomPredicate, PyBondPredicate and
PyConfPredicate where a syntax error in the Python callable function
would silently fail. Now, if there is an error in the Python function,
the exception will propagate back to the Python interpreter.
By default the OpenEye toolkits now use thread-safe memory
management internally to allow multiple molecules (and other objects)
to be manipulated by different concurrent threads. Modifying the
same object concurrently is still unsafe. On some operating systems,
OEChem-intensive applications may experience a slight overhead; thread
safety may be explicitly disabled with the new OESetThreadSafe function
call. Timings on modern GNU/Linux systems show almost no overhead,
and the performance benefits of upgrading to g++ 3.4.x mean that most
applications should run faster with OEChem 1.3.3 than with previous
releases even with thread-safety enabled.
The --help functionality of the OEInterface class
has been improved to indent and wrap the on-line help text at 80 columns.
The default screen width can be controlled by specifying the column width
on the command line, for example --helpall100.
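
The indent-and-wrap behavior can be approximated with Python's standard textwrap module, as in the sketch below. This is not the OEInterface implementation; the function name, the 4-space indent and the sample parameter name in the usage note are assumptions chosen for illustration.

```python
import textwrap


def format_help(name, description, width=80, indent=4):
    """Wrap a parameter description to width columns, indented under its name."""
    pad = " " * indent
    body = textwrap.fill(description, width=width,
                         initial_indent=pad, subsequent_indent=pad)
    return f"{name}\n{body}"
```

Calling format_help("-in", text, width=100) would give the wider layout that a user-specified column width is meant to produce.
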
The OEInterface parser has been improved to allow
!CATEGORY names to be quoted, allowing names to contain spaces.
The OESystem::OEFizzGrid class now has an
method, which returns true if either floats or integers have been set.
The semantics of how quaternions are represented within the
OpenEye toolkits have now been standardized, as scalar-first. Hence,
of the four floating point values that define a quaternion,
the first represents the scalar component and the final three values
represent the vector component. The failure to explicitly document
which of the two possible forms was used resulted in some OEMath
functions assuming scalar-first whilst others assumed scalar-last.
(The quaternion functions in OELib, for example, used scalar-last).
Functions affected by this include OEMath::OEGeomQuaternionMultiply,