LEXICHEM 2.1.0

  • Performance benchmark results: conversion of canonical isomeric SMILES to names and back to the same canonical isomeric SMILES. The size of each database is given in brackets after its name.

    Database             v2.0.2 round trip   v2.1.0 round trip
    Maybridge (63872)    88.94%              98.69%
    MDDR (111171)        48.69%              88.54%
    NCI (250251)         84.54%              92.32%
    Wombat (53214)       52.80%              89.54%

New features

  • Added support for converting von Baeyer names to structures, e.g. tricyclo[5.2.2.0^{3,5}]undecane is converted to C1CC2CCC1CC3CC3C2.
  • Added basic support for a number of steroid, alkaloid and terpene parent structures.
  • Added support for L/D-amino acids.
  • Added support for R-groups for name to structure conversion.
  • Added support for both linear and branched polyspiro alicyclic hydrocarbons.
  • Activated stereochemistry support for name to structure.
  • Added a number of dictionary entries.
  • Added a number of ring templates.
  • Added partial support for von Baeyer name generation from structures.

Bug fixes

  • Added support for names: 2H-imidazol-4-thiol and 1,2-dihydroimidazole-5-imine.
  • Added support for barium(2+), sodium(1+).
  • LEXICHEM now understands trifluoroneodymium.
  • Added support for dihydrides e.g. calcium dihydride, magnesium dihydride.
  • LEXICHEM now supports multi-ammonium salts and multi-derivative ethynyl pyridines.
  • Added support for oxoarsinite based compounds.
  • Added support for a number of additional metal linking groups e.g. Mg, K, La, Dy, Er, V, Ni etc.
  • Fixed a bug in the name: 3-acetyl-8-bromo-1,2,3,6-tetrahydro-azepino[4,5-b]indole-2,5-dicarboxylic acid diethyl ester.
  • Corrected unicode conversion for: acetate, glycinate, nitrite and iodide.

LEXICHEM 2.0.2

  • Updated ring numbering templates that catch a significant fraction of ring naming failures in the NCBI’s PubChem database.
  • Updated the rules we use for naming the prefix “2-carboxyethyl” and friends, which we previously named “3-hydroxy-3-oxopropyl” (or similar).
  • Added parsing support for the traditional prefix “phenethoxy”.
  • Removed the insertion of a single explicit space after a semi-colon. We now prefer to preserve the original input string, rather than beautify it. This also plays nicer with HTML style input, where “λ” really doesn’t need an explicit space character after it.
  • Added several minor tweaks to AutoNom-style naming, such as “isophthalic acid” and “benzene-1,2-carbaldehyde”.
  • Corrected some erroneous SMILES in the dictionary, including those for sulfamethoxazole and tinidazole.

LEXICHEM 2.0.0

  • The applications have a new, standardized command line interface. Please have a look at the updated documentation for mol2nam, nam2mol and translate.
  • This release includes the ability to parse stereo on input names. Previously it was read and ignored.
  • Fixed a bug where, in rare cases, the output name depended on input atom ordering.
  • Fixed a crash in determining CIP stereo for very large, pathological molecules.

LEXICHEM 1.9

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 234297 structures (93.62%) to names without BLAH. Of these 234297 names, nam2mol is able to convert 231566 (98.83%) back into structures.
  • This release includes a significant number of improvements to both name generation and name parsing. Several bugs have also been fixed. The name parsing conversion rate for the 71367 compound names in the 2003 Maybridge catalog is now up to 95.24%.
  • Several improvements have been made to the specification of CIP stereochemistry during name generation. For example, linking groups such as amidino, carbamimidoyl and diazenyl previously omitted E/Z descriptors when they contained a double bond with specified stereochemistry. We also failed to place some stereo prefixes, such as (E)-styryl and (Z)-cinnamyl, in brackets, which can lead to ambiguity when interpreting the generated name.
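
The percentages quoted in the 1.9 benchmark follow directly from the raw counts. The short Python sketch below reproduces that arithmetic; the helper name `percent` is made up for this illustration, and the overall figure (names parsed back as a fraction of all input structures) is derived here rather than quoted from the release notes.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100.0 * part / whole, 2)

total = 250251    # compounds in the NCI00 benchmark
named = 234297    # structures mol2nam named without BLAH
parsed = 231566   # of those names, converted back to structures by nam2mol

naming_rate = percent(named, total)     # share of structures that were named
parsing_rate = percent(parsed, named)   # share of generated names parsed back
overall_rate = percent(parsed, total)   # derived: both steps combined

print(f"named: {naming_rate}%  parsed: {parsing_rate}%  overall: {overall_rate}%")
```

Note that the parsing rate is measured against the names that were generated, not against the full database, which is why it can rise while the naming rate stays flat.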

LEXICHEM 1.8

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 234296 structures (93.62%) to names without BLAH. Of these 234296 names, nam2mol is able to convert 228102 (97.36%) back into structures.
  • This release includes a significant number of improvements to both name generation and name parsing. Several bugs have also been fixed. The name parsing conversion rate for the 71367 compound names in the 2003 Maybridge catalog is now up to 95.12%.
  • One of the major parsing improvements in this release is the much improved support for handling von Baeyer ring nomenclature. We can now parse names such as:
    • ‘1,4-dithioniabicyclo[2.2.2]octane’,
    • ‘bicyclo[4.2.0]octa-1(6),2,4-triene’ and
    • ‘2,4-diazaspiro[4.4]nonane-1,3-dione’.
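
One consistency check that a von Baeyer parser can apply is that the bridge lengths in the descriptor, plus the two main bridgeheads, add up to the number of ring atoms named by the parent (octane, undecane and so on). The Python sketch below illustrates only that check. It is not Lexichem's parser, the function name `von_baeyer_atom_count` is hypothetical, it assumes superscript locants are written in the `^{3,5}` form, and it does not cover spiro descriptors, which follow a different counting rule.

```python
import re

# Matches a von Baeyer descriptor such as "bicyclo[2.2.2]" or
# "tricyclo[5.2.2.0^{3,5}]" anywhere inside a name.
_DESCRIPTOR = re.compile(r"(?:bi|tri|tetra)cyclo\[([^\]]+)\]")

def von_baeyer_atom_count(name):
    """Return the ring atom count implied by the descriptor, or None."""
    match = _DESCRIPTOR.search(name)
    if match is None:
        return None
    total = 0
    for part in match.group(1).split("."):
        # Keep the leading bridge length and ignore a trailing
        # superscript such as "^{3,5}" on secondary bridges.
        digits = re.match(r"\d+", part)
        if digits is None:
            return None
        total += int(digits.group())
    # All bridge atoms plus the two main bridgeheads.
    return total + 2

for example in ("1,4-dithioniabicyclo[2.2.2]octane",
                "bicyclo[4.2.0]octa-1(6),2,4-triene",
                "tricyclo[5.2.2.0^{3,5}]undecane"):
    print(example, von_baeyer_atom_count(example))
```

For the three names above the counts are 8, 8 and 11, matching the parents octane, octa- and undecane.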

LEXICHEM 1.7

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 234155 structures (93.57%) to names without BLAH. Of these 234155 names, nam2mol is able to convert 223246 (95.34%) back into structures.
  • This release includes a significant number of improvements to both name generation and name parsing. Several bugs have also been fixed. The name parsing conversion rate for the 71367 compound names in the 2003 Maybridge catalog is now up to 93.81%.
  • A new function has been added to the Lexichem toolkit API. This function converts the input chemical name to lower-case, whilst preserving the case sensitive aspects of IUPAC names. This functionality allows uppercase and mixed case names to be translated into English, as the OEFrom<Foo> functions assume their input is lowercase. For example, this feature allows AGUA to be recognized via .
  • A new function has been added to the Lexichem toolkit API. This function attempts to reorder the given permuted index name into a form that can be handled by the function. For example, this will convert the string ‘benzene, chloro-’ into ‘chloro-benzene’.
  • A number of improvements and bug fixes have been made to Lexichem's naming styles. For example, AutoNom and CAS permuted index styles are now far more AutoNom-like and CAS-like respectively. Naming of metallocenes and fullerenes is much improved.
  • Some dramatic improvements have been made with foreign language support. On the 250251 compounds in the NCI00 database mentioned above, we now round-trip 100% to German and back without any differences. Japanese, Spanish and Swedish rates are all currently above 99%. Support for Hungarian and Polish has been dramatically improved.
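
The permuted index reordering mentioned above can be pictured with a very small Python sketch. It is not the Lexichem function: the name `reorder_index_name` is invented for this example, and it handles only the simplest case of a parent followed by a single inverted prefix segment.

```python
def reorder_index_name(name):
    """Move the inverted segment of a simple permuted index name to the front.

    Splits on the first comma followed by a space, so commas inside
    locants such as "1,2-" are left alone.
    """
    parent, separator, inverted = name.partition(", ")
    if not separator:
        # No inverted segment: return the name unchanged.
        return name
    return inverted + parent

print(reorder_index_name("benzene, chloro-"))        # chloro-benzene
print(reorder_index_name("benzene, 1,2-dichloro-"))  # 1,2-dichloro-benzene
```

Real index names can carry several inverted segments and trailing modifications, which this sketch makes no attempt to handle.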

LEXICHEM 1.6

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 233010 structures (93.11%) to names without BLAH. Of these 233010 names, nam2mol is able to convert 221331 (94.99%) back into structures.
  • This release includes a significant number of improvements to both name generation and name parsing. For example, both name generation and parsing now do a much better job on ring fusion nomenclature, for names like ‘5,6,7,8-tetrahydro[1,2,4]triazolo[4,3-a]pyridine’. There’s also much improved handling of charged ring systems. The name parsing conversion rate for the 71367 compound names in the 2003 Maybridge catalog is now 93.25% in v1.6, up from 80.80% in v1.5.
  • In name generation, new naming styles have been added for MDL/Beilstein AutoNom style names and for CAS permuted index style names (and there are new placeholder styles for IUPAC79 and IUPAC93 naming). A large number of improvements have been made to names generated using the ‘traditional’ naming style. A new API function is available for capitalizing the appropriate first letter of a generated name, such as ‘p-tert-Butylbenzoic acid’.
  • Several bug fixes have been made to the Cahn-Ingold-Prelog (CIP) chirality perception implementation.
  • The function is now able to return supplementary locant annotations for each atom. This function now stores an integer locant code/identifier in the integer atom type field of each atom, which may be retrieved using the method and converted into a readable/displayable string using the recently exposed function. This functionality is a recent addition, and most but not all supported ring systems and parents have locant annotations in this initial release.
  • Finally, for the adventurous, new APIs for translating compound names from foreign languages into English are available as the experimental , and functions. Additionally, a function is available for converting UTF-8 encoded strings into the escaped sequences expected by these functions (effectively the inverse of ).

LEXICHEM 1.5

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 223066 structures (89.14%) to names without BLAH. Of these 223066 names, nam2mol is able to convert 192487 (86.29%) back into structures.
  • This release includes a significant number of improvements to both name generation and name parsing. For example, nam2mol now supports more numbered locants, such as ‘N1-methylaniline’, as well as ‘Maybridge-style’ locants such as N'1 (interpreted as the more common N1'). These and similar changes have increased the conversion rate for the 71367 compound names in the 2003 Maybridge catalog from 69.51% in v1.4 to 80.80% in v1.5.
  • This release includes the ability to generate compound names in Japanese, and much improved Spanish and Polish naming support. In order to better support internationalization, APIs are now available to map from the default ISO-8859-1 output to 7-bit ASCII, UTF-8 or HTML and, for Japanese locales, to Shift-JIS or EUC-JP.
  • Although impossible in the general case, several improvements have been made to Lexichem's compound naming such that the assigned names are now more stable under arbitrary input ordering of atoms and bonds.
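
To make the encoding targets listed above concrete, the Python sketch below converts a name held as ISO-8859-1 bytes into UTF-8, HTML character references and plain 7-bit ASCII, and encodes a Japanese name as Shift-JIS and EUC-JP. It uses only Python's standard codecs, not the Lexichem APIs, and the two example names are chosen for illustration. The accent-stripping route to ASCII is one possible choice and is not necessarily what Lexichem does.

```python
import unicodedata

# "ácido acético" (Spanish for acetic acid) as ISO-8859-1 bytes.
latin1_bytes = b"\xe1cido ac\xe9tico"
name = latin1_bytes.decode("iso-8859-1")

# UTF-8: each accented letter becomes a two-byte sequence.
as_utf8 = name.encode("utf-8")

# HTML: non-ASCII characters become numeric character references.
as_html = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# 7-bit ASCII: decompose accented letters, then drop the combining marks.
decomposed = unicodedata.normalize("NFKD", name)
as_ascii = decomposed.encode("ascii", "ignore").decode("ascii")

# Japanese text cannot be held in ISO-8859-1, so start from a Unicode string.
japanese = "メタン"  # methane
as_sjis = japanese.encode("shift_jis")
as_eucjp = japanese.encode("euc_jp")

print(as_utf8, as_html, as_ascii)
print(as_sjis, as_eucjp)
```

The two Japanese encodings produce different byte sequences for the same text, so the consumer of the output needs to know which one was requested.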

LEXICHEM 1.4

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 221254 structures (88.41%) to names without BLAH. Of these 221254 names, nam2mol is able to convert 192345 (86.93%) back into structures.
  • Lexichem v1.4 is predominantly a maintenance release to provide a version of the oeiupac library that is compatible with OEChem v1.4. However, there have been a number of significant improvements to name parsing, and minor improvements to name generation, since last month’s v1.3 release.
  • This release also includes the ability to generate compound names in several languages. In addition to British spellings, Lexichem can now generate German, Italian, French, Spanish, Swedish, Dutch and Polish names. Whilst the translations for German, Italian, Swedish and Polish are quite comprehensive, those for French, Spanish and Dutch are less complete.
  • A potential ambiguity with the ring names ‘oxazole’ and ‘thiazole’ has also been resolved. The IUPAC documentation states that it is permissible to omit locants from Hantzsch-Widman names when the locants are consecutive, i.e. ‘1,2,3,4-tetrazole’ may be written as ‘tetrazole’, and ‘1,2-oxazirene’ is preferred as ‘oxazirene’. Unfortunately, this conflicts with the traditional interpretations of ‘oxazole’ as meaning ‘1,3-oxazole’ and ‘thiazole’ as ‘1,3-thiazole’. Instead, the traditional names ‘isoxazole’ and ‘isothiazole’ denote the ‘1,2-’ forms. This ambiguity, which affected IUPAC-style (but not OpenEye-style) names, has been resolved by preserving the locants, so that the IUPAC names ‘1,2-oxazole’, ‘1,3-oxazole’, ‘1,2-thiazole’ and ‘1,3-thiazole’ are now generated for ‘isoxazole’, ‘oxazole’, ‘isothiazole’ and ‘thiazole’ respectively.
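
The resolution described above amounts to a fixed mapping from the four traditional ring names to explicitly numbered IUPAC-style names. The Python sketch below writes that mapping out as a lookup table. The names `IUPAC_RING_NAMES` and `to_iupac_ring_name` are illustrative only and say nothing about how Lexichem stores its ring templates.

```python
# Traditional ring name -> IUPAC-style name with locants preserved,
# exactly as listed in the release note above.
IUPAC_RING_NAMES = {
    "isoxazole": "1,2-oxazole",
    "oxazole": "1,3-oxazole",
    "isothiazole": "1,2-thiazole",
    "thiazole": "1,3-thiazole",
}

def to_iupac_ring_name(traditional_name):
    """Return the locant-preserving name, or the input if it is not listed."""
    return IUPAC_RING_NAMES.get(traditional_name, traditional_name)

for ring in ("oxazole", "isoxazole", "thiazole", "isothiazole", "tetrazole"):
    print(ring, "->", to_iupac_ring_name(ring))
```

Because every generated name keeps its locants, a bare ‘oxazole’ never appears in IUPAC-style output, so the conflict between the two readings cannot arise there.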

LEXICHEM 1.3

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 221205 structures (88.39%) to names without BLAH. Of these 221205 names, nam2mol is able to convert 183444 (82.93%) back into structures.
  • The major announcement of this release is the support for stereochemistry in compound naming. The CIP rules for assigning R/S descriptors to tetrahedral chiral centers, and E/Z descriptors to double bonds are used during name generation.

LEXICHEM 1.2

On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 220949 structures (88.29%) to names without BLAH. Of these 220949 names, nam2mol is able to convert 182438 (82.57%) back into structures.

LEXICHEM 1.1

  • On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 220924 structures (88.28%) to names without BLAH. Of these 220924 names, nam2mol is able to convert 177145 (80.18%) back into structures.
  • A new API has been added to allow applications to check whether Lexichem's parsing and naming functionality can safely be used.

OEParseIUPACName Improvements

The Lexichem name parsing routines now handle a small number of structural abbreviations when parsing names. For example, they can now handle names like ‘3-CF3-5-NO2-benzoic acid’. This release also includes the usual improvements in name parsing, including more entries for common names in the Lexichem dictionary, and adds support for names containing multiple explicit hydrogen locants, such as ‘pyrimidine-2,4(1H,3H)-dione’ and ‘2,4(1H,3H)-pyrimidinedione’.

OECreateIUPACName Improvements

A serious bug that could cause a core dump when naming thioperoxoic acids has been fixed. The performance of compound naming has been improved. This release also includes the usual improvements to the generated names, which now follow the IUPAC standards more closely.

LEXICHEM 1.0

On a benchmark of 250251 compounds in the NCI00 database, mol2nam is able to convert 220922 structures (88.28%) to names without BLAH. Of these 220922 names, nam2mol is able to convert 177032 (80.13%) back into structures.

OEParseIUPACName Improvements

In addition to a great many other improvements to the name parsing code, the Lexichem parser now contains an internal dictionary allowing the recognition of common non-systematic names, such as ‘ranitidine’ and ‘zantac’.

OECreateIUPACName Improvements

In addition to a great many improvements to the name generation code, the Lexichem naming functionality now allows the specification of a naming style, so that a compound can be named in either a traditional, OpenEye, IUPAC, CAS or systematic naming style.