The OEFromBritish function has been added for completeness.
The name perception code already handled British-style input names
without this conversion function; however, it is useful to have the
function available for use by higher-level APIs that dispatch based
on input language.
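The kind of language-based dispatch mentioned above can be sketched as a lookup table of converters. This is only an illustration: from_american, from_british, CONVERTERS, and normalize_name are hypothetical stand-ins written for this sketch, not toolkit functions, and the "sulph" to "sulf" rule is a toy placeholder for a real conversion.

```python
from typing import Callable, Dict


def from_american(name: str) -> str:
    """Placeholder: American English input is passed through unchanged."""
    return name


def from_british(name: str) -> str:
    """Placeholder for a British-to-American conversion (toy rule only)."""
    return name.replace("sulph", "sulf")


# Hypothetical dispatch table: input language -> converter.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "american": from_american,
    "british": from_british,
}


def normalize_name(name: str, language: str) -> str:
    """Look up the converter for the given input language and apply it."""
    try:
        convert = CONVERTERS[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported input language: {language!r}") from None
    return convert(name)
```

Having an explicit converter for every supported language, even one that changes little, keeps the table uniform so the caller does not need a special case for British input.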
Handling of character sets for non-English language names has been improved.
All non-ASCII characters are now represented internally as \u escaped
Unicode codepoints. Previously, Latin-1 characters (e.g., in the output of
OEToFrench) were encoded as their Latin-1 bytes (0x80 - 0xff). The new
representation eliminates the character encoding problems that occurred in
the Java and Python language wrappers when intermediate name strings were
processed. Now, only the following final character set encoding/decoding
functions need to handle non-ASCII bytes:
Name-to-structure conversions for certain uncommon functional groups
(nitramide, nitrile oxide, oxycyano) previously resulted in structures with 5-valent
nitrogen. These now result in charge-separated N(III) structures, consistent
with other nitrogen functional groups.
The documentation has been updated to indicate the range of Unicode
codepoints that are handled for each of the language translation
functions. This allows users to determine the appropriate output
character set conversion.
For example, the Chinese language translation results in CJK characters
in the Unicode Basic Multilingual Plane (BMP). Therefore, it is only
appropriate to convert Chinese language names to UTF8 or HTML, which
can handle the full Unicode BMP. Since the French language translation results
in ASCII and Latin-1 characters, it is reasonable to use Latin1, UTF8, or
HTML as the output character set.
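The relationship between codepoint range and output character set can be sketched in plain Python. This is a minimal illustration, not the toolkit's own conversion code: the helper names (unescape, to_utf8, to_latin1, to_html) are hypothetical, and the sketch assumes each escape has the form \u followed by exactly four hexadecimal digits.

```python
import re

# Assumed escape format: a backslash, 'u', and four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape(name: str) -> str:
    """Replace each escape sequence with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def to_utf8(name: str) -> bytes:
    """UTF-8 can encode every codepoint in the BMP."""
    return unescape(name).encode("utf-8")


def to_latin1(name: str) -> bytes:
    """Latin-1 covers only codepoints up to 0xff.

    Raises UnicodeEncodeError for anything above that range.
    """
    return unescape(name).encode("latin-1")


def to_html(name: str) -> str:
    """Rewrite each escape as a decimal numeric character reference.

    Markup characters such as '&' and '<' are not escaped in this sketch.
    """
    return _ESCAPE.sub(lambda m: "&#%d;" % int(m.group(1), 16), name)


french = r"benz\u00e8ne"   # Latin-1 range: any of the three outputs works
chinese = r"\u82ef"        # CJK codepoint: only UTF-8 or HTML is suitable

french_latin1 = to_latin1(french)
chinese_utf8 = to_utf8(chinese)
chinese_html = to_html(chinese)
```

Calling to_latin1 on the CJK example raises UnicodeEncodeError, which is the failure the documented codepoint ranges help avoid.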
The example programs have been updated to better illustrate character set