This is a preliminary API and may be improved based on user feedback. It is currently available in C++ and Python.

bool OEReadCIFFile(oemolistream &ifs, OEMolBase &mol,
                   unsigned int flavor)

Reads a molecule from the specified input stream, ‘ifs’, in CIF or mmCIF file format. Because both formats share the same file extension, the reader peeks into the file and checks for specific tags to determine whether it is a CIF or an mmCIF file. The mmCIF reader can be enforced by setting the OEFormat.MMCIF format on the input stream. A number of format variants are supported through the ‘flavor’ parameter, whose values come from the OEIFlavor.CIF or the OEIFlavor.MMCIF namespace. This function returns true if the operation was successful, and false if an end-of-file was encountered.
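The peeking step can be illustrated with a minimal sketch in plain Python. This is not the toolkit's implementation: the tags the reader actually checks are not listed here. The helper name looks_like_mmcif and the max_lines limit are hypothetical, and the sketch assumes that mmCIF data names use the 'category.item' form (for example _atom_site.id) while small-molecule CIF data names use underscores only (for example _atom_site_label).

```python
import io


def looks_like_mmcif(stream, max_lines=200):
    """Guess the dialect from the first data name, without consuming input.

    Hypothetical helper, not part of OEChem. Assumes a seekable text stream
    and that a '.' in the first data name indicates mmCIF.
    """
    start = stream.tell()
    try:
        for _ in range(max_lines):
            line = stream.readline()
            if not line:
                break  # end of file reached before any data name
            stripped = line.strip()
            if not stripped:
                continue
            token = stripped.split(None, 1)[0]
            if token.startswith("_"):
                # First data name decides: 'category.item' means mmCIF.
                return "." in token
        return False
    finally:
        # Restore the position so a real parser can start from the top.
        stream.seek(start)


mmcif_text = io.StringIO("data_1ABC\n#\n_entry.id 1ABC\n")
cif_text = io.StringIO("data_x\n_cell_length_a 5.4\n")
mmcif_guess = looks_like_mmcif(mmcif_text)
cif_guess = looks_like_mmcif(cif_text)
```

Restoring the stream position in the finally block is what makes this a peek rather than a read: the caller can hand the same stream to whichever parser is chosen.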

The atoms (_atom_site) table has four “author defined alternatives” (.auth_*) that have similar meaning to the “primary” identifiers (.label_*). Two of them, atom name (atom_id) and residue name (comp_id), almost never differ. The other two, chain name (asym_id) and sequence number (seq_id), may differ in a confusing way (A,B,C <-> C,A,B). For this reason, only one value is read and stored for each identifier: the auth value if it is present, otherwise the label value.
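The auth-over-label preference can be sketched as follows. This is an illustration only, not the toolkit's code: pick_identifier is a hypothetical helper, an _atom_site row is modelled as a plain dict, and treating the mmCIF placeholders '.' and '?' as "not present" is an assumption.

```python
def pick_identifier(row, field):
    """Return the auth_* value for a field if present, else the label_* value.

    Hypothetical helper. 'row' maps _atom_site item names (without the
    category prefix) to string values for a single atom.
    """
    auth = row.get("auth_" + field)
    # Assumption: '.' and '?' (mmCIF inapplicable/unknown) count as absent.
    if auth not in (None, ".", "?"):
        return auth
    return row.get("label_" + field)


atom_row = {
    "label_asym_id": "A",
    "auth_asym_id": "C",   # author chain name differs from the label
    "label_seq_id": "1",   # no auth_seq_id in this row
}
chain_name = pick_identifier(atom_row, "asym_id")
seq_number = pick_identifier(atom_row, "seq_id")
```

Here the chain name resolves to the author value and the sequence number falls back to the label value, since no auth_seq_id is given.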

In all PDB entries, each auth_asym_id “chain” is split into one or more label_asym_id subchains. The polymer (the residues before the TER record in the PDB format) goes into one subchain. All other (non-polymer) residues are put into single-residue subchains, except the waters, which are all put into one subchain. Currently, wwPDB treats non-linear polymers (such as sugars) as non-polymers.
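The splitting rule described above can be sketched in a few lines. The function split_into_subchains, the (name, kind) residue tuples, and the ordering of the resulting groups are all hypothetical and chosen for illustration. The sketch does not assign actual label_asym_id values.

```python
def split_into_subchains(residues):
    """Group the residues of one auth_asym_id chain into subchains.

    Hypothetical helper. 'residues' is a list of (name, kind) tuples where
    kind is 'polymer', 'water', or anything else for other non-polymers.
    """
    polymer, waters, singles = [], [], []
    for name, kind in residues:
        if kind == "polymer":
            polymer.append(name)      # all polymer residues share one subchain
        elif kind == "water":
            waters.append(name)       # all waters share one subchain
        else:
            singles.append([name])    # each other residue gets its own subchain
    groups = []
    if polymer:
        groups.append(polymer)
    groups.extend(singles)
    if waters:
        groups.append(waters)
    return groups


chain_a = [
    ("ALA", "polymer"), ("GLY", "polymer"),
    ("NAG", "other"), ("HEM", "other"),
    ("HOH", "water"), ("HOH", "water"),
]
subchains = split_into_subchains(chain_a)
```

For this single author chain the result is four subchains: one for the polymer, one each for NAG and HEM, and one for both waters.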


The implementation of mmCIF reading is partial, particularly for metadata processing, and is limited in scope to what is needed for Spruce TK. As a consequence, converting an mmCIF file to a PDB file will result in loss of header data.

It is recommended to always use the author-defined names, for consistency with the PDB format and with the literature.