This is a preliminary API and may be improved based on user feedback. It is currently available in C++ and Python.

bool OEReadCIFFile(oemolistream &ifs, OEMolBase &mol,
                   unsigned int flavor)

Reads a molecule from the input stream ‘ifs’ in CIF or mmCIF format. Because both formats use the same file extension, the reader peeks into the file for specific tags to decide whether it is a CIF or an mmCIF file. To force the mmCIF reader, set OEFormat_MMCIF as the format of the input stream. Several format variants are supported through the ‘flavor’ parameter, which takes values from the OEIFlavor_CIF or OEIFlavor_MMCIF namespace. The function returns true if the read succeeded and false if end-of-file was encountered.
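The kind of tag check described above can be sketched as a simple prefix scan. This is a hypothetical heuristic, not OEChem's actual detection logic: the function name `detect_cif_dialect` and the prefix lists are assumptions chosen for illustration. It relies on one real difference between the dialects: mmCIF (DDL2) data names separate category and item with a dot (`_atom_site.Cartn_x`), while core CIF (DDL1) names use underscores throughout (`_atom_site_fract_x`).

```python
# Hypothetical tag prefixes for illustration only; NOT the set OEChem checks.
# mmCIF (DDL2) names put a dot between category and item.
MMCIF_PREFIXES = ("_atom_site.", "_entity.", "_struct_asym.", "_entry.", "_cell.")
# Core CIF (DDL1) names use underscores throughout.
CIF_PREFIXES = ("_atom_site_", "_cell_", "_symmetry_", "_space_group_")


def detect_cif_dialect(lines, max_lines=500):
    """Guess whether CIF text is mmCIF or small-molecule CIF.

    Scans at most max_lines lines and returns "mmcif", "cif", or None
    if no distinguishing tag was seen. Lines inside multi-line text
    fields are not special-cased, so this is only a heuristic.
    """
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        text = line.lstrip()
        if text.startswith(MMCIF_PREFIXES):
            return "mmcif"
        if text.startswith(CIF_PREFIXES):
            return "cif"
    return None
```

An open text file can be passed directly, since iterating over it yields lines. A caller that wants the equivalent of forcing OEFormat_MMCIF would simply skip the scan.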

The atom table (_atom_site) has four “author-defined alternatives” (.auth_*) whose meaning is similar to that of the “primary” identifiers (.label_*). Two of them, atom name (atom_id) and residue name (comp_id), almost never differ from their label counterparts. The other two, chain name (asym_id) and sequence number (seq_id), may differ in confusing ways; for example, chains A, B, C under one scheme may be C, A, B under the other. Thus we read and store only one value of each pair: the auth_* value if it is present, otherwise the label_* value.
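The auth-over-label rule can be expressed as a small lookup over one parsed `_atom_site` row. This is a minimal sketch assuming each row is a dict keyed by item name without the category prefix (for example `"auth_asym_id"`); the helper names and the sample row are hypothetical. It treats the CIF placeholders `?` (unknown) and `.` (not applicable) as "not present".

```python
# CIF null markers: "?" means unknown, "." means not applicable.
CIF_NULLS = {"?", "."}

# The four identifiers that have both auth_* and label_* variants.
PAIRED_ITEMS = ("atom_id", "comp_id", "asym_id", "seq_id")


def pick_id(row, item):
    """Return auth_<item> if present and non-null, else label_<item>."""
    auth = row.get("auth_" + item)
    if auth is not None and auth not in CIF_NULLS:
        return auth
    return row.get("label_" + item)


def resolve_atom_ids(row):
    """Keep exactly one value per identifier pair, preferring auth_*."""
    return {item: pick_id(row, item) for item in PAIRED_ITEMS}


# Hypothetical row where chain and sequence number differ between schemes.
row = {
    "label_atom_id": "CA", "auth_atom_id": "CA",
    "label_comp_id": "GLY", "auth_comp_id": "GLY",
    "label_asym_id": "B", "auth_asym_id": "A",
    "label_seq_id": "1", "auth_seq_id": "5",
}
print(resolve_atom_ids(row))
# → {'atom_id': 'CA', 'comp_id': 'GLY', 'asym_id': 'A', 'seq_id': '5'}
```

Storing a single value per pair avoids carrying two chain and numbering schemes side by side, at the cost of discarding the label_* values whenever auth_* values exist.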

In all PDB entries, each auth_asym_id “chain” is split into one or more label_asym_id subchains. The polymer (the residues before the TER record in PDB format) goes into one subchain; every other (non-polymer) residue gets its own single-residue subchain, except waters, which all share one subchain. Currently, wwPDB treats non-linear polymers (such as sugars) as non-polymers.
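The splitting rule can be illustrated for a single author chain. This is a simplified sketch, not wwPDB's or OEChem's code: the function name `split_into_subchains` and the residue classification strings are hypothetical, the caller is assumed to have already classified each residue (including classifying sugars as `"non-polymer"`, per the convention above), and the ordering polymer, ligands, waters is an assumption for readability.

```python
def split_into_subchains(residues):
    """Split one author chain into label-style subchains.

    residues: list of (residue_name, kind) with kind in
    {"polymer", "non-polymer", "water"}.
    Returns a list of subchains, each a list of residue names:
    all polymer residues together, one subchain per non-polymer
    residue, and all waters together.
    """
    polymer, ligands, waters = [], [], []
    for name, kind in residues:
        if kind == "polymer":
            polymer.append(name)
        elif kind == "water":
            waters.append(name)
        else:
            # Non-polymer residues (ligands, ions, sugars) each get their own subchain.
            ligands.append([name])

    subchains = []
    if polymer:
        subchains.append(polymer)
    subchains.extend(ligands)
    if waters:
        subchains.append(waters)
    return subchains


chain_a = [("MET", "polymer"), ("ALA", "polymer"), ("HEM", "non-polymer"),
           ("SO4", "non-polymer"), ("HOH", "water"), ("HOH", "water")]
print(split_into_subchains(chain_a))
# → [['MET', 'ALA'], ['HEM'], ['SO4'], ['HOH', 'HOH']]
```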

The mmCIF reader is only partially implemented, particularly for metadata processing, and its scope is limited to what Spruce TK needs. As a result, converting an mmCIF file to a PDB file loses the header data.

We recommend always using the author-defined (auth_*) names, for consistency with the PDB format and with the literature.