OEReadCIFFile

bool OEReadCIFFile(oemolistream &ifs, OEMolBase &mol,
                   unsigned int flavor)

Reads a molecule from the specified input stream, ‘ifs’, in CIF file format. This supports the small molecule smCIF, macromolecule mmCIF, and chemical component ccCIF format variants. Because all three variants share the same file extension, the reader peeks into the file and checks for specific tags to determine whether it is an smCIF, ccCIF, or mmCIF formatted file. The mmCIF reader can be forced by setting OEFormat_MMCIF on the input stream. A number of different format variants are supported through the ‘flavor’ parameter, whose values come from the OEIFlavor_CIF or the OEIFlavor_MMCIF namespace. This function returns true if the operation was successful, and false if an end-of-file was encountered.

The atom (_atom_site) table has four “author defined alternatives” (.auth_*) that have a similar meaning to the “primary” identifiers (.label_*). Two of them, atom name (atom_id) and residue name (comp_id), almost never differ. The other two, chain name (asym_id) and sequence number (seq_id), may differ in a confusing way (for example, chains named A, B, C in one scheme may be named C, A, B in the other). Thus we read and store only one of each pair: auth if it is present, otherwise label.
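The "auth if present, otherwise label" rule can be illustrated with a minimal sketch. The function name PreferAuth is hypothetical and is not part of OEChem. Treating an empty string, "." or "?" as "not present" is an assumption based on the usual CIF placeholders for inapplicable and unknown values; the reader's actual test may differ.

```cpp
#include <string>

// Hypothetical helper, not an OEChem function: picks the identifier to
// store for one _atom_site field (for example asym_id or seq_id).
// Assumption: an empty value, "." or "?" means the auth value is absent.
inline std::string PreferAuth(const std::string& auth,
                              const std::string& label) {
    const bool hasAuth = !auth.empty() && auth != "." && auth != "?";
    // Use the author-defined value when it exists, otherwise the label value.
    return hasAuth ? auth : label;
}
```

With this rule, a row carrying auth_asym_id "C" and label_asym_id "A" is stored with chain name "C", while a row with no usable auth value keeps its label value.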

In all PDB entries each auth_asym_id “chain” is split into one or more label_asym_id subchains. The polymer (the residues before the TER record in the PDB format) goes into one subchain. Every other (non-polymer) residue is put into its own single-residue subchain, except the waters, which are all put into one subchain. Currently, wwPDB treats non-linear polymers (such as sugars) as non-polymers.
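The splitting rule described above can be sketched as follows. This is only an illustration of the rule, not the procedure wwPDB or OEChem uses: ResidueKind and AssignSubchains are hypothetical names, the residue classification is assumed to be known in advance, and the returned numbers are arbitrary subchain indices rather than real label_asym_id values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of the residues in one auth_asym_id chain.
// Non-linear polymers such as sugars would count as NonPolymer here.
enum class ResidueKind { Polymer, NonPolymer, Water };

// Returns one 0-based subchain index per residue (illustrative numbering).
inline std::vector<int> AssignSubchains(const std::vector<ResidueKind>& residues) {
    std::vector<int> subchain(residues.size(), -1);
    int next = 0;      // next unused subchain index
    int polymer = -1;  // shared subchain for all polymer residues
    int water = -1;    // shared subchain for all waters
    for (std::size_t i = 0; i < residues.size(); ++i) {
        switch (residues[i]) {
        case ResidueKind::Polymer:
            if (polymer < 0) polymer = next++;
            subchain[i] = polymer;
            break;
        case ResidueKind::Water:
            if (water < 0) water = next++;
            subchain[i] = water;
            break;
        case ResidueKind::NonPolymer:
            subchain[i] = next++;  // each non-polymer residue gets its own subchain
            break;
        }
    }
    return subchain;
}
```

For a chain with two polymer residues, two ligands, and two waters, the polymer residues share one index, each ligand gets its own, and both waters share a fourth.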

Note

It is recommended to always use the author-defined names, for consistency with the PDB format and with the literature.