OEReadCIFFile
bool OEReadCIFFile(oemolistream &ifs, OEMolBase &mol,
unsigned int flavor)
Reads a molecule from the specified input stream, ‘ifs’, in CIF or mmCIF file format.
Because CIF and mmCIF files share the same file extension, the reader peeks into the file and checks for specific tags to determine which of the two formats it is. The mmCIF reader can be forced by setting the OEFormat.MMCIF format on the input stream.
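The tags the reader actually checks are not listed here. As a rough illustration of the idea only, the sketch below classifies a file by the spelling of its atom-site tags, on the assumption that mmCIF writes them as `_atom_site.<item>` (category and item separated by a dot) while small-molecule CIF writes them as `_atom_site_<item>`. The names `CifKind` and `GuessCifKind` are hypothetical and are not part of OEChem.

```cpp
#include <sstream>
#include <string>

// Hypothetical names for illustration; this is not the OEChem implementation.
enum class CifKind { Unknown, SmallMoleculeCif, MacromolecularCif };

// Classify CIF text by the first tag that starts with "_atom_site".
// Assumption: an mmCIF tag contains a '.' between category and item,
// while a small-molecule CIF tag uses only underscores.
CifKind GuessCifKind(const std::string &text) {
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // Skip leading whitespace; ignore blank lines.
    const std::string::size_type start = line.find_first_not_of(" \t\r");
    if (start == std::string::npos) continue;
    if (line.compare(start, 10, "_atom_site") != 0) continue;

    // Isolate the tag itself (everything up to the next whitespace).
    const std::string::size_type end = line.find_first_of(" \t\r", start);
    const std::string tag = line.substr(
        start, end == std::string::npos ? std::string::npos : end - start);

    return tag.find('.') != std::string::npos ? CifKind::MacromolecularCif
                                              : CifKind::SmallMoleculeCif;
  }
  return CifKind::Unknown;  // no atom-site tag seen
}
```

A file whose first atom-site tag is `_atom_site.group_PDB` would be classified as mmCIF, and one starting with `_atom_site_label` as small-molecule CIF. When such a heuristic cannot decide, setting the format explicitly on the stream, as described above, avoids the guess.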
A number of format variants are supported through the ‘flavor’ parameter, which takes values from the OEIFlavor.CIF or the OEIFlavor.MMCIF namespace.
This function returns true if the operation was successful, and false if an end-of-file was encountered.
The atoms (_atom_site) table has four “author defined alternatives” (.auth_*) that have similar meaning to the “primary” identifiers (.label_*). Two of them, atom name (atom_id) and residue name (comp_id), almost never differ. The other two, chain name (asym_id) and sequence number (seq_id), may differ in a confusing way (for example, chains labeled A,B,C in one scheme may be named C,A,B in the other). Thus we read and store only one of each pair: auth if it is present, otherwise label.
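The "auth if present, otherwise label" rule can be sketched as a small helper. This is an illustration under stated assumptions, not the reader's actual code: `HasValue` and `PreferAuth` are hypothetical names, an empty string stands in for a missing column, and treating the CIF placeholders `.` (inapplicable) and `?` (unknown) as absent is an assumption about how "present" is decided.

```cpp
#include <string>

// True when a CIF field carries real data. An empty string stands for a
// missing column; "." and "?" are the CIF placeholders for inapplicable
// and unknown values (treating them as absent is an assumption).
bool HasValue(const std::string &value) {
  return !value.empty() && value != "." && value != "?";
}

// Pick the identifier to store: the author-defined one when it is
// present, otherwise the primary ("label") one.
std::string PreferAuth(const std::string &auth, const std::string &label) {
  return HasValue(auth) ? auth : label;
}
```

For a row with auth_asym_id `C` and label_asym_id `A`, the helper returns `C`. If the auth column is missing or holds `?`, it falls back to `A`.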
In all PDB entries each auth_asym_id “chain” is split into one or more label_asym_id subchains. The polymer (residues before the TER record in the PDB format) goes into one subchain. All other (non-polymer) residues are put into single-residue subchains, except the waters, which are all put into one subchain. Currently, wwPDB treats non-linear polymers (such as sugars) as non-polymers.
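The splitting rule can be made concrete with a short sketch. It assigns a 0-based subchain index to each residue of a single author chain, numbering subchains in order of first appearance. The types and the function name are hypothetical, and real label_asym_id values are letters assigned across the whole entry, so the sketch shows only the grouping, not the actual naming.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only.
enum class ResidueKind { Polymer, NonPolymer, Water };

struct Residue {
  std::string name;
  ResidueKind kind;
};

// Assign a subchain index to every residue of one author chain:
// all polymer residues share one subchain, each non-polymer residue
// gets a subchain of its own, and all waters share one subchain.
std::vector<int> AssignSubchains(const std::vector<Residue> &chain) {
  std::vector<int> subchains;
  subchains.reserve(chain.size());
  int next = 0;      // next unused subchain index
  int polymer = -1;  // index of the polymer subchain, once created
  int water = -1;    // index of the water subchain, once created
  for (const Residue &residue : chain) {
    switch (residue.kind) {
      case ResidueKind::Polymer:
        if (polymer < 0) polymer = next++;
        subchains.push_back(polymer);
        break;
      case ResidueKind::Water:
        if (water < 0) water = next++;
        subchains.push_back(water);
        break;
      case ResidueKind::NonPolymer:
        subchains.push_back(next++);
        break;
    }
  }
  return subchains;
}
```

Under this scheme a sugar residue such as NAG would be passed in as `ResidueKind::NonPolymer`, matching the wwPDB treatment of non-linear polymers described above.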
Note
The implementation of mmCIF reading is partial, particularly for metadata processing, and is limited in scope to what is needed for Spruce TK. This means that converting to a PDB file will result in loss of header data.
It is recommended to always use the author-defined names, for consistency with the PDB format and with the literature.