OEReadCIFFile
bool OEReadCIFFile(oemolistream &ifs, OEMolBase &mol,
unsigned int flavor)
Reads a molecule from the specified input stream, ‘ifs’, in CIF or mmCIF file format.
Because both formats use the same file extension, the reader peeks into the file and checks for specific tags to determine whether it is a CIF or an mmCIF file. The mmCIF reader can be forced by setting the OEFormat::MMCIF format on the input stream.
A number of format variants are supported through the ‘flavor’ parameter, which takes values from the OEIFlavor::CIF or the OEIFlavor::MMCIF namespace.
This function returns true if the operation was successful, and false if an end-of-file was encountered.
The atoms (_atom_site) table has four “author defined alternatives” (.auth_*) whose meaning is similar to that of the “primary” identifiers (.label_*). Two of them, atom name (atom_id) and residue name (comp_id), almost never differ. The other two, chain name (asym_id) and sequence number (seq_id), may differ in a confusing way (for example, A,B,C <-> C,A,B). Thus we read and store only one of each pair: auth if it is present, otherwise label.
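The “auth if present, otherwise label” rule can be sketched as follows. This is a minimal illustration, not the reader's actual code: AtomSiteRow, IsMissing, and PickIdentifier are hypothetical names, and treating the mmCIF placeholders “.” and “?” as “not present” is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical row type: item name without the "_atom_site." prefix -> value.
using AtomSiteRow = std::map<std::string, std::string>;

// Assumption: an empty value or one of the mmCIF placeholders "." and "?"
// counts as "not present".
inline bool IsMissing(const std::string &value)
{
  return value.empty() || value == "." || value == "?";
}

// Returns the auth_* value for the item when present, else the label_* value,
// else an empty string.
std::string PickIdentifier(const AtomSiteRow &row, const std::string &item)
{
  const auto auth = row.find("auth_" + item);
  if (auth != row.end() && !IsMissing(auth->second))
    return auth->second;
  const auto label = row.find("label_" + item);
  if (label != row.end() && !IsMissing(label->second))
    return label->second;
  return std::string();
}
```

For a row with label_asym_id “A” and auth_asym_id “C”, PickIdentifier(row, "asym_id") yields “C”; if only label_seq_id is given, that value is used instead.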
In all PDB entries, each auth_asym_id “chain” is split into one or more label_asym_id subchains. The polymer (the residues before the TER record in the PDB format) goes into one subchain. Every other (non-polymer) residue is put into its own single-residue subchain, except the waters, which are all put into one subchain. Currently, wwPDB treats non-linear polymers (such as sugars) as non-polymers.
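The splitting rule can be illustrated with a small sketch. It assumes each residue of one auth_asym_id chain has already been classified as polymer, non-polymer, or water, and it hands out integer subchain indices local to that chain; the real label_asym_id values are names assigned by wwPDB, which this sketch does not attempt to reproduce. ResidueKind and AssignSubchains are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of the residues in one auth_asym_id chain.
enum class ResidueKind { Polymer, NonPolymer, Water };

// Returns, for each residue, a 0-based subchain index local to this chain:
// all polymer residues share one subchain, all waters share another, and
// every non-polymer residue gets a subchain of its own.
std::vector<int> AssignSubchains(const std::vector<ResidueKind> &residues)
{
  std::vector<int> subchain(residues.size(), -1);
  int next = 0;      // next unused subchain index
  int polymer = -1;  // index of the shared polymer subchain, once created
  int water = -1;    // index of the shared water subchain, once created
  for (std::size_t i = 0; i < residues.size(); ++i)
  {
    switch (residues[i])
    {
    case ResidueKind::Polymer:
      if (polymer < 0) polymer = next++;
      subchain[i] = polymer;
      break;
    case ResidueKind::Water:
      if (water < 0) water = next++;
      subchain[i] = water;
      break;
    case ResidueKind::NonPolymer:
      subchain[i] = next++;
      break;
    }
  }
  return subchain;
}
```

A chain with two polymer residues, two ligands, and two waters therefore ends up with four subchains: one for the polymer, one per ligand, and one for the waters.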
Note
The implementation of mmCIF reading is partial, particularly for metadata processing, and is limited in scope to what is needed for Spruce TK. This means that converting to a PDB file will result in loss of header data.
It is recommended to always use the author-defined names, for consistency with the PDB format and with the literature.