OECIFData
struct OECIFData
This class represents OECIFData that holds CIF header data and acts as an interface to
interact with this header data. This object reflects what is contained in the CIF data_ section.
The goal for the OECIFData class is to allow a user to freely interact with CIF data and be confident the data will be written in a proper format that can easily be read again. This object allows structural data stored on the molecule to be synchronized and updated with its CIF header data to reflect changes in the molecule. No change in OECIFData will affect the molecule that stores it.
The overall structure of OECIFData reflects CIF formatting. The OECIFData object stores all the data in a
data_ block. Values from each CIF category are stored on the OECIFCategory object. Thus, OECIFData is a
collection of OECIFCategory objects mapped to the category name.
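To make this layout concrete, here is a minimal standard-library sketch of a data_ block as a map from category name to a small table of attributes and rows. The types SketchCategory and SketchData and the function MakeExampleBlock are hypothetical stand-ins for illustration only; they are not the toolkit's OECIFCategory or OECIFData classes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one CIF category: columns are attributes,
// and each row holds one value per attribute.
struct SketchCategory {
  std::vector<std::string> attributes;
  std::vector<std::vector<std::string>> rows;
};

// Hypothetical stand-in for a data_ block: categories keyed by name.
struct SketchData {
  std::string name;  // the text that follows "data_"
  std::map<std::string, SketchCategory> categories;
};

// Builds a tiny block that holds a two-row "_entity." category.
inline SketchData MakeExampleBlock() {
  SketchData data;
  data.name = "EXAMPLE";

  SketchCategory entity;
  entity.attributes.push_back("id");
  entity.attributes.push_back("type");
  entity.rows.push_back({"1", "polymer"});
  entity.rows.push_back({"2", "water"});

  data.categories["_entity."] = entity;
  return data;
}
```

Keying the map by the full category name, including the underscore and period, mirrors how the access functions on this page take a categoryName argument.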
Access to the data using OECIFData utilizes the CIF category and attribute names. These functions allow a user to get, set or add any value in the CIF header.
Note
Category names must begin with an underscore and end with a period. Example: "_entity."
The CIF File Format section explains the CIF data structure in more detail.
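The naming rule in the note can be checked before calling any of the access functions. The helper below is a minimal sketch; LooksLikeCategoryName is a hypothetical name and is not part of the toolkit.

```cpp
#include <cassert>
#include <string>

// Returns true when the name has a leading underscore, a trailing period,
// and at least one character in between, as in "_entity.".
inline bool LooksLikeCategoryName(const std::string &name) {
  return name.size() >= 3 && name.front() == '_' && name.back() == '.';
}
```

With this check, "_entity." passes while "entity." and "_entity" are rejected.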
See also
OEUpdateMMCIFData function
OECIFCategory class
OECIFOptions class
Constructors
OECIFData()
OECIFData(const std::string &cifHeaderString)
OECIFData(const OEMolBase &mol)
Creates an OECIFData object from a header data string, or from the header data stored on a molecule (the data accessed using OEGetMMCIFData).
AddCategory
bool AddCategory(const OECIFCategory &category)
bool AddCategory(const std::string &categoryName, const std::vector<std::string> &attributes, const std::vector<std::string> &newValues)
bool AddCategory(const std::string &categoryName, const std::vector<std::string> &attributes)
bool AddCategory(const std::string &categoryName, const std::string &attribute, const std::vector<std::string> &newValues)
bool AddCategory(const std::string &categoryName, const std::string &attribute, const std::string &newValue)
bool AddCategory(const std::string &categoryName)
Adds a new category to the OECIFData object.
AddRow
bool AddRow(const std::string &categoryName, const std::vector<std::string> &newValues)
bool AddRow(const std::string &categoryName, const std::string init="?")
Adds a row of values to the indicated category. If no values are given, every attribute in the new row is set to init, which defaults to a question mark ("?").
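The two ways of adding a row can be pictured with a small standard-library model. This is a sketch only: SketchCategory, AddPlaceholderRow and AddValueRow are hypothetical names, and rejecting a row whose length differs from the attribute count is an assumption about sensible behavior, not documented toolkit behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a category table.
struct SketchCategory {
  std::vector<std::string> attributes;
  std::vector<std::vector<std::string>> rows;
};

// Appends a row in which every attribute holds `init`. The default "?"
// is the CIF placeholder for an unknown value.
inline bool AddPlaceholderRow(SketchCategory &cat,
                              const std::string &init = "?") {
  if (cat.attributes.empty()) return false;
  cat.rows.push_back(std::vector<std::string>(cat.attributes.size(), init));
  return true;
}

// Appends explicit values. Assumption: one value per attribute is required.
inline bool AddValueRow(SketchCategory &cat,
                        const std::vector<std::string> &values) {
  if (values.size() != cat.attributes.size()) return false;
  cat.rows.push_back(values);
  return true;
}
```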
AddData
bool AddData(const std::string &categoryName, const std::vector<std::string> &attribute, const std::vector<std::string> &newValues)
bool AddData(const std::string &categoryName, const std::vector<std::string> &attribute, const std::string &newValue = "?")
Adds a new attribute and a column of its values to the indicated category. The number of new values must match the number of rows currently in the OECIFCategory it is being added to.
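The row-count requirement follows from the table shape: a new column needs exactly one value for each existing row. A minimal sketch of that check, using the hypothetical names SketchCategory and AddColumn rather than toolkit classes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a category table.
struct SketchCategory {
  std::vector<std::string> attributes;
  std::vector<std::vector<std::string>> rows;
};

// Adds a new attribute column. Fails, leaving the table unchanged, unless
// exactly one value is supplied for every existing row.
inline bool AddColumn(SketchCategory &cat, const std::string &attribute,
                      const std::vector<std::string> &values) {
  if (values.size() != cat.rows.size()) return false;
  cat.attributes.push_back(attribute);
  for (std::size_t i = 0; i < cat.rows.size(); ++i)
    cat.rows[i].push_back(values[i]);
  return true;
}
```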
ChemCompToMol
bool ChemCompToMol(OEMolBase& cifMol, const std::string resName) const
Creates a molecule from CIF _chem_comp data in the OECIFData object. The following CIF categories must be present for proper function:
_chem_comp_atom. defines the atom name and element
_chem_comp_bond. defines the bonding network
The input cifMol molecule is cleared before parsing, so the resulting molecule object only contains atoms and bonds from the _chem_comp header data.
DeleteCategory
bool DeleteCategory(const std::string &categoryName)
Deletes a category based on the category name.
DeleteRow
bool DeleteRow(const std::string &categoryName, unsigned int row)
Deletes the indicated row of values from the named category.
DeleteAttribute
bool DeleteTag(const std::string &categoryName, const std::vector<std::string> &attribute)
Deletes a category’s attribute and associated values.
GenerateHeader
std::string GenerateHeader() const
Generates the CIF header from all the categories saved on this object.
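As a rough illustration of what header generation involves, the sketch below writes one category in two common CIF layouts: key-value lines when there is a single row, and a loop_ table when there are several. WriteCategory is a hypothetical helper. It does not quote values, it assumes each row holds one value per attribute, and the toolkit's actual output layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Writes one category as CIF text (simplified, no value quoting).
inline std::string WriteCategory(
    const std::string &name, const std::vector<std::string> &attributes,
    const std::vector<std::vector<std::string>> &rows) {
  std::string out;
  if (rows.empty()) return out;  // nothing to write

  if (rows.size() == 1) {
    // Single row: one "_category.attribute value" line per attribute.
    for (std::size_t i = 0; i < attributes.size(); ++i)
      out += name + attributes[i] + " " + rows[0][i] + "\n";
    return out;
  }

  // Several rows: list the data names once, then one line per row.
  out += "loop_\n";
  for (const std::string &attribute : attributes)
    out += name + attribute + "\n";
  for (const std::vector<std::string> &row : rows) {
    for (std::size_t i = 0; i < row.size(); ++i)
      out += (i == 0 ? "" : " ") + row[i];
    out += "\n";
  }
  return out;
}
```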
GeneratePolySeqScheme
bool GeneratePolySeqScheme(OEMolBase &mol, const unsigned int maxMismatch=10)
Generates the _pdbx_poly_seq_scheme header section using header and molecule data. The following categories must be present and properly populated for this function:
_struct_asym.
_entity.
_entity_poly_seq.
The molecule object must contain at least one residue. Any existing _pdbx_poly_seq_scheme data will be cleared before generation.
GetAttributes
std::vector<std::string> GetAttributes(const std::string &categoryName) const
Gets a list of all the attributes in the indicated category.
GetAttributeValue
std::string GetAttributeValue(const std::string &categoryName, const std::string &attribute, bool raw=false) const
std::string GetAttributeValue(const std::string &categoryName, const std::string &attribute, unsigned int row, bool raw=false) const
Gets the saved value on the indicated category’s attribute. If multiple values exist for a given attribute, then a row number must be specified.
GetAttributeValues
std::vector<std::string> GetAttributeValues(const std::string &categoryName, const std::string &attribute, bool raw=false) const
Gets all the saved values on the indicated category’s attribute.
GetAttributeIndex
int GetAttributeIndex(const std::string &categoryName, const std::string &attribute) const
Returns the attribute’s base-0 index. For any given row of data associated with the category, this index locates the attribute’s value in that row. For example, if GetAttributeIndex("_entity.", "type") returns 1, then for every row returned by GetRows("_entity."), the _entity.type value is element [1] of that row.
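The relationship between an attribute index and row data can be shown with plain vectors. In this sketch, AttributeIndex is a hypothetical helper, and returning -1 for a missing attribute is an assumption rather than documented toolkit behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the base-0 position of `attribute` in the attribute list,
// or -1 when it is absent (assumed convention for this sketch).
inline int AttributeIndex(const std::vector<std::string> &attributes,
                          const std::string &attribute) {
  const std::vector<std::string>::const_iterator it =
      std::find(attributes.begin(), attributes.end(), attribute);
  if (it == attributes.end()) return -1;
  return static_cast<int>(it - attributes.begin());
}
```

Given attributes {"id", "type", "src_method"}, the index of "type" is 1, so the type of any row is row[1].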
GetCategory
OECIFCategory &GetCategory(const std::string &categoryName)
Gets the OECIFCategory object by its category name.
GetCategoryNames
std::vector<std::string> GetCategoryNames() const
Returns all the CIF category names present in this object.
GetDataItems
std::vector<std::string> GetDataItems(const std::string &categoryName) const
Returns a list of all the CIF data items in the indicated category.
GetName
const std::string &GetName() const
Returns the name of the header data. Value is defined using the molecule’s title.
GetNumCategories
unsigned int GetNumCategories() const
Number of CIF categories saved in this object.
GetNumCategoryRows
unsigned int GetNumCategoryRows(const std::string &categoryName) const
Number of CIF rows in a category.
GetNumCategoryAttributes
unsigned int GetNumCategoryAttributes(const std::string &categoryName) const
Number of CIF attributes in a category.
GetNumericAttributeValue
GetFloatAttributeValue(String categoryName, String attribute, Float val)
GetIntAttributeValue(String categoryName, String attribute, Integer val)
GetFloatAttributeValue(String categoryName, String attribute, Integer row, Float val)
GetIntAttributeValue(String categoryName, String attribute, Integer row, Integer val)
Gets the saved value on the indicated category’s attribute and converts it to a numeric value. If multiple values exist for a given attribute, then a row number must be specified. If a value cannot be easily converted to a numeric value, it will default to 0.
GetNumericAttributeValues
GetFloatAttributeValues(String categoryName, String attribute, Vector<Float> vals)
GetIntAttributeValues(String categoryName, String attribute, Vector<Integer> vals)
Gets all the saved values on the indicated category’s attribute and converts each to a numeric value. If a value cannot be easily converted to a numeric value, it will default to 0.
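The default-to-zero behavior matters because CIF placeholders such as "?" and "." are not numbers. The sketch below shows one way such a conversion could work with the C standard library. ToFloatOrZero and ToIntOrZero are hypothetical helpers, and requiring the whole string to parse is an assumption; how the toolkit decides that a value is convertible is not stated here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Converts text to double, returning 0.0 unless the whole string parses.
inline double ToFloatOrZero(const std::string &text) {
  const char *begin = text.c_str();
  char *end = nullptr;
  const double value = std::strtod(begin, &end);
  if (end == begin || *end != '\0') return 0.0;
  return value;
}

// Converts text to int, returning 0 unless the whole string parses as a
// base-10 integer. Out-of-range input is not handled in this sketch.
inline int ToIntOrZero(const std::string &text) {
  const char *begin = text.c_str();
  char *end = nullptr;
  const long value = std::strtol(begin, &end, 10);
  if (end == begin || *end != '\0') return 0;
  return static_cast<int>(value);
}
```

A caller that needs to tell a genuine 0 from a failed conversion can read the string value first and compare it with "?" or ".".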
GetRow
std::vector<std::string> GetRow(const std::string &categoryName, unsigned int row, bool raw=false) const
Returns a specific row from the indicated category. Row indices are base-0. When raw is true, values are formatted as they will appear in the header. This is most noticeable with values that contain space characters, as CIF formatting requires single or double quotes around such a string value.
GetRows
std::vector<std::vector<std::string>> GetRows(const std::string &categoryName, int beginRow, int endRow, bool raw=false) const
std::vector<std::vector<std::string>> GetRows(const std::string &categoryName, bool raw=false) const
Returns rows from the indicated category. Row indices are base-0. When raw is true, values are formatted as they will appear in the header. This is most noticeable with values that contain space characters, as CIF formatting requires single or double quotes around such a string value.
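The difference between a plain value and its raw form can be illustrated with a simplified quoting rule. ToRawValue is a hypothetical helper that handles only the whitespace case described above. Full CIF quoting has further cases, such as multi-line text fields and values containing both quote characters, which this sketch ignores.

```cpp
#include <cassert>
#include <string>

// Returns the value as it might appear in a header: unchanged when it has
// no spaces or tabs, otherwise wrapped in single quotes, or in double
// quotes when the value itself contains a single quote.
inline std::string ToRawValue(const std::string &value) {
  if (value.find_first_of(" \t") == std::string::npos) return value;
  const char quote = value.find('\'') == std::string::npos ? '\'' : '"';
  return quote + value + quote;
}
```

Under this rule, polymer is left as is, while DNA polymerase becomes 'DNA polymerase'.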
GetRowIndices
std::vector<unsigned int> GetRowIndices(const std::string &categoryName, const std::string &attribute, const std::string &matchValue) const
Returns the base-0 indices of the rows whose value for the indicated attribute string-matches the input match value.
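Row matching amounts to scanning one column for an exact string. The sketch below models that with plain vectors; MatchingRows is a hypothetical helper that takes a column position instead of an attribute name, and exact, case-sensitive comparison is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the base-0 indices of rows whose value in `column` equals
// `matchValue`. Rows too short to have that column are skipped.
inline std::vector<unsigned int> MatchingRows(
    const std::vector<std::vector<std::string>> &rows, std::size_t column,
    const std::string &matchValue) {
  std::vector<unsigned int> hits;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (column < rows[i].size() && rows[i][column] == matchValue)
      hits.push_back(static_cast<unsigned int>(i));
  }
  return hits;
}
```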
HasAttribute
bool HasAttribute(const std::string &categoryName, const std::string &attribute) const
Returns true if the attribute is present in the indicated category.
HasCategoryName
bool HasCategoryName(const std::string &categoryName) const
Returns true if the category name is present in this object.
SetCategory
bool SetCategory(const OECIFCategory &category)
bool SetCategory(const std::string &categoryName, const std::string &attribute, const std::string &val)
bool SetCategory(const std::string &categoryName, const std::vector<std::string> &attributes, const std::vector<std::string> &vals)
Sets OECIFCategory information in this object.
SetData
bool SetData(const std::string &categoryName, const std::string &attribute, const std::string &nval)
bool SetData(const std::string &categoryName, const std::string &attribute, unsigned int row, const std::string &nval, const std::string init="?")
bool SetData(const std::string &categoryName, const std::string &attribute, const std::vector<std::string> &nvals)
bool SetData(const std::string &categoryName, unsigned int row, const std::vector<std::string> &nvals, const std::string init="?")
Sets attribute and row data on the indicated category.
SetMMCIFChemCompData
bool SetMMCIFChemCompData(const OEMolBase &mol, const OEMolBase &chemCompMol, const bool strict=false)
Updates the _chem_comp sections in the mmCIF header metadata with information from the input chemCompMol molecule. First, the single-residue chemCompMol is matched by name against a residue in mol. The chemCompMol atom and bond data is then used to replace the _chem_comp header data for the similarly named residue in mol. If strict is used, an additional check for atom name matching is made between the matched residue in the molecule and the chemCompMol.
The following categories need to be present and properly populated in the input molecule’s header for this function. If any attributes are missing, running OEUpdateMMCIFData with ChemComp perception should populate them:
_chem_comp.
_chem_comp_atom.
_chem_comp_bond.
If the molecule already exists in the header, all chemical component references to that molecule are replaced with the new molecule. By default, residue atom name matching is enforced. If strict is turned off, an OESubSearch is used to identify structural similarity with the new molecule. If a match is found, the atom names of the matching structure will be used in the _chem_comp atom naming scheme.
Update
bool Update(OEMolBase &mol, const OECIFOptions &opts)
Coordinates the molecule’s structural and residue data with related information found in the header. Where there are disagreements, the molecule’s data overrides the header data. Because updating can affect many parts of the header, the OECIFOptions class controls which parts of the header are perceived during the update.
This function can be used to convert PDB header data into CIF header data. Create an OECIFData object using the PDB molecule and run Update(). Only perceivable fields in the molecule’s structure data will be populated.
ValidateMMCIFHeader
bool ValidateMMCIFHeader(bool allowErrors=false, bool strict=false)
Validates self-consistency within an mmCIF header. Many items reference each other, and this function checks whether some of these references are consistent. Item entries that are checked include:
_entity.
_entity_poly.
_entity_poly_seq.
_chem_comp.
_struct_asym.
The strict flag will explore more parts of the header and enforce a wider range of requirements:
Any ‘entry_id’ value reference must match an already defined ‘_entry.id’
Any ‘entity_id’ value reference must match an already defined ‘_entity.id’
An expression system has been identified (_entity_src_gen., _entity_src_nat., _entity_src_syn.)
_chem_comp_atom.
_chem_comp_bond.
_struct_ref.
_struct_ref_seq.
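The entry_id and entity_id rules in the list above amount to looking up each referenced identifier in the set of defined identifiers. The sketch below shows that check in isolation. UndefinedReferences is a hypothetical helper, and it is not a description of how the toolkit implements validation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns every reference that does not appear among the defined ids,
// in the order encountered. An empty result means all references resolve.
inline std::vector<std::string> UndefinedReferences(
    const std::vector<std::string> &definedIds,
    const std::vector<std::string> &references) {
  const std::set<std::string> known(definedIds.begin(), definedIds.end());
  std::vector<std::string> missing;
  for (const std::string &ref : references) {
    if (known.count(ref) == 0) missing.push_back(ref);
  }
  return missing;
}
```

For example, with _entity.id values {"1", "2"} and entity_id references {"1", "3"}, the result is {"3"}, which a strict validation would treat as an inconsistency.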
ValidateMMCIFMol
bool ValidateMMCIFMol(OEMolBase &mol, bool allowErrors=false, bool strict=false)
Validates consistency between the header and the input molecule. Item entries that are checked against the input molecule include:
_entity.
_chem_comp.
_struct_asym.
The strict flag will explore more parts of the header and enforce a wider range of requirements against the input molecule:
strict _entity. header validation
_pdbx_poly_seq_scheme.