OECIFData

struct OECIFData

The OECIFData class holds CIF header data and provides an interface for interacting with it. An OECIFData object reflects the contents of the CIF data_ section.

The goal of the OECIFData class is to let a user interact freely with CIF data and be confident that the data will be written in a proper format that can easily be read again. Structural data stored on the molecule can be synchronized with its CIF header data so that the header reflects changes in the molecule. No change in OECIFData will affect the molecule that stores it.

The overall structure of OECIFData reflects CIF formatting. The OECIFData object stores all the data in a data_ block. Values from each CIF category are stored on the OECIFCategory object. Thus, OECIFData is a collection of OECIFCategory objects mapped to the category name.

Data in OECIFData is accessed by CIF category and attribute names. The functions described below allow a user to get, set, or add any value in the CIF header.

Note

Category names must begin with an underscore and end with a period. Example: "_entity."
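The naming rule in the note can be checked before calling any accessor. The following is a minimal sketch, not part of the OpenEye API: IsWellFormedCategoryName and MakeDataItemName are hypothetical helpers introduced here for illustration. The second helper shows why the trailing period matters: a full CIF data item name is the category name followed directly by the attribute.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not an OpenEye function: tests the naming rule from
// the note above (leading underscore, trailing period, non-empty name).
bool IsWellFormedCategoryName(const std::string &name)
{
  return name.size() >= 3 && name.front() == '_' && name.back() == '.';
}

// Hypothetical helper: joins a category name and an attribute into the full
// CIF data item name, for example "_entity." + "type" gives "_entity.type".
std::string MakeDataItemName(const std::string &categoryName,
                             const std::string &attribute)
{
  assert(IsWellFormedCategoryName(categoryName));  // precondition of the sketch
  return categoryName + attribute;
}
```

With these helpers, "_entity." passes the check, while "entity." and "_entity" do not.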

The CIF File Format section explains the CIF data structure in more detail.

Constructors

OECIFData()
OECIFData(const std::string &cifHeaderString)
OECIFData(const OEMolBase &mol)

Creates an OECIFData object from a header data string, or from the header data stored on a molecule, which can be accessed using OEGetMMCIFData.

AddCategory

bool AddCategory(const OECIFCategory &category)
bool AddCategory(const std::string &categoryName, const std::vector<std::string> &attributes, const std::vector<std::string> &newValues)
bool AddCategory(const std::string &categoryName, const std::vector<std::string> &attributes)
bool AddCategory(const std::string &categoryName, const std::string &attribute, const std::vector<std::string> &newValues)
bool AddCategory(const std::string &categoryName, const std::string &attribute, const std::string &newValue)
bool AddCategory(const std::string &categoryName)

Adds a new category to the OECIFData object.

AddRow

bool AddRow(const std::string &categoryName, const std::vector<std::string> &newValues)
bool AddRow(const std::string &categoryName, const std::string init="?")

Adds a row of values to the indicated category. If no values are given, every attribute in the new row is set to the init value, which defaults to a question mark (“?”).
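The placeholder behavior can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: ToyCategory, AddRowValues, and AddPlaceholderRow are hypothetical names, not OpenEye types or functions, and rejecting a row of the wrong width is an assumption of the sketch rather than documented behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one CIF category; not the OpenEye OECIFCategory.
struct ToyCategory
{
  std::vector<std::string> attributes;         // column names
  std::vector<std::vector<std::string>> rows;  // one inner vector per row
};

// Appends a row of explicit values. Rejecting a row whose width differs from
// the attribute count is an assumption of this sketch.
bool AddRowValues(ToyCategory &cat, const std::vector<std::string> &newValues)
{
  if (newValues.size() != cat.attributes.size())
    return false;
  cat.rows.push_back(newValues);
  return true;
}

// Appends a row in which every attribute holds the same placeholder value.
bool AddPlaceholderRow(ToyCategory &cat, const std::string &init = "?")
{
  const std::vector<std::string> filler(cat.attributes.size(), init);
  const bool added = AddRowValues(cat, filler);
  assert(added);  // the filler row always has the right width
  return added;
}
```

In this model, a category with attributes id and type gains the row ("?", "?") when AddPlaceholderRow is called without an init value.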

AddData

bool AddData(const std::string &categoryName, const std::vector<std::string> &attribute, const std::vector<std::string> &newValues)
bool AddData(const std::string &categoryName, const std::vector<std::string> &attribute, const std::string &newValue = "?")

Adds a new attribute and a column of its values to the indicated category. The number of new values must match the number of rows currently in the OECIFCategory to which the attribute is added.
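The row-count requirement can be sketched with a stand-alone model. ToyCategory and AddColumn are hypothetical names used only for illustration; the real class may report a mismatch differently than the plain false returned here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one CIF category; not the OpenEye OECIFCategory.
struct ToyCategory
{
  std::vector<std::string> attributes;         // column names
  std::vector<std::vector<std::string>> rows;  // one inner vector per row
};

// Adds a new attribute with one value per existing row.
bool AddColumn(ToyCategory &cat, const std::string &attribute,
               const std::vector<std::string> &newValues)
{
  if (newValues.size() != cat.rows.size())
    return false;  // column length must equal the current row count
  cat.attributes.push_back(attribute);
  for (std::size_t i = 0; i < cat.rows.size(); ++i)
  {
    // Each row should be one value short of the widened attribute list.
    assert(cat.rows[i].size() + 1 == cat.attributes.size());
    cat.rows[i].push_back(newValues[i]);
  }
  return true;
}
```

A category with two rows accepts a two-value column and leaves the category unchanged when given a one-value column.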

ChemCompToMol

bool ChemCompToMol(OEMolBase& cifMol, const std::string resName) const

Creates a molecule from CIF _chem_comp data in the OECIFData object. The following CIF categories must be present for proper function:

  • _chem_comp_atom. defines the atom name and element

  • _chem_comp_bond. defines the bonding network

The input cifMol molecule is cleared before parsing, so the resulting molecule object only contains atoms and bonds from the _chem_comp header data.

DeleteCategory

bool DeleteCategory(const std::string &categoryName)

Deletes a category based on the category name.

DeleteRow

bool DeleteRow(const std::string &categoryName, unsigned int row)

Deletes the indicated row of values from the named category.

DeleteAttribute

bool DeleteAttribute(const std::string &categoryName, const std::vector<std::string> &attribute)

Deletes a category’s attribute and associated values.

GenerateHeader

std::string GenerateHeader() const

Generates the CIF header from all the categories saved on this object.
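For orientation, the sketch below writes one category as CIF text using the two common layouts: name and value pairs for a single row, and a loop_ block for several rows. It is an illustration of the general CIF layout only. WriteCategory is a hypothetical function, the trailing "#" separator line is a convention assumed here, values are assumed to need no quoting, and none of this claims to reproduce the exact output of GenerateHeader.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer for one category; assumes values need no quoting.
std::string WriteCategory(const std::string &categoryName,
                          const std::vector<std::string> &attributes,
                          const std::vector<std::vector<std::string>> &rows)
{
  if (rows.empty())
    return std::string();  // nothing to write for an empty category

  std::ostringstream out;
  if (rows.size() == 1)
  {
    // Single row: one "item-name value" pair per line.
    assert(rows[0].size() == attributes.size());
    for (std::size_t i = 0; i < attributes.size(); ++i)
      out << categoryName << attributes[i] << ' ' << rows[0][i] << '\n';
  }
  else
  {
    // Several rows: a loop_ block listing the item names, then one line per row.
    out << "loop_\n";
    for (const std::string &attribute : attributes)
      out << categoryName << attribute << '\n';
    for (const std::vector<std::string> &row : rows)
    {
      assert(row.size() == attributes.size());
      for (std::size_t i = 0; i < row.size(); ++i)
        out << (i ? " " : "") << row[i];
      out << '\n';
    }
  }
  out << "#\n";  // separator line, assumed convention
  return out.str();
}
```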

GeneratePolySeqScheme

bool GeneratePolySeqScheme(OEMolBase &mol, const unsigned int maxMismatch=10)

Generates the _pdbx_poly_seq_scheme header section using header and molecule data. The following categories must be present and properly populated for this function:

  • _struct_asym.

  • _entity.

  • _entity_poly_seq.

The molecule object must contain at least one residue. Any existing _pdbx_poly_seq_scheme data will be cleared before generation.

GetAttributes

std::vector<std::string>  GetAttributes(const std::string &categoryName) const

Gets a list of all the attributes in the indicated category.

GetAttributeValue

std::string GetAttributeValue(const std::string &categoryName, const std::string &attribute, bool raw=false) const
std::string GetAttributeValue(const std::string &categoryName, const std::string &attribute, unsigned int row, bool raw=false) const

Gets the value saved on the indicated category’s attribute. If the attribute holds multiple values, a row number must be specified.

GetAttributeValues

std::vector<std::string> GetAttributeValues(const std::string &categoryName, const std::string &attribute, bool raw=false) const

Gets all the saved values on the indicated category’s attribute.

GetAttributeIndex

int GetAttributeIndex(const std::string &categoryName, const std::string &attribute) const

Returns the attribute’s base-0 index. For any given row of data associated with the category, this index gives the position of the attribute’s value in that row. For example, if GetAttributeIndex("_entity.", "type") returns 1, then for every row returned by GetRows("_entity."), the _entity.type value is element [1] of that row.
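The relationship between an attribute index and a row can be sketched without the toolkit. FindAttributeIndex and ValueInRow are hypothetical helpers, and returning -1 for a missing attribute is an assumption of this sketch, not documented behavior of GetAttributeIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the base-0 position of attribute in the attribute list, or -1 when
// the attribute is absent (the -1 convention is an assumption of this sketch).
int FindAttributeIndex(const std::vector<std::string> &attributes,
                       const std::string &attribute)
{
  for (std::size_t i = 0; i < attributes.size(); ++i)
    if (attributes[i] == attribute)
      return static_cast<int>(i);
  return -1;
}

// Reads the value of attribute from one row by way of its index.
std::string ValueInRow(const std::vector<std::string> &attributes,
                       const std::vector<std::string> &row,
                       const std::string &attribute)
{
  const int index = FindAttributeIndex(attributes, attribute);
  assert(index >= 0 && static_cast<std::size_t>(index) < row.size());
  return row[static_cast<std::size_t>(index)];
}
```

With attributes (id, type, src_method) and the row ("1", "polymer", "man"), the index of type is 1 and the value read through that index is "polymer".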

GetCategory

OECIFCategory &GetCategory(const std::string &categoryName)

Gets the OECIFCategory object by its category name.

GetCategoryNames

std::vector<std::string> GetCategoryNames() const

Returns all the CIF category names present in this object.

GetDataItems

std::vector<std::string>  GetDataItems(const std::string &categoryName) const

Returns a list of all the CIF data items in the indicated category.

GetName

const std::string &GetName() const

Returns the name of the header data. Value is defined using the molecule’s title.

GetNumCategories

unsigned int GetNumCategories() const

Returns the number of CIF categories saved in this object.

GetNumCategoryRows

unsigned int GetNumCategoryRows(const std::string &categoryName) const

Returns the number of CIF rows in the indicated category.

GetNumCategoryAttributes

unsigned int GetNumCategoryAttributes(const std::string &categoryName) const

Returns the number of CIF attributes in the indicated category.

GetNumericAttributeValue

GetFloatAttributeValue(String categoryName, String attribute, Float val)
GetIntAttributeValue(String categoryName, String attribute, Integer val)
GetFloatAttributeValue(String categoryName, String attribute, Integer row, Float val)
GetIntAttributeValue(String categoryName, String attribute, Integer row, Integer val)

Gets the value saved on the indicated category’s attribute and converts it to a numeric value. If the attribute holds multiple values, a row number must be specified. If a value cannot be easily converted to a numeric value, it defaults to 0.

GetNumericAttributeValues

GetFloatAttributeValues(String categoryName, String attribute, Vector<Float> vals)
GetIntAttributeValues(String categoryName, String attribute, Vector<Integer> vals)

Gets all the values saved on the indicated category’s attribute and converts them to numeric values. If a value cannot be easily converted to a numeric value, it defaults to 0.
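The default-to-0 rule of the numeric accessors can be sketched as follows. ToIntOrZero, ToFloatOrZero, and ToFloatsOrZero are hypothetical helpers, not OpenEye functions. Treating a partially numeric string such as "12abc" as unconvertible is an assumption of this sketch; the toolkit may be more or less lenient.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Converts text to int, returning 0 when the whole string is not a number.
int ToIntOrZero(const std::string &text)
{
  try
  {
    std::size_t used = 0;
    const int value = std::stoi(text, &used);
    return used == text.size() ? value : 0;
  }
  catch (const std::logic_error &)  // invalid_argument or out_of_range
  {
    return 0;
  }
}

// Converts text to double, returning 0 when the whole string is not a number.
double ToFloatOrZero(const std::string &text)
{
  try
  {
    std::size_t used = 0;
    const double value = std::stod(text, &used);
    return used == text.size() ? value : 0.0;
  }
  catch (const std::logic_error &)
  {
    return 0.0;
  }
}

// Converts a whole column of values, one result per input value.
std::vector<double> ToFloatsOrZero(const std::vector<std::string> &texts)
{
  std::vector<double> values;
  values.reserve(texts.size());
  for (const std::string &text : texts)
    values.push_back(ToFloatOrZero(text));
  assert(values.size() == texts.size());
  return values;
}
```

In this sketch the CIF placeholders "?" and "." both convert to 0, so a result of 0 cannot by itself distinguish a stored zero from a missing value.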

GetRow

std::vector<std::string> GetRow(const std::string &categoryName, unsigned int row, bool raw=false) const

Returns a specific row from the indicated category. Row indices are base-0. Raw values are formatted as they will appear in the header. The difference is most noticeable for values that contain space characters, because CIF formatting requires single or double quotes around such string values.
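The difference between a plain and a raw value can be sketched with a simplified quoting rule. ToRawCifValue is a hypothetical helper, not an OpenEye function. It only covers single-line values: quote when the value contains whitespace, and prefer single quotes unless the value itself contains one. Multi-line text fields, empty values, and values containing both quote characters are outside this sketch.

```cpp
#include <cassert>
#include <string>

// Formats a single-line value roughly as it would appear in a CIF file.
std::string ToRawCifValue(const std::string &value)
{
  // Multi-line values need semicolon-delimited text fields; not handled here.
  assert(value.find('\n') == std::string::npos);

  if (value.find_first_of(" \t") == std::string::npos)
    return value;  // no whitespace, so no quoting is needed

  // Use single quotes unless the value contains one, then use double quotes.
  const char quote = (value.find('\'') == std::string::npos) ? '\'' : '"';
  return quote + value + quote;
}
```

Under this rule, polymer stays as it is, while DNA polymerase becomes 'DNA polymerase'.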

GetRows

std::vector<std::vector<std::string>> GetRows(const std::string &categoryName, int beginRow, int endRow, bool raw=false) const
std::vector<std::vector<std::string>> GetRows(const std::string &categoryName, bool raw=false) const

Returns rows from the indicated category. Row indices are base-0. Raw values are formatted as they will appear in the header. The difference is most noticeable for values that contain space characters, because CIF formatting requires single or double quotes around such string values.

GetRowIndices

std::vector<unsigned int> GetRowIndices(const std::string &categoryName, const std::string &attribute, const std::string &matchValue) const

Returns the base-0 indices of the rows whose value for the indicated attribute string-matches the input matchValue.
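Row lookup by value can be sketched over a plain table. FindRowIndices is a hypothetical helper, not the OpenEye function. The exact, case-sensitive comparison and the empty result for an unknown attribute are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the base-0 indices of the rows whose value in the attribute column
// equals matchValue exactly.
std::vector<unsigned int> FindRowIndices(
    const std::vector<std::string> &attributes,
    const std::vector<std::vector<std::string>> &rows,
    const std::string &attribute, const std::string &matchValue)
{
  std::vector<unsigned int> indices;
  const auto it = std::find(attributes.begin(), attributes.end(), attribute);
  if (it == attributes.end())
    return indices;  // unknown attribute: no rows match

  const std::size_t column = static_cast<std::size_t>(it - attributes.begin());
  for (unsigned int r = 0; r < rows.size(); ++r)
  {
    assert(column < rows[r].size());  // rows are expected to be full width
    if (rows[r][column] == matchValue)
      indices.push_back(r);
  }
  return indices;
}
```

The returned indices can then be passed to row-based accessors, for example to read or delete every row of a given entity type.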

HasAttribute

bool HasAttribute(const std::string &categoryName, const std::string &attribute) const

Returns whether the attribute is present in the indicated category.

HasCategoryName

bool HasCategoryName(const std::string &categoryName) const

Returns whether the category name is present in this object.

SetCategory

bool SetCategory(const OECIFCategory &category)
bool SetCategory(const std::string &categoryName, const std::string &attribute, const std::string &val)
bool SetCategory(const std::string &categoryName, const std::vector<std::string> &attributes, const std::vector<std::string> &vals)

Sets OECIFCategory information in this object.

SetData

bool SetData(const std::string &categoryName, const std::string &attribute, const std::string &nval)
bool SetData(const std::string &categoryName, const std::string &attribute, unsigned int row, const std::string &nval, const std::string init="?")
bool SetData(const std::string &categoryName, const std::string &attribute, const std::vector<std::string> &nvals)
bool SetData(const std::string &categoryName, unsigned int row, const std::vector<std::string> &nvals, const std::string init="?")

Sets attribute and row data on the indicated category.

SetMMCIFChemCompData

bool SetMMCIFChemCompData(const OEMolBase &mol, const OEMolBase &chemCompMol, const bool strict=false)

Updates the _chem_comp sections of the mmCIF header metadata with information from the input chemCompMol molecule. First, the single-residue chemCompMol is matched by name against a residue in the molecule. The chemCompMol atom and bond data are then used to replace the _chem_comp header data for the similarly named residue in mol. If strict is true, an additional atom name matching check is made between the matched residue in the molecule and the chemCompMol.

The following categories need to be present and properly populated in the input molecule’s header for this function. If any attributes are missing, running OEUpdateMMCIFData with ChemComp perception should populate them:

  • _chem_comp.

  • _chem_comp_atom.

  • _chem_comp_bond.

If the molecule already exists in the header data, all of its chemical component references are replaced with the new molecule. When strict is true, residue atom name matching is enforced. When strict is false, which is the default in the signature above, an OESubSearch is used to identify structural similarity with the new molecule; if a match is found, the atom names of the matching structure are used in the _chem_comp atom naming scheme.

Update

bool Update(OEMolBase &mol, const OECIFOptions &opts)

Coordinates the molecule’s structural and residue data with related information found in the header. Where they disagree, the molecule’s data overrides the header data. Because an update can affect many parts of the header, the OECIFOptions class controls which parts of the header are perceived during the update.

This function can be used to convert PDB header data into CIF header data. Create an OECIFData object using the PDB molecule and run Update(). Only perceivable fields in the molecule’s structure data will be populated.

ValidateMMCIFHeader

bool ValidateMMCIFHeader(bool allowErrors=false, bool strict=false)

Validates self-consistency within an mmCIF header. Many items reference each other, and this function checks whether some of these references are consistent. Item entries that are checked include:

  • _entity.

  • _entity_poly.

  • _entity_poly_seq.

  • _chem_comp.

  • _struct_asym.

The strict flag will explore more parts of the header and enforce a wider range of requirements:

  • Any ‘entry_id’ value reference must match an already defined ‘_entry.id’

  • Any ‘entity_id’ value reference must match an already defined ‘_entity.id’

  • An expression system has been identified (_entity_src_gen., _entity_src_nat., _entity_src_syn.)

  • _chem_comp_atom.

  • _chem_comp_bond.

  • _struct_ref.

  • _struct_ref_seq.

ValidateMMCIFMol

bool ValidateMMCIFMol(OEMolBase &mol, bool allowErrors=false, bool strict=false)

Validates consistency between the header and the input molecule. Item entries that are checked against the input molecule include:

  • _entity.

  • _chem_comp.

  • _struct_asym.

The strict flag will explore more parts of the header and enforce a wider range of requirements against the input molecule:

  • strict _entity. header validation

  • _pdbx_poly_seq_scheme.