OEMolDatabase

class OEMolDatabase : public OESystem::OEBase

This class provides an abstraction for fast read-only random access to any molecular file format OEChem supports reading. The Molecular Database Handling chapter provides a description of the fundamentals of using this class.

When the database is opened on or saved to an .oeb file, the generic data on the object itself will be preserved through the OEHeader record in the file. The following methods can be used to access the generic data since they are inherited from OEBase:

operator=

GetData

IsDataType

operator+=

GetDataIter

SetBaseData

AddBaseData

GetDataType

SetBoolData

AddData

GetDoubleData

SetData

Clear

GetFloatData

SetDoubleData

CreateCopy

GetIntData

SetFloatData

DeleteData

GetStringData

SetIntData

GetBoolData

HasData

SetStringData

Constructors

OEMolDatabase()
OEMolDatabase(OEChem::oemolistream &ifs)
OEMolDatabase(const std::string &filename)
OEMolDatabase(OEPlatform::oeistream &istr, unsigned int format)
OEMolDatabase(const std::string &filename, unsigned int format)
OEMolDatabase(const std::string &filename, unsigned int format,
              unsigned int flavor)
OEMolDatabase(OEPlatform::oeistream &istr, unsigned int format,
              unsigned int flavor)

The default constructor will initialize the object to an empty state where most of the accessor methods will fail. It is expected that the user will later call OEMolDatabase::Open on the object.

The other constructors are convenience methods for opening the database on the given filename or stream with the specified format and flavor.

The database requires file storage to store the actual molecule data. Therefore, if given an oeistream, the stream’s data will be copied to a temporary file. The temporary file will be automatically deleted by the destructor of this class.

An oemolistream may not need a temporary file if it was opened on an actual file.

Clear

void Clear()

Removes all generic data by calling through to OEBase::Clear and resets the database to an uninitialized state, the same state the default constructor produces.

CreateCopy

OESystem::OEBase *CreateCopy() const

Creates a copy of the database pointing to the same backing file store as the original. The copy is invalid if the original is destroyed.

Warning

It is not recommended to use this method. It is only implemented to adhere to the OEBase API.

Delete

bool Delete(unsigned int idx)
bool Delete(const std::vector<unsigned int> &indices)
bool Delete(const OESystem::OEUnaryPredicate<OEMolBase> &pred)
bool Delete(const OESystem::OEUnaryPredicate<OEMCMolBase> &pred)

Marks the given molecules as “deleted” in the database in constant time, O(1). Deleted molecules will not be written during a Save operation, and OEMolDatabase::GetMolecule returns false when asked for a deleted molecule.
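The marking behavior described above can be pictured with a small toy model. This is only a sketch written against the standard library: ToyRecordStore and all of its members are hypothetical names, the records are plain strings rather than molecules, and nothing here is taken from the OEChem implementation. It shows why flagging a record is a constant-time operation and how fetching and saving can honor the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a molecule database; not an OEChem type.
class ToyRecordStore {
public:
  explicit ToyRecordStore(std::vector<std::string> records)
      : records_(std::move(records)), deleted_(records_.size(), false) {}

  // Flip one flag: constant time, no record data is touched.
  // Returning false for an out-of-range index is an assumption of this sketch.
  bool Delete(unsigned int idx) {
    if (idx >= deleted_.size()) return false;
    deleted_[idx] = true;
    return true;
  }

  bool IsDeleted(unsigned int idx) const {
    return idx < deleted_.size() && deleted_[idx];
  }

  // Fetching a deleted (or out-of-range) record fails and leaves `out` untouched.
  bool Get(std::string &out, unsigned int idx) const {
    if (idx >= records_.size() || deleted_[idx]) return false;
    out = records_[idx];
    return true;
  }

  // Saving copies only the records that are not flagged.
  std::vector<std::string> Save() const {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < records_.size(); ++i) {
      if (!deleted_[i]) out.push_back(records_[i]);
    }
    return out;
  }

private:
  std::vector<std::string> records_;  // declared first so its size is known below
  std::vector<bool> deleted_;
};
```

In this model the record data is never moved or rewritten on deletion; only the flag changes, which is what keeps the operation cheap.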

GetDataType

const void *GetDataType() const

Return an opaque value representing the data type of this class.

GetFlavor

unsigned int GetFlavor() const

Returns the flavor used to interpret the molecules in this database, pulled from the OEIFlavor namespace associated with the format returned from the OEMolDatabase::GetFormat method. Returns 0 if the database is uninitialized.

GetFormat

unsigned int GetFormat() const

Returns the format used to interpret the molecules in the file this database was opened on. The value returned is a member of the OEFormat namespace.

GetIdx

unsigned int GetIdx(unsigned int row)

Returns the index for the nth row in the database. row should be less than OEMolDatabase::NumMols and the value returned will be less than OEMolDatabase::GetMaxMolIdx. By default, row numbers are the same as their index. Row numbers can be changed by deleting or re-ordering the database. If row numbers were altered before this call, this call is O(GetMaxMolIdx()). After the first call to GetIdx, subsequent calls are O(1), until another alteration is made to the database.

This is the reverse operation from OEMolDatabase::GetRow.
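The relationship between stable indices and compact row numbers can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not OEChem code: ToyRowMap, kNoRow, and every member are hypothetical, and the sentinel returned for a deleted index is a choice made only for this example. It rebuilds both lookup tables lazily, so the first lookup after an alteration walks every index and later lookups are simple array reads.

```cpp
#include <cassert>
#include <vector>

// Sentinel for "this index has no row" (an assumption of this sketch).
const unsigned int kNoRow = static_cast<unsigned int>(-1);

// Hypothetical toy model of the index/row bookkeeping; not an OEChem type.
class ToyRowMap {
public:
  explicit ToyRowMap(unsigned int numRecords)
      : deleted_(numRecords, false), dirty_(true) {}

  // Any alteration invalidates the cached tables.
  void Delete(unsigned int idx) {
    deleted_.at(idx) = true;
    dirty_ = true;
  }

  unsigned int GetMaxIdx() const {
    return static_cast<unsigned int>(deleted_.size());
  }

  unsigned int NumRows() {
    Rebuild();
    return static_cast<unsigned int>(rowToIdx_.size());
  }

  // Row -> index. O(GetMaxIdx()) right after an alteration, O(1) afterwards.
  unsigned int GetIdx(unsigned int row) {
    Rebuild();
    return rowToIdx_.at(row);
  }

  // Index -> row, the inverse lookup. Deleted indices map to kNoRow.
  unsigned int GetRow(unsigned int idx) {
    Rebuild();
    return idxToRow_.at(idx);
  }

private:
  void Rebuild() {
    if (!dirty_) return;  // tables are still valid: nothing to do
    rowToIdx_.clear();
    idxToRow_.assign(deleted_.size(), kNoRow);
    for (unsigned int idx = 0; idx < deleted_.size(); ++idx) {
      if (deleted_[idx]) continue;
      idxToRow_[idx] = static_cast<unsigned int>(rowToIdx_.size());
      rowToIdx_.push_back(idx);
    }
    dirty_ = false;
  }

  std::vector<bool> deleted_;
  std::vector<unsigned int> rowToIdx_;
  std::vector<unsigned int> idxToRow_;
  bool dirty_;
};
```

With five records and index 1 deleted, this model reports four rows; row 1 maps to index 2, and index 2 maps back to row 1.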

GetIdxs

OESystem::OEIterBase<unsigned int> *GetIdxs() const
bool GetIdxs(std::vector<unsigned int> &indices) const

Returns all the valid indices in the order that is currently considered the database order by this object.

GetMaxMolIdx

unsigned int GetMaxMolIdx() const

Returns one more than the maximum molecule index this database currently contains.

GetMolecule

bool GetMolecule(OEMolBase &mol, unsigned int idx) const
bool GetMolecule(OEMCMolBase &mol, unsigned int idx) const
bool GetMolecule(oemolostream &ostr, unsigned int idx) const

Read the molecule located at index idx into the molecule mol. If the file format is a multi-conformer .oeb file, only the first conformer will be read if mol is an OEMolBase. If there is a need for handling multi-conformer molecules, the OEMCMolBase overload should be used instead.

Warning

The overload that takes an oemolostream is deprecated as of 2015.Jun and will be removed in a future version of OEChem TK. Please use OEMolDatabase::WriteMolecule instead.

The overload that takes an oemolostream will write bytes for the molecule record at index idx to the stream ostr. No attempt will be made to interpret the data as a molecule and verify whether it is truly a parsable molecule. The molecule will be in whatever format the database was opened on.

GetMoleculeString

std::string GetMoleculeString(unsigned int idx) const

Return the chunk of data from the molecule file that corresponds to the molecule record at index idx.

GetRow

unsigned int GetRow(unsigned int idx)

Returns the row at which the database considers the molecule record idx to reside in the global order of the database. idx should be less than OEMolDatabase::GetMaxMolIdx and the value returned will be less than OEMolDatabase::NumMols. By default, row numbers are the same as their index. Row numbers can be changed by deleting or re-ordering the database. If row numbers were altered before this call, this call is O(GetMaxMolIdx()). After that first call, subsequent calls are O(1), until another alteration is made to the database.

This is the reverse operation from OEMolDatabase::GetIdx.

GetTitle

std::string GetTitle(unsigned int idx) const

Returns the title for the molecule at index idx in the database. This is the same value that is returned by OEMolBase::GetTitle, but without the need to parse the whole molecule record into an OEMolBase.

GetTitles

OESystem::OEIterBase<const std::string> *GetTitles() const

Returns all the titles for the database as would be returned by OEMolDatabase::GetTitle in database index order.

IsDataType

bool IsDataType(const void *) const

Returns whether the opaque data passed represents the same data type as this class.

IsDeleted

bool IsDeleted(unsigned int idx) const

Returns whether the database index idx has already been deleted by a previous call to OEMolDatabase::Delete.

NumMols

unsigned int NumMols() const

Returns the number of molecule records in the database.

Note

The number of valid molecules in the database might be less than the number returned by OEMolDatabase::NumMols. The reason is that the molecule file is not fully parsed when invoking the OEMolDatabase::NumMols function.

Open

bool Open(OEChem::oemolistream &ifs)
bool Open(const std::string &filename)
bool Open(const std::string &filename, unsigned int format)
bool Open(OEPlatform::oeistream &istr, unsigned int format)
bool Open(const std::string &filename, unsigned int format, unsigned int flavor)
bool Open(OEPlatform::oeistream &istr, unsigned int format, unsigned int flavor)

Initializes the database from the given file or stream. The database requires file storage to store the actual molecule data. Therefore, if given an oeistream, the stream’s data will be copied to a temporary file. The temporary file will be automatically deleted by the destructor of this class.

An oemolistream may not need a temporary file if it was opened on an actual file.

This method is expensive: it may require scanning the whole file to determine the proper file offsets to the individual molecule records. To speed up the Open operation, .idx files can be generated by OECreateMolDatabaseIdx or OEMolDatabase::Save. If Open detects the presence of an .idx file parallel to the database file, the file offsets will be read directly from this file instead, possibly saving the need to stream the entire database from disk.
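To give a feel for why determining record offsets can require reading the whole file, the following sketch scans SD-style text for the `$$$$` terminator line and records where each record begins. It is a toy written against the standard library only: RecordSpan, ScanRecords, and ReadRecord are hypothetical names, Unix line endings are assumed, and the result has nothing to do with the actual layout of an .idx file. Once the spans are known, any record can be fetched without rescanning, which is the kind of work a precomputed offset table saves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of where one record lives in the data.
struct RecordSpan {
  std::size_t start;   // byte offset of the first character of the record
  std::size_t length;  // number of bytes, including the terminator line
};

// One full pass over the data: cost grows with the size of the file.
// Trailing data without a "$$$$" terminator is ignored in this sketch.
std::vector<RecordSpan> ScanRecords(const std::string &data) {
  std::vector<RecordSpan> spans;
  std::size_t pos = 0;
  std::size_t recordStart = 0;
  while (pos < data.size()) {
    std::size_t eol = data.find('\n', pos);
    if (eol == std::string::npos) eol = data.size();
    // Does the current line consist of exactly the terminator?
    if (data.compare(pos, eol - pos, "$$$$") == 0) {
      RecordSpan span;
      span.start = recordStart;
      span.length = std::min(eol + 1, data.size()) - recordStart;
      spans.push_back(span);
      recordStart = eol + 1;
    }
    pos = eol + 1;
  }
  return spans;
}

// Random access: no scanning needed once the span is known.
std::string ReadRecord(const std::string &data, const RecordSpan &span) {
  return data.substr(span.start, span.length);
}
```

A caller would run ScanRecords once and keep the resulting vector; persisting that vector next to the data file is the general idea behind an offset index, stated here only as an analogy.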

See also

Index Files section

Order

bool Order(const std::vector<unsigned int> &indices)

Re-orders the molecules in the database according to the order of the indices specified by indices. Subsequent calls to OEMolDatabase::GetIdx or OEMolDatabase::GetRow will return this order. Subsequent calls to OEMolDatabase::Save will output the database in this order.
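The effect of an ordering on row lookups and on saved output can be sketched with two small helpers. Both are hypothetical and use only the standard library; they assume that indices is a complete permutation of 0..n-1, which is a simplification made for this example and says nothing about how OEChem treats a partial list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: list the records in the order given by `indices`,
// the way a save following a re-order would emit them.
std::vector<std::string> RecordsInOrder(const std::vector<std::string> &records,
                                        const std::vector<unsigned int> &indices) {
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (unsigned int idx : indices) {
    out.push_back(records.at(idx));  // at() throws on an out-of-range index
  }
  return out;
}

// Hypothetical helper: rows[idx] is the row that index `idx` occupies.
// Assumes `indices` is a permutation of 0..n-1.
std::vector<unsigned int> RowsFromOrder(const std::vector<unsigned int> &indices) {
  std::vector<unsigned int> rows(indices.size(), 0);
  for (unsigned int row = 0; row < indices.size(); ++row) {
    rows.at(indices[row]) = row;
  }
  return rows;
}
```

Here indices itself plays the role of the row-to-index table, and RowsFromOrder computes the inverse index-to-row table.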

Save

bool Save(const std::string &filename, unsigned int format) const
bool Save(OEPlatform::oeostream &ostr, unsigned int format) const
bool Save(const std::string &filename, unsigned int format,
          unsigned int flavor) const
bool Save(OEPlatform::oeostream &ostr, unsigned int format,
          unsigned int flavor) const
bool Save(OEChem::oemolostream &ofs,
          const OEMolDatabaseSaveOptions &opts=OEMolDatabaseSaveOptions()) const
bool Save(const std::string &filename,
          const OEMolDatabaseSaveOptions &opts=OEMolDatabaseSaveOptions()) const

Save the database to the file specified by filename in the format and flavor specified. The data may be written directly to the stream ostr as well.

This method is heavily optimized for the case where the output file format is the same as the input file format used to initialize the database. If the same file format is used, the molecule record’s bytes will be streamed directly, without an intermediate OEMolBase to do the conversion.

Warning

OEMolDatabase depends on backing file storage. Therefore, if the contents of the file change after OEMolDatabase::Open completes, the behavior is undefined. It is very likely the file data itself will be corrupted. For this reason, OEMolDatabase::Save will fail to write to the same file name used to open the database object. This protects against the common case, but does not protect against the general case of the file changing during access.

WriteMolecule

bool WriteMolecule(oemolostream &ostr, unsigned int idx) const

Write bytes for the molecule record at index idx to the stream ostr. No attempt will be made to interpret the data as a molecule and verify whether it is truly a parsable molecule. The molecule will be in whatever format the database was opened on. Returns false if the oemolostream is not the exact same file format as the OEMolDatabase or any input/output operation fails.