OEMolDatabase¶
class OEMolDatabase : public OESystem::OEBase
This class provides an abstraction for fast read-only random access to any molecular file format OEChem supports reading. The Molecular Database Handling chapter provides a description of the fundamentals of using this class.
When opened or saved to an .oeb file, the generic data on the object itself will be preserved through the OEHeader record in the file. The generic data can be accessed through the methods inherited from OEBase.
Constructors¶
OEMolDatabase()
OEMolDatabase(OEChem::oemolistream &ifs)
OEMolDatabase(const std::string &filename)
OEMolDatabase(OEPlatform::oeistream &istr, unsigned int format)
OEMolDatabase(const std::string &filename, unsigned int format)
OEMolDatabase(const std::string &filename, unsigned int format,
unsigned int flavor)
OEMolDatabase(OEPlatform::oeistream &istr, unsigned int format,
unsigned int flavor)
The default constructor initializes the object to an empty state in which most of the accessor methods will fail. The user is expected to call OEMolDatabase.Open on the object later.
The other constructors are convenience methods for opening the database on the given filename or stream with the specified format and flavor.
The database requires file storage to store the actual molecule data. Therefore, if given an oeistream, the stream’s data will be copied to a temporary file. The temporary file will be automatically deleted by the destructor of this class.
An oemolistream may not need a temporary file if it was opened on an actual file.
Clear¶
void Clear()
Removes all generic data by calling through to OEBase.Clear, and resets the database to an uninitialized state, the same state left by the default constructor.
CreateCopy¶
OESystem::OEBase *CreateCopy() const
Creates a copy of the database pointing to the same backing file store as the original. The copy is invalid if the original is destroyed.
Warning
It is not recommended to use this method. It is only implemented to adhere to the OEBase API.
Delete¶
bool Delete(unsigned int idx)
bool Delete(const std::vector<unsigned int> &indices)
bool Delete(const OESystem::OEUnaryPredicate<OEMolBase> &pred)
bool Delete(const OESystem::OEUnaryPredicate<OEMCMolBase> &pred)
Marks the given molecules as “deleted” in the database in constant time, O(1). Deleted molecules will not be written during a Save operation. OEMolDatabase.GetMolecule returns false when asked for a deleted molecule.
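The deletion behavior described above can be pictured with a small stand-alone model. This is a sketch that uses only the C++ standard library, not OEChem code: `ToyDatabase` and its members are hypothetical names invented for illustration, and nothing here is claimed about how OEMolDatabase is implemented internally. It shows why marking a single record can be constant time and why a later lookup of that index reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model; it does not use or reproduce OEChem internals.
struct ToyDatabase {
    std::vector<std::string> records;  // raw record text, addressed by index
    std::vector<bool> deleted;         // one "deleted" flag per index

    explicit ToyDatabase(std::vector<std::string> recs)
        : records(std::move(recs)), deleted(records.size(), false) {}

    // Flip a flag: no record is moved or erased, so this is O(1).
    bool Delete(std::size_t idx) {
        if (idx >= records.size()) return false;
        deleted[idx] = true;
        return true;
    }

    // Mirrors the documented behavior: a deleted index yields false.
    bool GetRecord(std::string &out, std::size_t idx) const {
        if (idx >= records.size() || deleted[idx]) return false;
        out = records[idx];
        return true;
    }

    // Number of records a save operation would still write out.
    std::size_t NumLive() const {
        std::size_t n = 0;
        for (bool d : deleted) {
            if (!d) ++n;
        }
        return n;
    }
};
```

In this model the index of every surviving record is unchanged after a deletion, which is the property that lets index values remain stable identifiers.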
GetDataType¶
const void *GetDataType() const
Return an opaque value representing the data type of this class.
GetFlavor¶
unsigned int GetFlavor() const
Returns 0 if uninitialized. Otherwise, returns the flavor used to interpret the molecules in this database, pulled from the OEIFlavor namespace associated with the format returned from the OEMolDatabase.GetFormat method.
GetFormat¶
unsigned int GetFormat() const
Returns the format used to interpret the molecules in the file this database was opened on. The value returned is a member of the OEFormat namespace.
GetIdx¶
unsigned int GetIdx(unsigned int row)
Returns the index for the nth row in the database. row should be less than OEMolDatabase.NumMols and the value returned will be less than OEMolDatabase.GetMaxMolIdx. By default, row numbers are the same as their index. Row numbers can be changed by deleting or re-ordering the database. If row numbers were altered before this call, this call is O(GetMaxMolIdx()). After the first call to GetIdx, subsequent calls are O(1), until another alteration is made to the database.
This is the reverse operation from OEMolDatabase.GetRow.
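The cost profile described above (one linear pass after an alteration, then constant-time lookups) is what a lazily rebuilt lookup table would produce. The following is a minimal sketch of that idea using only the standard library. `ToyRowMap` and all of its members are hypothetical names, and the model covers deletion only; it is an illustration of the complexity claim, not a description of OEChem's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of row <-> index bookkeeping; not OEChem code.
class ToyRowMap {
public:
    explicit ToyRowMap(std::size_t maxIdx) : deleted_(maxIdx, false) {}

    void Delete(std::size_t idx) {
        assert(idx < deleted_.size());
        deleted_[idx] = true;
        dirty_ = true;  // cached tables are stale after any alteration
    }

    std::size_t NumRows() {
        Rebuild();
        return rowToIdx_.size();
    }

    // Row -> index. The first call after an alteration pays for Rebuild().
    std::size_t GetIdx(std::size_t row) {
        Rebuild();
        assert(row < rowToIdx_.size());
        return rowToIdx_[row];
    }

    // Index -> row, the reverse lookup. Only meaningful for live indices.
    std::size_t GetRow(std::size_t idx) {
        Rebuild();
        assert(idx < idxToRow_.size() && !deleted_[idx]);
        return idxToRow_[idx];
    }

private:
    // One linear pass over every index: O(max index). Skipped when clean.
    void Rebuild() {
        if (!dirty_) return;
        rowToIdx_.clear();
        idxToRow_.assign(deleted_.size(), 0);
        for (std::size_t idx = 0; idx < deleted_.size(); ++idx) {
            if (deleted_[idx]) continue;
            idxToRow_[idx] = rowToIdx_.size();
            rowToIdx_.push_back(idx);
        }
        dirty_ = false;
    }

    std::vector<bool> deleted_;
    std::vector<std::size_t> rowToIdx_;
    std::vector<std::size_t> idxToRow_;
    bool dirty_ = true;
};
```

With five records and index 1 deleted, the model maps row 1 to index 2 and index 2 back to row 1, so the two lookups are inverses of each other for every live record.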
GetIdxs¶
OESystem::OEIterBase<unsigned int> *GetIdxs() const
bool GetIdxs(std::vector<unsigned int> &indices) const
Returns all the valid indices in the order that is currently considered the database order by this object.
GetMaxMolIdx¶
unsigned int GetMaxMolIdx() const
Returns one more than the maximum molecule index this database currently contains.
GetMolecule¶
bool GetMolecule(OEMolBase &mol, unsigned int idx) const
bool GetMolecule(OEMCMolBase &mol, unsigned int idx) const
bool GetMolecule(oemolostream &ostr, unsigned int idx) const
Reads the molecule located at index idx into the molecule mol. If the file format is a multi-conformer .oeb file, only the first conformer will be read if mol is an OEMolBase. If there is a need for handling multi-conformer molecules, the OEMCMolBase overload should be used instead.
Warning
The overload that takes an oemolostream is deprecated as of 2015.Jun and will be removed in a future version of OEChem TK. Please use OEMolDatabase.WriteMolecule instead.
The overload that takes an oemolostream will write bytes for the molecule record at index idx to the stream ostr. No attempt will be made to interpret the data as a molecule and verify whether it is truly a parsable molecule. The molecule will be in whatever format the database was opened on.
GetMoleculeString¶
std::string GetMoleculeString(unsigned int idx) const
Returns the chunk of data from the molecule file that corresponds to the molecule record at index idx.
In Python, the OEMolDatabase.GetMoleculeString method returns bytes.
GetOEGraphMols¶
GetOEGraphMols() -> <generator of OEGraphMol objects>
Returns a generator over all the non-deleted molecules in the OEMolDatabase in the current order. By default, the molecule database order is defined by the file, but can be altered by OEMolDatabase.Order. Each molecule will be an OEGraphMol object. If opened on a multi-conformer .oeb file, only the first conformer will be returned for each molecule record.
Note
Unlike OEChem::oemolistream::GetOEGraphMols, a new molecule object will be returned upon each iteration, reducing the need to create molecule copies when keeping the molecules around.
GetOEMols¶
GetOEMols() -> <generator of OEMol objects>
Returns a generator over all the non-deleted molecules in the OEMolDatabase in the current order. By default, the molecule database order is defined by the file, but can be altered by OEMolDatabase.Order. Each molecule will be an OEMol object.
Note
Unlike oemolistream.GetOEMols, a new molecule object will be returned upon each iteration, reducing the need to create molecule copies when keeping the molecules around.
GetRow¶
unsigned int GetRow(unsigned int idx)
Returns where the database considers the molecule record idx to reside in the global order of the database. idx should be less than OEMolDatabase.GetMaxMolIdx and the value returned will be less than OEMolDatabase.NumMols. By default, row numbers are the same as their index. Row numbers can be changed by deleting or re-ordering the database. If row numbers were altered before this call, this call is O(GetMaxMolIdx()). After the first call to GetIdx, subsequent calls are O(1), until another alteration is made to the database.
This is the reverse operation from OEMolDatabase.GetIdx.
GetTitle¶
std::string GetTitle(unsigned int idx) const
Returns the title for the molecule at index idx in the database. This is the same value that is returned by OEMolBase.GetTitle, but without the need to parse the whole molecule record into an OEMolBase.
GetTitles¶
OESystem::OEIterBase<const std::string> *GetTitles() const
Returns all the titles for the database, as would be returned by OEMolDatabase.GetTitle, in database index order.
IsDataType¶
bool IsDataType(const void *) const
Returns whether the opaque data passed represents the same data type as this class.
IsDeleted¶
bool IsDeleted(unsigned int idx) const
Returns whether the database index idx has already been deleted by a previous call to OEMolDatabase.Delete.
NumMols¶
unsigned int NumMols() const
Returns the number of molecule records in the database.
Note
The number of valid molecules in the database might be less than the number returned by OEMolDatabase.NumMols, because the molecule file is not fully parsed when OEMolDatabase.NumMols is invoked.
Open¶
bool Open(OEChem::oemolistream &ifs)
bool Open(const std::string &filename)
bool Open(const std::string &filename, unsigned int format)
bool Open(OEPlatform::oeistream &istr, unsigned int format)
bool Open(const std::string &filename, unsigned int format, unsigned int flavor)
bool Open(OEPlatform::oeistream &istr, unsigned int format, unsigned int flavor)
Initializes the database to the file name given. The database requires file storage to store the actual molecule data. Therefore, if given an oeistream, the stream’s data will be copied to a temporary file. The temporary file will be automatically deleted by the destructor of this class.
An oemolistream may not need a temporary file if it was opened on an actual file.
This method is expensive because it may require scanning the whole file to determine the proper file offsets to the individual molecule records. To speed up the Open operation, .idx files can be generated by OECreateMolDatabaseIdx or OEMolDatabase.Save. If Open detects the presence of an .idx file parallel to the database file, the file offsets will be read directly from this file instead, possibly saving the need to stream the entire database from disk.
See also
Index Files section
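To make the cost of the offset scan concrete, here is a minimal stand-alone sketch that finds where each record begins in an in-memory buffer. It assumes SD-file style records terminated by a `$$$$` line and treats any occurrence of that terminator followed by a newline as a record boundary, which is a simplification. `ScanRecordOffsets` and `RecordAt` are hypothetical helpers written for this illustration; the actual layout of an .idx file is not described in this section and is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offset at which each record starts.
// Every byte of `data` is visited once, which is why this scan is the
// expensive part that a precomputed offset table can avoid.
std::vector<std::size_t> ScanRecordOffsets(const std::string &data) {
    static const std::string kEnd = "$$$$\n";
    std::vector<std::size_t> offsets;
    std::size_t start = 0;
    while (start < data.size()) {
        offsets.push_back(start);
        std::size_t end = data.find(kEnd, start);
        if (end == std::string::npos) break;  // trailing record, no terminator
        start = end + kEnd.size();
    }
    return offsets;
}

// With the offsets known, one record can be sliced out without rescanning.
std::string RecordAt(const std::string &data,
                     const std::vector<std::size_t> &offsets,
                     std::size_t idx) {
    assert(idx < offsets.size());
    std::size_t begin = offsets[idx];
    std::size_t end = (idx + 1 < offsets.size()) ? offsets[idx + 1] : data.size();
    return data.substr(begin, end - begin);
}
```

Once the offset vector exists, fetching record `idx` is a single slice. Persisting such a table next to the data file is the general idea behind skipping the scan on a later open.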
Order¶
bool Order(const std::vector<unsigned int> &indices)
Re-orders the molecules in the database according to the order of the indices specified by indices. Subsequent calls to OEMolDatabase.GetIdx or OEMolDatabase.GetRow will return this order. Subsequent calls to OEMolDatabase.Save will output the database in this order.
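One way to produce the indices argument is to sort index values by a per-molecule key. The sketch below uses only the standard library: `titles` stands in for values one might collect with OEMolDatabase.GetTitle, and `IndicesSortedByTitle` is a hypothetical helper, not part of OEChem. The result is a `std::vector<unsigned int>`, matching the parameter type in the Order signature.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns index values arranged so that the titles
// they refer to are in ascending order. Equal titles keep their original
// relative order because the sort is stable.
std::vector<unsigned int> IndicesSortedByTitle(const std::vector<std::string> &titles) {
    std::vector<unsigned int> indices(titles.size());
    std::iota(indices.begin(), indices.end(), 0u);  // 0, 1, 2, ...
    std::stable_sort(indices.begin(), indices.end(),
                     [&titles](unsigned int a, unsigned int b) {
                         return titles[a] < titles[b];
                     });
    return indices;
}
```

Sorting the index values rather than the records themselves keeps the operation cheap, since only small integers are moved.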
Save¶
bool Save(const std::string &filename, unsigned int format) const
bool Save(OEPlatform::oeostream &ostr, unsigned int format) const
bool Save(const std::string &filename, unsigned int format,
unsigned int flavor) const
bool Save(OEPlatform::oeostream &ostr, unsigned int format,
unsigned int flavor) const
bool Save(OEChem::oemolostream &ofs,
const OEMolDatabaseSaveOptions &opts=OEMolDatabaseSaveOptions()) const
bool Save(const std::string &filename,
const OEMolDatabaseSaveOptions &opts=OEMolDatabaseSaveOptions()) const
Saves the database to the file specified by filename in the format and flavor specified. The data may be written directly to the stream ostr as well.
This method is heavily optimized for the case where the output file format is the same as the input file format used to initialize the database. If the same file format is used, each molecule record’s bytes will be streamed directly, without an intermediate OEMolBase to do the conversion.
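The difference between the two paths can be sketched with a toy function. Everything below is hypothetical and uses only the standard library: `ToySave` is not an OEChem function, the formats are plain integers, and `convert` stands in for "parse the record into a molecule, then write it in the new format". The point is only that the same-format path never calls the converter.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy model of the two save paths; not OEChem code.
// `numConverted` reports how many records went through the slow path.
bool ToySave(std::ostream &out,
             const std::vector<std::string> &records,
             unsigned int inputFormat,
             unsigned int outputFormat,
             const std::function<std::string(const std::string &)> &convert,
             unsigned int &numConverted) {
    numConverted = 0;
    for (const std::string &raw : records) {
        if (inputFormat == outputFormat) {
            out << raw;           // fast path: copy the stored bytes as-is
        } else {
            out << convert(raw);  // slow path: every record is re-encoded
            ++numConverted;
        }
    }
    return static_cast<bool>(out);  // false if any write failed
}
```

In this model the fast path's cost is dominated by I/O, while the slow path adds one conversion per record.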
See also
OEMolDatabaseSaveOptions class for more options when saving a database to a file.
OEGetMolDatabaseIdxFileName to retrieve the default filename used for the generated index file.
Warning
OEMolDatabase depends on backing file storage. Therefore, if the contents of the file change after OEMolDatabase.Open completes, the behavior is undefined. It is very likely the file data itself will be corrupted. For this reason, OEMolDatabase.Save will fail to write to the same file name used to open the database object. This protects against the common case, but does not protect against the general case of the file changing during access.
WriteMolecule¶
bool WriteMolecule(oemolostream &ostr, unsigned int idx) const
Writes bytes for the molecule record at index idx to the stream ostr. No attempt will be made to interpret the data as a molecule and verify whether it is truly a parsable molecule. The molecule will be in whatever format the database was opened on. Returns false if the oemolostream is not the exact same file format as the OEMolDatabase or if any input/output operation fails.