OEMCSFragDatabase¶
class OEMCSFragDatabase
This class performs fragmentation indexing on an input set of structures to support MCS similarity searches for compounds that share common cores.
See also
OEMCSFragDatabaseOptions class
Constructors¶
OEMCSFragDatabase()
Default constructor that initializes the fragment database with the default options defined by OEMCSFragDatabaseOptions.
OEMCSFragDatabase(const OEMCSFragDatabaseOptions &options)
Constructor that initializes the fragment database with the options defined by the OEMCSFragDatabaseOptions argument.
AddConstMol¶
int AddConstMol(const OEChem::OEMolBase &inmol, int recordID =(-1) )
Adds the molecule to the index. Returns the 0-based recordID if the structure was added successfully, or a negative value if the structure could not be indexed; negative values can be used to retrieve status information about the indexing failure. The optional user-defined recordID allows indexed fragments to be cross-referenced to externally maintained data structures. If the provided recordID is less than zero, an autogenerated index greater than all ids seen so far is returned.
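The recordID rules described above can be illustrated with a small pure-Python toy model. It does not use the toolkit: the ToyIndex class, its add method, and the empty-string failure condition are hypothetical stand-ins chosen for this sketch, and the real class may report failures with different negative values.

```python
class ToyIndex:
    """Hypothetical stand-in that mimics only the recordID bookkeeping."""

    def __init__(self):
        self._records = {}
        self._max_id = -1  # highest id seen so far

    def add(self, structure, record_id=-1):
        # Assumption: an empty string stands in for a structure that cannot be indexed.
        if not structure:
            return -1
        # A negative record_id requests an autogenerated id
        # that is greater than all ids seen so far.
        if record_id < 0:
            record_id = self._max_id + 1
        self._records[record_id] = structure
        self._max_id = max(self._max_id, record_id)
        return record_id


index = ToyIndex()
first = index.add("c1ccccc1")    # autogenerated, 0-based
explicit = index.add("CCO", 10)  # user-defined id
following = index.add("CCN")     # autogenerated again, greater than 10
failed = index.add("")           # negative value signals an indexing failure
print(first, explicit, following, failed)
```

Passing an explicit id is useful when the caller already maintains its own table of records, while the default of -1 leaves the numbering to the index.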
AddMol¶
int AddMol(OEChem::OEMolBase &inmol, int recordID=(-1))
A high-performance version of OEMCSFragDatabase.AddConstMol that avoids copying the molecule and instead modifies the passed molecule directly.
The state of the passed structure after indexing is undefined; it should generally be discarded or reinstantiated from the original source before any subsequent use.
ClearIndexedMols¶
void ClearIndexedMols()
This method discards the current index contents and frees internal memory in preparation for subsequent reindexing.
CoreToMolecules¶
OESystem::OEIterBase<unsigned int>*
CoreToMolecules(const std::string &core) const
Given a fragmentation core, return the molecule ids that have this core from the index.
CoreToMoleculeCount¶
unsigned int CoreToMoleculeCount(const std::string &core) const
Given a fragmentation core, return a count of the number of molecules that have this core in the index.
GetMaxCoreMolecule¶
unsigned int GetMaxCoreMolecule(const char delim='.') const
Return the most common core fragment(s) recorded in the index. If multiple cores have the same occurrence count, the specified delim character is used to delimit the strings.
GetMaxCoreMoleculeCount¶
unsigned int GetMaxCoreMoleculeCount() const
Return the maximum core occurrence count recorded in the index.
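The core lookup methods above behave like queries against an inverted index from core string to molecule ids. The following pure-Python sketch illustrates that idea only; it does not use the toolkit, the data and function names are hypothetical, and the alphabetical ordering of tied cores is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical data: molecule id -> core strings found in that molecule.
cores_by_molecule = {
    0: ["c1ccccc1", "c1ccncc1"],
    1: ["c1ccccc1"],
    2: ["c1ccncc1"],
    3: ["C1CCCCC1"],
}

# Build the inverted index: core string -> set of molecule ids.
core_index = defaultdict(set)
for mol_id, cores in cores_by_molecule.items():
    for core in cores:
        core_index[core].add(mol_id)


def core_to_molecules(core):
    """Molecule ids that contain the given core."""
    return sorted(core_index.get(core, ()))


def core_to_molecule_count(core):
    """Number of molecules that contain the given core."""
    return len(core_index.get(core, ()))


def max_core_molecule_count():
    """Highest occurrence count of any core in the index."""
    return max((len(ids) for ids in core_index.values()), default=0)


def max_core_molecule(delim="."):
    """Most common core(s); ties are joined with delim (sorted by assumption)."""
    top = max_core_molecule_count()
    tied = sorted(c for c, ids in core_index.items() if len(ids) == top)
    return delim.join(tied)


print(core_to_molecules("c1ccccc1"))
print(max_core_molecule_count(), max_core_molecule())
```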
GetMaxMolIdx¶
unsigned int GetMaxMolIdx() const
Returns the highest molecule id currently present in the index.
GetOptions¶
const OEMCSFragDatabaseOptions &GetOptions() const
Returns the currently active options for the instance. Options cannot be changed after construction; the desired options must be provided when the index is instantiated.
GetScores¶
OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
GetScores(const OEChem::OEMolBase &query,
unsigned int bgn = 0,
unsigned int end = 0,
unsigned int scorecounts = OEMCSScoreType::Default,
const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const
Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore). Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core in common between the hit and the query.
- bgn, end
The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fragment database.
- scorecounts
The value that defines the type of similarity scores (atom or bond) returned by the OEMCSFragDatabase.GetScores method. The default type is OEMCSScoreType_AtomCount.
- scorefunc
A class that computes the similarity score. By default this is OEMCSTanimotoSim, but an implementation of OEMCSTverskySim is also provided. By providing a class derived from OEMCSSimFuncBase, user-defined similarity measures other than the provided versions can be used.
Note
By default, the OEMCSFragDatabase.GetScores method calculates Tanimoto atom similarity scores.
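For readers unfamiliar with the two measures named above, the sketch below computes the standard Tanimoto and Tversky coefficients from simple counts. It is a pure-Python illustration and not the toolkit's implementation: the function names are hypothetical, the assumption is that the counts are atom (or bond) counts of the common core, the query, and the database molecule, and the alpha and beta defaults are chosen only for this example.

```python
def tanimoto(common, query_count, target_count):
    """Tanimoto coefficient: common / (query + target - common)."""
    denominator = query_count + target_count - common
    return common / denominator if denominator else 0.0


def tversky(common, query_count, target_count, alpha=0.5, beta=0.5):
    """Tversky index with weights alpha (query-only) and beta (target-only)."""
    query_only = query_count - common
    target_only = target_count - common
    denominator = alpha * query_only + beta * target_only + common
    return common / denominator if denominator else 0.0


# Assumed example: a 6-atom common core, an 8-atom query, a 10-atom hit.
print(tanimoto(6, 8, 10))
print(tversky(6, 8, 10, alpha=1.0, beta=1.0))  # alpha = beta = 1 reduces to Tanimoto
```

With alpha and beta both set to 1 the Tversky index equals the Tanimoto coefficient; unequal weights make the measure asymmetric between query and hit.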
GetSortedScores¶
OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
GetSortedScores(const OEChem::OEMolBase &query,
unsigned int limit = 0,
unsigned int bgn = 0,
unsigned int end = 0,
bool descending = true,
unsigned int scorecounts = OEMCSScoreType::Default,
const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const
Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore) in sorted order. Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core in common between the hit and the query.
- limit
The value that defines the number of similarity scores returned by the OEMCSFragDatabase.GetSortedScores method. If it is omitted (or set to zero), then all of the similarity scores are returned.
- bgn, end
The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fragment database.
- descending
A boolean value that indicates the direction of the sort, where true requests a descending sort and false requests an ascending sort.
- scorecounts
The value that defines the type of similarity scores (atom or bond) returned by the OEMCSFragDatabase.GetSortedScores method. The default type is OEMCSScoreType_AtomCount.
- scorefunc
A class that computes the similarity score. By default this is OEMCSTanimotoSim, but an implementation of OEMCSTverskySim is also provided. By providing a class derived from OEMCSSimFuncBase, user-defined similarity measures other than the provided versions can be used.
Note
By default, the OEMCSFragDatabase.GetSortedScores method calculates Tanimoto atom similarity scores and returns the highest scores first.
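The interplay of limit, bgn, end, and descending can be sketched in pure Python. This is only a model of the documented defaults, not toolkit code: the sorted_scores function and its (index, score) input are hypothetical, and treating a non-zero bgn/end pair as the half-open range [bgn, end) is an assumption that the text above does not confirm.

```python
def sorted_scores(scores, limit=0, bgn=0, end=0, descending=True):
    """Sort (index, score) pairs following the documented defaults.

    scores is a list of (molecule_index, similarity) pairs.
    """
    if bgn == 0 and end == 0:
        # Both omitted (or zero): use the entire database.
        segment = list(scores)
    else:
        # Assumption: the segment is the half-open index range [bgn, end).
        segment = [(idx, s) for idx, s in scores if bgn <= idx < end]
    ordered = sorted(segment, key=lambda pair: pair[1], reverse=descending)
    # limit of zero means "return all scores".
    return ordered[:limit] if limit > 0 else ordered


# Hypothetical precomputed similarities for four database molecules.
all_scores = [(0, 0.2), (1, 0.9), (2, 0.5), (3, 0.7)]

top_two = sorted_scores(all_scores, limit=2)
segment_ascending = sorted_scores(all_scores, bgn=1, end=3, descending=False)
print(top_two)
print(segment_ascending)
```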
IsIndexed¶
bool IsIndexed(unsigned int recordID) const
Returns whether this record id is in the index.
MoleculeToCores¶
OESystem::OEIterBase<const std::string>*
MoleculeToCores(const OEChem::OEMolBase &mol,
bool permuteFragments=true)
Given a molecule, return its fragmentation cores, generated using the fragmentation options of the OEMCSFragDatabase instance (the OEMoleculeToCores free function accepts explicitly provided options instead). If the permuteFragments argument is true, all combinations of the generated fragmentation cores are returned; otherwise, a unique set of multi-fragment cores is returned, representing all combinations of bond fragmentations between the min and max cut limits.
Two examples are shown below. The first uses an OEMCSFragDatabase instance, so the database index options control the fragmentation behavior; the second uses the OEMoleculeToCores free function with custom options.
from openeye import oechem, oemedchem

# NOTE: 'mol' (an input molecule) and 'itf' (an oechem.OEInterface holding the
# command-line arguments) are assumed to have been set up earlier in the program

# create an MCS fragment database with defaults
fragdb = oemedchem.OEMCSFragDatabase()
# use the options from the frag database to fragment an arbitrary input molecule
print('MoleculeToCores using default fragment database options:')
sortedcores = sorted([c for c in fragdb.MoleculeToCores(mol)])
for corenum, core in enumerate(sortedcores):
    print('{}: {}'.format(corenum, core))

# set the MCS fragment database options from the command-line arguments
fragopts = oemedchem.OEMCSFragDatabaseOptions()
if not oemedchem.OEConfigureMCSFragDatabaseOptions(itf):
    oechem.OEThrow.Fatal("Error configuring options")
if not oemedchem.OESetupMCSFragDatabaseOptions(fragopts, itf):
    oechem.OEThrow.Fatal("Error setting options")
# use the custom options to fragment an arbitrary input molecule
print('MoleculeToCores using command-line options:')
sortedcores = sorted([c for c in oemedchem.OEMoleculeToCores(mol, fragopts)])
for corenum, core in enumerate(sortedcores):
    print('{}: {}'.format(corenum, core))
See also
OEMCSFragDatabaseOptions class
OEMoleculeToCores function
NumFragments¶
unsigned int NumFragments() const
Returns the total number of fragment cores present in the index.
NumMols¶
unsigned int NumMols() const
Returns the total number of indexed molecules present in the index.
ProcessMol¶
int ProcessMol(OEChem::OEMolBase &inmol)
Because the fragmentation engine used internally to generate the index modifies input structures (e.g., by discarding all but the largest fragment of the input), this method is provided to apply the same indexing modifications to an input structure. This is generally useful when depicting or reporting the returned similarity results.