class OEMCSFragDatabase
This class performs fragmentation indexing on an input set of structures to enable MCS similarity searches for compounds that share common cores.
OEMCSFragDatabase()
Default constructor that initializes the fragment database with the default options defined by OEMCSFragDatabaseOptions.
OEMCSFragDatabase(const OEMCSFragDatabaseOptions &options)
Constructor that initializes the fragment database with the options defined by the OEMCSFragDatabaseOptions argument.
int AddConstMol(const OEChem::OEMolBase &inmol, int recordID =(-1) )
Adds the molecule to the index and returns the 0-based recordID if the structure was successfully added, or a negative value if the structure could not be indexed. Negative values can be used to retrieve status information for indexing failures. The optional user-defined recordID allows indexed fragments to be cross-referenced with externally maintained data structures. If the provided recordID is less than zero, an autogenerated ID is returned that is greater than all IDs seen so far.
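The record-ID rule described above can be modeled in a few lines of plain Python. This is a toy sketch rather than the toolkit's implementation: the class `ToyRecordIndex` and its `add` method are hypothetical, and the choice of "highest ID seen so far plus one" for autogenerated IDs is an assumption that merely satisfies the "greater than all IDs seen so far" wording.

```python
class ToyRecordIndex:
    """Hypothetical stand-in that mimics only the record-ID bookkeeping."""

    def __init__(self):
        self._ids = set()
        self._max_id = -1  # nothing indexed yet, so the first auto ID is 0

    def add(self, record_id=-1):
        # A negative ID requests an autogenerated one (assumed: max seen + 1).
        if record_id < 0:
            record_id = self._max_id + 1
        self._ids.add(record_id)
        self._max_id = max(self._max_id, record_id)
        return record_id


index = ToyRecordIndex()
first = index.add()      # autogenerated
custom = index.add(41)   # user-defined, e.g. a row number in an external table
after = index.add()      # autogenerated again, above every ID seen so far
print(first, custom, after)
```

Mixing user-defined and autogenerated IDs in this model never produces a collision, because each autogenerated ID is placed above the current maximum.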
int AddMol(OEChem::OEMolBase &inmol, int recordID=(-1))
A high-performance version of OEMCSFragDatabase.AddConstMol that avoids copying the molecule and instead modifies the passed molecule directly. The state of the molecule after indexing is undefined; it should generally be discarded or re-read from the original source before any further use.
void ClearIndexedMols()
This method discards the current index contents and frees internal memory in preparation for subsequent reindexing.
OESystem::OEIterBase<unsigned int>*
CoreToMolecules(const std::string &core) const
Given a fragmentation core, return the molecule ids that have this core from the index.
unsigned int CoreToMoleculeCount(const std::string &core) const
Given a fragmentation core, return a count of the number of molecules that have this core in the index.
unsigned int GetMaxCoreMolecule(const char delim='.') const
Return the most common core fragment(s) recorded in the index. If there are multiple cores with the same occurrence count, the specified delim character is used to delimit the strings.
unsigned int GetMaxCoreMoleculeCount() const
Return the maximum core occurrence count recorded in the index.
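The relationship between the four core lookup methods above can be illustrated with a small inverted index in plain Python. This is a conceptual sketch only: the dictionary `core_index`, the helper functions, and the example cores are all hypothetical, and the alphabetical ordering of tied cores is an assumption, since the ordering used by the toolkit is not stated here.

```python
from collections import defaultdict

# Hypothetical data: each indexed molecule (by position) and its core strings.
molecule_cores = [
    ["c1ccccc1", "C1CCNCC1"],
    ["c1ccccc1"],
    ["C1CCNCC1", "c1ccncc1"],
]

# Inverted index: core string -> set of molecule IDs containing that core.
core_index = defaultdict(set)
for mol_id, cores in enumerate(molecule_cores):
    for core in cores:
        core_index[core].add(mol_id)


def core_to_molecules(core):
    # .get avoids creating an empty entry for an unknown core
    return sorted(core_index.get(core, ()))


def core_to_molecule_count(core):
    return len(core_index.get(core, ()))


def max_core_molecule_count():
    return max((len(ids) for ids in core_index.values()), default=0)


def max_core_molecule(delim="."):
    top = max_core_molecule_count()
    tied = sorted(c for c, ids in core_index.items() if len(ids) == top)
    return delim.join(tied)  # ties are joined with the delimiter


print(core_to_molecules("c1ccccc1"))
print(max_core_molecule_count(), max_core_molecule())
```

In this toy data two cores tie for the highest occurrence count, so the delimited string contains both of them.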
unsigned int GetMaxMolIdx() const
Returns the highest molecule id currently present in the index.
const OEMCSFragDatabaseOptions &GetOptions() const
Returns the currently active options for the instance. Options cannot be changed after construction; the desired options must be provided when the index is instantiated.
OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
GetScores(const OEChem::OEMolBase &query,
unsigned int bgn = 0,
unsigned int end = 0,
unsigned int scorecounts = OEMCSScoreType::Default,
const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const
Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore). Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core shared by the hit and the query.
OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
GetSortedScores(const OEChem::OEMolBase &query,
unsigned int limit = 0,
unsigned int bgn = 0,
unsigned int end = 0,
bool descending = true,
unsigned int scorecounts = OEMCSScoreType::Default,
const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const
Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore) in sorted order. Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core shared by the hit and the query.
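To make the roles of the limit and descending arguments concrete, the following plain-Python sketch scores a few made-up hits with a Tanimoto-style ratio and then sorts and truncates them. Everything here is illustrative: `ToyScore`, `tanimoto`, and `sorted_scores` are hypothetical names, the use of heavy-atom counts as the scoring input is an assumption (the actual counts depend on the scorecounts argument), and treating limit = 0 as "no limit" is likewise assumed.

```python
from typing import NamedTuple


class ToyScore(NamedTuple):
    """Hypothetical record mirroring the three fields described above."""
    score: float
    mol_idx: int
    core: str


def tanimoto(common, query_size, target_size):
    # Tanimoto on counts: shared / (query + target - shared)
    denom = query_size + target_size - common
    return common / denom if denom else 0.0


def sorted_scores(hits, limit=0, descending=True):
    ordered = sorted(hits, key=lambda h: h.score, reverse=descending)
    # Assumption: a limit of 0 means "return every hit".
    return ordered[:limit] if limit > 0 else ordered


QUERY_ATOMS = 20
# Made-up hits: (molecule index, atoms in molecule, atoms in shared core, core)
raw_hits = [
    (0, 20, 12, "c1ccccc1"),
    (1, 22, 18, "C1CCNCC1"),
    (2, 10, 6, "c1ccncc1"),
]
hits = [
    ToyScore(tanimoto(common, QUERY_ATOMS, size), idx, core)
    for idx, size, common, core in raw_hits
]

top_two = sorted_scores(hits, limit=2)
for hit in top_two:
    print(hit.mol_idx, round(hit.score, 3), hit.core)
```

With these numbers, molecule 1 ranks first because its shared core covers most of both the query and the hit.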
bool IsIndexed(unsigned int recordID) const
Returns whether this record id is in the index.
OESystem::OEIterBase<const std::string>*
MoleculeToCores(const OEChem::OEMolBase &mol,
bool permuteFragments=true)
Given a molecule, return the fragmentation cores using either the provided fragmentation options or the fragmentation options from the OEMCSFragDatabase instance. If the permuteFragments argument is true, all combinations of the fragmentation cores are generated; otherwise, a unique set of multi-fragment cores is returned representing all combinations of bond fragmentations between the min and max cut limits.
Two versions of the example are shown below: one uses an OEMCSFragDatabase instance, so the database index options control the fragmentation behavior; the other uses the free function OEMoleculeToCores with custom options.
from openeye import oechem, oemedchem

# this excerpt assumes `mol` is an already-loaded molecule and
# `itf` is an oechem.OEInterface holding the command-line arguments

# create an MCS fragment database with defaults
fragdb = oemedchem.OEMCSFragDatabase()
# use the options from the frag database to fragment an arbitrary input molecule
print('MoleculeToCores using default fragment database options:')
sortedcores = sorted([c for c in fragdb.MoleculeToCores(mol)])
for corenum, core in enumerate(sortedcores):
    print('{}: {}'.format(corenum, core))

# set the MCS fragment database options from the command-line arguments
fragopts = oemedchem.OEMCSFragDatabaseOptions()
if not oemedchem.OEConfigureMCSFragDatabaseOptions(itf):
    oechem.OEThrow.Fatal("Error configuring options")
if not oemedchem.OESetupMCSFragDatabaseOptions(fragopts, itf):
    oechem.OEThrow.Fatal("Error setting options")
# use the custom options to fragment an arbitrary input molecule
print('MoleculeToCores using command-line options:')
sortedcores = sorted([c for c in oemedchem.OEMoleculeToCores(mol, fragopts)])
for corenum, core in enumerate(sortedcores):
    print('{}: {}'.format(corenum, core))
unsigned int NumFragments() const
Returns the total number of fragment cores present in the index.
unsigned int NumMols() const
Returns the total number of indexed molecules present in the index.
int ProcessMol(OEChem::OEMolBase &inmol)
Because the fragmentation engine used internally to generate the index modifies the input structures (e.g., discarding all but the largest fragment of the input), this method is provided to apply the same indexing modifications to an input structure. This is generally useful for depiction or reporting of the returned similarity results.