OEMCSFragDatabase

class OEMCSFragDatabase

The OEMCSFragDatabase class performs fragmentation indexing on an input set of structures to support MCS (maximum common substructure) similarity searches for compounds that share common cores.

Constructors

OEMCSFragDatabase()

Default constructor that initializes the fragment database with the default options defined by OEMCSFragDatabaseOptions.

OEMCSFragDatabase(const OEMCSFragDatabaseOptions &options)

Constructor that initializes the fragment database with the options defined by the OEMCSFragDatabaseOptions argument.

AddConstMol

int AddConstMol(const OEChem::OEMolBase &inmol, int recordID=(-1))

Adds the molecule to the index. Returns the 0-based recordID if the structure was added successfully, or a negative value if the structure could not be indexed; the negative value can be used to retrieve status information about the indexing failure. The optional user-defined recordID allows indexed fragments to be cross-referenced with externally maintained data structures. If the provided recordID is less than zero, an autogenerated ID greater than all IDs seen so far is assigned and returned.
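The record ID rule described above can be illustrated with a small pure-Python model. This is only a sketch of the documented behavior, not OEMedChem code: the `RecordIdAllocator` class and its `assign` method are hypothetical names introduced for illustration, and the toy ignores indexing failures (the negative return values).

```python
class RecordIdAllocator:
    """Toy model of the record ID rule (hypothetical, not part of OEMedChem)."""

    def __init__(self):
        self.max_seen = -1  # no IDs have been seen yet

    def assign(self, record_id=-1):
        # A negative request means "generate an ID": use one greater
        # than every ID seen so far.
        if record_id < 0:
            record_id = self.max_seen + 1
        self.max_seen = max(self.max_seen, record_id)
        return record_id


alloc = RecordIdAllocator()
first = alloc.assign()      # autogenerated
second = alloc.assign(10)   # user-defined
third = alloc.assign()      # autogenerated, greater than all IDs seen so far
print(first, second, third)  # prints: 0 10 11
```

Mixing user-defined and autogenerated IDs can therefore leave gaps in the ID range, which is why a membership check such as IsIndexed is useful alongside GetMaxMolIdx.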

AddMol

int AddMol(OEChem::OEMolBase &inmol, int recordID=(-1))

A high-performance version of OEMCSFragDatabase.AddConstMol that avoids copying the molecule and instead modifies the passed molecule directly. The state of the passed structure after indexing is undefined; it should generally be discarded or re-read from the original source before any subsequent use.

ClearIndexedMols

void ClearIndexedMols()

This method discards the current index contents and frees internal memory in preparation for subsequent reindexing.

CoreToMolecules

OESystem::OEIterBase<unsigned int>*
    CoreToMolecules(const std::string &core) const

Given a fragmentation core, returns the IDs of the indexed molecules that contain this core.

CoreToMoleculeCount

unsigned int CoreToMoleculeCount(const std::string &core) const

Given a fragmentation core, returns the number of indexed molecules that contain this core.

GetMaxCoreMolecule

std::string GetMaxCoreMolecule(const char delim='.') const

Returns the most common core fragment(s) recorded in the index. If multiple cores share the same occurrence count, they are joined with the specified delim character.

GetMaxCoreMoleculeCount

unsigned int GetMaxCoreMoleculeCount() const

Returns the maximum core occurrence count recorded in the index.
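The relationship between the four core lookup methods above can be pictured as queries against an inverted index from core to molecule IDs. The following is a minimal pure-Python sketch under that assumption; the data, the function names, and the dictionary layout are all hypothetical and say nothing about how OEMedChem stores its index internally.

```python
from collections import defaultdict

# Hypothetical index: core string -> set of molecule IDs containing it.
core_index = defaultdict(set)
for mol_id, cores in [(0, ["c1ccccc1", "C1CCNCC1"]),
                      (1, ["c1ccccc1"]),
                      (2, ["C1CCNCC1"]),
                      (3, ["c1ccncc1"])]:
    for core in cores:
        core_index[core].add(mol_id)


def core_to_molecules(core):
    # IDs of the molecules that contain this core
    return sorted(core_index.get(core, ()))


def core_to_molecule_count(core):
    # Number of molecules that contain this core
    return len(core_index.get(core, ()))


def max_core_molecule_count():
    # Highest occurrence count of any core
    return max((len(ids) for ids in core_index.values()), default=0)


def max_core_molecule(delim="."):
    # Most common core(s); ties are joined with the delimiter
    top = max_core_molecule_count()
    return delim.join(sorted(c for c, ids in core_index.items() if len(ids) == top))


print(core_to_molecules("c1ccccc1"))       # prints: [0, 1]
print(core_to_molecule_count("c1ccncc1"))  # prints: 1
print(max_core_molecule_count())           # prints: 2
print(max_core_molecule())                 # prints: C1CCNCC1.c1ccccc1
```

In this toy data two cores tie for the highest count, so the last call returns both, separated by the delimiter.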

GetMaxMolIdx

unsigned int GetMaxMolIdx() const

Returns the highest molecule id currently present in the index.

GetOptions

const OEMCSFragDatabaseOptions &GetOptions() const

Returns the options currently active for the instance. Options cannot be changed after construction; the desired options must be provided when the database is instantiated.

GetScores

OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
          GetScores(const OEChem::OEMolBase &query,
                    unsigned int bgn = 0,
                    unsigned int end = 0,
                    unsigned int scorecounts = OEMCSScoreType::Default,
                    const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const

Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore). Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core common to the hit and the query.

bgn, end

The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fragment database.

scorecounts

The value that defines the type of similarity scores (atom or bond) returned by the OEMCSFragDatabase.GetScores method. The default type is OEMCSScoreType_AtomCount.

scorefunc

A class that computes the similarity score. By default this is OEMCSTanimotoSim, but an implementation of OEMCSTverskySim is also provided. User-defined similarity measures can be supplied by providing a class derived from OEMCSSimFuncBase.
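To make the role of the similarity function concrete, the sketch below computes the textbook Tanimoto and Tversky measures from atom (or bond) counts. It is an illustration only: the function names, the count-based inputs, and the alpha and beta defaults are assumptions, and the exact definitions used by OEMCSTanimotoSim and OEMCSTverskySim may differ.

```python
def tanimoto(common, query_count, target_count):
    """Textbook Tanimoto: c / (a + b - c)."""
    denom = query_count + target_count - common
    return common / denom if denom else 0.0


def tversky(common, query_count, target_count, alpha=0.5, beta=0.5):
    """Textbook Tversky: c / (alpha*(a - c) + beta*(b - c) + c)."""
    denom = alpha * (query_count - common) + beta * (target_count - common) + common
    return common / denom if denom else 0.0


# Hypothetical counts: query has 20 atoms, database molecule has 25,
# and the common core covers 15 atoms.
t = tanimoto(15, 20, 25)
v = tversky(15, 20, 25, alpha=1.0, beta=0.0)
print(t)  # prints: 0.5
print(v)  # prints: 0.75
```

With alpha = beta = 1 the Tversky expression reduces to Tanimoto, while unequal weights make the measure asymmetric, penalizing unmatched atoms in the query and in the database molecule differently.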

GetSortedScores

OESystem::OEIterBase<OEMedChem::OEMCSMolSimScore> *
          GetSortedScores(const OEChem::OEMolBase &query,
                          unsigned int limit = 0,
                          unsigned int bgn = 0,
                          unsigned int end = 0,
                          bool descending = true,
                          unsigned int scorecounts = OEMCSScoreType::Default,
                          const OEMCSSimFuncBase &scorefunc = OEMCSTanimotoSim()) const

Performs similarity calculations between a query molecule and the fragment cores stored in the OEMCSFragDatabase object. It returns an iterator over the calculated similarity scores (OEMCSMolSimScore) in sorted order. Each OEMCSMolSimScore object holds a similarity score, the index of the database molecule, and the fragment core common to the hit and the query.

limit

The value that defines the number of similarity scores returned by the OEMCSFragDatabase.GetSortedScores method. If it is omitted (or set to zero) then all of the similarity scores are returned.

bgn, end

The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fragment database.

descending

A boolean value that sets the sort direction: true requests a descending sort and false requests an ascending sort.

scorecounts

The value that defines the type of similarity scores (atom or bond) returned by the OEMCSFragDatabase.GetSortedScores method. The default type is OEMCSScoreType_AtomCount.

scorefunc

A class that computes the similarity score. By default this is OEMCSTanimotoSim, but an implementation of OEMCSTverskySim is also provided. User-defined similarity measures can be supplied by providing a class derived from OEMCSSimFuncBase.
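The interaction of limit, bgn, end, and descending can be sketched on a plain list of (molecule index, score) pairs. This is a hypothetical model, not the library's implementation: the function name is made up, and treating bgn and end as a half-open range, with end = 0 meaning "to the end of the database", is an assumption that the text above does not confirm.

```python
def sorted_scores(scores, limit=0, bgn=0, end=0, descending=True):
    """Toy model of the GetSortedScores arguments on (mol_idx, score) pairs."""
    # Assumption: [bgn, end) is a half-open segment; end == 0 means "to the end".
    stop = end if end > 0 else len(scores)
    segment = scores[bgn:stop]
    ordered = sorted(segment, key=lambda s: s[1], reverse=descending)
    # limit == 0 means "return every score"
    return ordered if limit == 0 else ordered[:limit]


# Hypothetical scores in database order
scores = [(0, 0.2), (1, 0.9), (2, 0.5), (3, 0.7)]

top2 = sorted_scores(scores, limit=2)
window = sorted_scores(scores, bgn=1, end=3, descending=False)
print(top2)    # prints: [(1, 0.9), (3, 0.7)]
print(window)  # prints: [(2, 0.5), (1, 0.9)]
```

Restricting the segment before sorting keeps the work proportional to the chosen window, which is one plausible reason to expose bgn and end separately from limit.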

IsIndexed

bool IsIndexed(unsigned int recordID) const

Returns whether this record id is in the index.

MoleculeToCores

OESystem::OEIterBase<const std::string>*
    MoleculeToCores(const OEChem::OEMolBase &mol,
                    bool permuteFragments=true)

Given a molecule, returns its fragmentation cores, generated using the fragmentation options of the OEMCSFragDatabase instance (the OEMoleculeToCores free function accepts explicitly provided options instead). If the permuteFragments argument is true, all combinations of the generated fragmentation cores are returned; otherwise, a unique set of multi-fragment cores is returned, representing all combinations of bond fragmentations between the min and max cut limits.

Two examples are shown below: the first uses an OEMCSFragDatabase instance, so the database index options control the fragmentation behavior; the second uses the OEMoleculeToCores free function with custom options.

    # Assumes `from openeye import oechem, oemedchem`, an input molecule `mol`,
    # and an oechem.OEInterface `itf` are set up earlier in the program.

    # create an MCS fragment database with defaults
    fragdb = oemedchem.OEMCSFragDatabase()
    # use the options from the frag database to fragment an arbitrary input molecule
    print('MoleculeToCores using default fragment database options:')
    sortedcores = sorted([c for c in fragdb.MoleculeToCores(mol)])
    for corenum, core in enumerate(sortedcores):
        print('{}: {}'.format(corenum, core))

    # set the MCS fragment database options from the command-line arguments
    fragopts = oemedchem.OEMCSFragDatabaseOptions()
    if not oemedchem.OEConfigureMCSFragDatabaseOptions(itf):
        oechem.OEThrow.Fatal("Error configuring options")
    if not oemedchem.OESetupMCSFragDatabaseOptions(fragopts, itf):
        oechem.OEThrow.Fatal("Error setting options")

    # use the custom options to fragment an arbitrary input molecule
    print('MoleculeToCores using command-line options:')
    sortedcores = sorted([c for c in oemedchem.OEMoleculeToCores(mol, fragopts)])
    for corenum, core in enumerate(sortedcores):
        print('{}: {}'.format(corenum, core))

NumFragments

unsigned int NumFragments() const

Returns the total number of fragment cores present in the index.

NumMols

unsigned int NumMols() const

Returns the total number of indexed molecules present in the index.

ProcessMol

int ProcessMol(OEChem::OEMolBase &inmol)

The fragmentation engine used internally to generate the index modifies the input structures (e.g., by discarding all but the largest fragment of the input). This method applies the same indexing modifications to the passed structure, which is generally useful for depicting or reporting the returned similarity results.