OEFPDatabase

class OEFPDatabase

The OEFPDatabase class is designed to perform rapid in-memory fingerprint searches. Each OEFPDatabase object is associated with a fingerprint type (OEFPTypeBase) that is specified when the database is constructed. An OEFPDatabase object can only store fingerprints (OEFingerPrint) with this specified type.

Constructors

OEFPDatabase(unsigned int fptype)

Creates an OEFPDatabase object that can store OEFingerPrint objects with a given type.

fptype

The type of the OEFingerPrint object stored in the OEFPDatabase. This value has to be from the OEFPType namespace.

OEFPDatabase(const OEGraphSim::OEFPTypeBase *)

Creates an OEFPDatabase object that can store OEFingerPrint objects of the given OEFPTypeBase type.

Note

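By default, an OEFPDatabase object is constructed with the Tanimoto similarity measure and descending score order.

These default values can be altered by the OEFPDatabase.SetSimFunc and OEFPDatabase.SetDescendingOrder methods.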

AddFP

unsigned int AddFP(const OEChem::OEMolBase &mol)

Generates an OEFingerPrint object from the OEMolBase molecule using the fingerprint type of the OEFPDatabase. The generated OEFingerPrint object is then inserted into the database, and its index is returned. This method returns -1 if the fingerprint generation is unsuccessful.

unsigned int AddFP(const OEGraphSim::OEFingerPrint &fp)

Creates a copy of the OEFingerPrint object, inserts it into the database, and then returns its index. If the type of the passed fingerprint differs from the type of the database, then the insertion is unsuccessful and this method returns -1.

Note

The index returned by the OEFPDatabase.AddFP method is a unique number starting from zero. This index can be used as a reference number to associate the fingerprint with the molecule from which it was generated.
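
The index bookkeeping described in this note can be sketched in plain Python. This is only an illustration under stated assumptions: ToyFPDatabase and add_fp are hypothetical stand-ins (not the OpenEye API), fingerprints are modeled as sets of bit positions, and the bit positions are made up.

```python
class ToyFPDatabase:
    """Hypothetical stand-in for a fingerprint database; not the OpenEye API."""

    def __init__(self):
        self._fps = []

    def add_fp(self, bits):
        """Store a copy of the fingerprint and return its zero-based index."""
        self._fps.append(frozenset(bits))
        return len(self._fps) - 1


# Made-up bit sets, used only to show the index-to-molecule mapping.
molecules = [
    ("aspirin", {1, 4, 7}),
    ("caffeine", {2, 4, 9}),
    ("ibuprofen", {1, 7, 8}),
]

fpdb = ToyFPDatabase()
titles = []  # titles[i] belongs to the fingerprint stored at index i
for title, bits in molecules:
    idx = fpdb.add_fp(bits)
    titles.append(title)

# A hit that reports index 1 can be mapped back to its molecule title.
print(titles[1])
```

Keeping a parallel list (or a dictionary keyed by the returned index) is one simple way to map a score's index back to the source molecule.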

ClearCutoff

void ClearCutoff()

Removes the cutoff value previously set by the OEFPDatabase.SetCutoff method. After the cutoff value is cleared, the OEFPDatabase.HasCutoff method returns false.

GetCutoff

float GetCutoff() const

Returns the cutoff value previously set by the OEFPDatabase.SetCutoff method.

GetDescendingOrder

bool GetDescendingOrder() const

Returns the order in which the calculated scores are returned by the OEFPDatabase.GetSortedScores method.

GetFPTypeBase

const OEGraphSim::OEFPTypeBase *GetFPTypeBase() const

Returns the fingerprint type of the OEFPDatabase object.

GetFingerPrints

OESystem::OEIterBase<const OEGraphSim::OEFingerPrint> *GetFingerPrints() const

Returns a pointer to an iterator over the fingerprints (OEFingerPrint) stored in the OEFPDatabase object. The returned OEFingerPrint objects can only be accessed by const methods or functions, i.e., they cannot be modified.

GetScores

The overloaded versions of the GetScores method

Link                       Description
GetScores(mol, bgn, end)   returns similarities for a molecule
GetScores(fp, bgn, end)    returns similarities for a fingerprint
GetScores(mol, opts)       returns similarities for a molecule with the given options
GetScores(fp, opts)        returns similarities for a fingerprint with the given options

Similarity calculation:

OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetScores(const OEChem::OEMolBase &mol, unsigned int bgn=0,
            unsigned int end=0) const
OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetScores(const OEGraphSim::OEFingerPrint &fp, unsigned int bgn=0,
            unsigned int end=0) const

Performs a similarity calculation between a given molecule or fingerprint and the fingerprints stored in the OEFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore). Each OESimScore holds a similarity score and the index of the corresponding fingerprint in the database.

mol

If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.

fp

If the method is called with an OEFingerPrint object, then its type must match the type of the OEFPDatabase.

bgn, end

The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fingerprint database.

Note

Examples:

  • Calculates the Tanimoto similarity on the first 100 entries of the database and returns scores that are equal to or larger than 0.1.

    descending = True
    fpdb.SetSimFunc(oegraphsim.OESimMeasure_Tanimoto, descending)
    fpdb.SetCutoff(0.1)
    for score in fpdb.GetScores(qfp, 0, 100):
        print("%.3f" % score.GetScore())
    
  • Calculates the Tversky similarity (with \(\alpha=0.9\) and \(\beta=0.1\)) on the entire database and returns all scores.

    fpdb.SetSimFunc(oegraphsim.OETverskySim(0.9))
    for score in fpdb.GetScores(qfp):
        print("%.3f" % score.GetScore())
    
  • Calculates the Dice similarity beginning at the 100th entry of the database and returns scores that are equal to or smaller than 0.5.

    descending = True
    fpdb.SetSimFunc(oegraphsim.OESimMeasure_Dice, not descending)
    fpdb.SetCutoff(0.5)
    for score in fpdb.GetScores(qfp, 100):
        print("%.3f" % score.GetScore())
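
The bgn and end semantics used in these examples can be sketched in plain Python. This is a rough model, not the toolkit's implementation: tanimoto and get_scores are hypothetical helpers, fingerprints are modeled as sets of bit positions, end is assumed to be exclusive with zero meaning "through the last entry" (as the examples above suggest), and returning 0.0 for two empty fingerprints is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two bit sets; 0.0 for two empty sets (an assumption)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def get_scores(fps, query, bgn=0, end=0):
    """Yield (index, score) pairs in database order for the segment [bgn, end).

    end == 0 is treated as 'up to the last entry', mirroring the defaults above.
    """
    stop = len(fps) if end == 0 else min(end, len(fps))
    for idx in range(bgn, stop):
        yield idx, tanimoto(query, fps[idx])


# Made-up fingerprints for illustration only.
fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8}), frozenset({1, 2, 3, 4})]
query = frozenset({1, 2, 3})

for idx, score in get_scores(fps, query, 1, 3):
    print(idx, "%.3f" % score)
```

Note that the scores come back in database order, and the index in each pair refers to the position in the whole database, not to the position within the segment.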
    

Similarity calculation with an option class:

Performs a similarity calculation between a given molecule or fingerprint and the fingerprints stored in the OEFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore). Each OESimScore holds a similarity score and the index of the corresponding fingerprint in the database.

mol

If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.

fp

If the method is called with an OEFingerPrint object, then its type must match the type of the OEFPDatabase.

opts

The OEFPDatabaseOptions object controls all the parameters that determine the search, for example, the similarity measure, the score cutoff, and whether the order is descending or ascending.

Example:

  • Calculates the Cosine similarity on the entire database and returns scores that are equal to or larger than 0.3.

    opts = oegraphsim.OEFPDatabaseOptions()
    opts.SetCutoff(0.3)
    opts.SetSimFunc(oegraphsim.OESimMeasure_Cosine)
    for score in fpdb.GetScores(qfp, opts):
        print("%.3f" % score.GetScore())
    

GetSortedScores

The overloaded versions of the GetSortedScores method

Link                                    Description
GetSortedScores(mol, limit, bgn, end)   returns sorted similarities for a molecule
GetSortedScores(fp, limit, bgn, end)    returns sorted similarities for a fingerprint
GetSortedScores(mol, opts)              returns sorted similarities for a molecule with the given options
GetSortedScores(fp, opts)               returns sorted similarities for a fingerprint with the given options

Similarity calculation:

OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetSortedScores(const OEChem::OEMolBase &mol, unsigned int limit=0,
                  unsigned int bgn=0, unsigned int end=0) const
OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetSortedScores(const OEGraphSim::OEFingerPrint &fp, unsigned int limit=0,
                  unsigned int bgn=0, unsigned int end=0) const

Performs similarity calculations between a molecule or a fingerprint and the fingerprints stored in the OEFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore) in sorted order. Each OESimScore holds a similarity score and the index of the corresponding fingerprint in the database.

mol

If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.

fp

If the method is called with an OEFingerPrint object, then its type must match the type of the OEFPDatabase.

bgn, end

The bgn and end arguments define the segment of the database on which the similarity calculation will take place. If both of these parameters are omitted (or set to zero), then the similarity calculation is performed on the entire fingerprint database.

limit

The maximum number of similarity scores returned by the OEFPDatabase.GetSortedScores method. If it is omitted (or set to zero), then all of the similarity scores are returned.

Note

Examples:

  • Calculates the Tanimoto similarity on the entire database and returns the 10 best scores in descending order.

    for score in fpdb.GetSortedScores(qfp, 10):
        print("%.3f" % score.GetScore())
    
  • Calculates the Dice similarity on the first 100 entries of the database and returns scores that are equal to or larger than 0.5 in descending order.

    descending = True
    fpdb.SetSimFunc(oegraphsim.OESimMeasure_Dice, descending)
    fpdb.SetCutoff(0.5)
    for score in fpdb.GetSortedScores(qfp, 0, 0, 100):
        print("%.3f" % score.GetScore())
    
  • Calculates Manhattan similarity beginning at the 100th entry of the database and returns the “worst” 5 scores that are equal to or smaller than 0.3 in ascending order.

    descending = True
    fpdb.SetSimFunc(oegraphsim.OESimMeasure_Manhattan, not descending)
    fpdb.SetCutoff(0.3)
    for score in fpdb.GetSortedScores(qfp, 5, 100):
        print("%.3f" % score.GetScore())
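
How limit and the sort order combine in these examples can be sketched in plain Python. The helper sort_scores is hypothetical and only models the documented behavior (limit of zero returns everything); it says nothing about how the toolkit performs the sort internally, and the scores are made up.

```python
def sort_scores(scored, limit=0, descending=True):
    """Sort (index, score) pairs by score and keep the first `limit` of them.

    limit == 0 means 'return all scores', mirroring the default described above.
    """
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=descending)
    return ordered if limit == 0 else ordered[:limit]


# Made-up (index, score) pairs in database order.
scored = [(0, 1.0), (1, 0.5), (2, 0.0), (3, 0.75)]

print(sort_scores(scored, limit=2))                    # two best scores
print(sort_scores(scored, limit=2, descending=False))  # two "worst" scores
```

Because each pair carries the original database index, the hits can still be traced back to their fingerprints after sorting.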
    

Sorted similarity calculation with an option class:

OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetSortedScores(const OEChem::OEMolBase &mol, const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<OEGraphSim::OESimScore> *
  GetSortedScores(const OEGraphSim::OEFingerPrint &fp, const OEFPDatabaseOptions &opts) const

Performs similarity calculations between a molecule or a fingerprint and the fingerprints stored in the OEFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore) in sorted order. Each OESimScore holds a similarity score and the index of the corresponding fingerprint in the database.

mol

If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.

fp

If the method is called with an OEFingerPrint object, then its type must match the type of the OEFPDatabase.

opts

The OEFPDatabaseOptions object controls all the parameters that determine the sorted search, for example, the similarity measure, the score cutoff, and whether the order is descending or ascending.

Example:

  • Calculates Tversky similarity on the entire database and returns the best 10 scores that are equal to or larger than 0.3 in descending order.

    opts = oegraphsim.OEFPDatabaseOptions()
    opts.SetDescendingOrder(True)
    opts.SetCutoff(0.3)
    opts.SetSimFunc(oegraphsim.OESimMeasure_Tversky)
    opts.SetTverskyCoeffs(0.9, 0.1)
    opts.SetLimit(10)
    for score in fpdb.GetSortedScores(qfp, opts):
        print("%.3f" % score.GetScore())
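
To make the role of the two Tversky coefficients concrete, here is a minimal plain-Python sketch of the Tversky index in its usual set form. It is not the toolkit's implementation: tversky is a hypothetical helper, fingerprints are modeled as sets of bit positions, the assumption that alpha weights the bits unique to the first argument is a convention chosen here, and returning 0.0 for a zero denominator is also an assumption.

```python
def tversky(a, b, alpha, beta):
    """Tversky index: both / (alpha * only_a + beta * only_b + both)."""
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = alpha * only_a + beta * only_b + both
    return both / denom if denom else 0.0


# Made-up bit sets for illustration only.
a = frozenset({1, 2, 3, 4})
b = frozenset({3, 4, 5})

print("%.3f" % tversky(a, b, 0.9, 0.1))
print("%.3f" % tversky(b, a, 0.9, 0.1))  # swapping the arguments changes the score
```

With alpha = beta = 1 the expression reduces to Tanimoto, and with alpha = beta = 0.5 it reduces to Dice; unequal coefficients make the measure asymmetric, which is why the order of query and target matters.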
    

HasCutoff

bool HasCutoff() const

Returns whether the cutoff value of the OEFPDatabase object has been set by the OEFPDatabase.SetCutoff method.

NumFingerPrints

unsigned int NumFingerPrints() const

Returns the number of OEFingerPrint objects stored in the database.

SetCutoff

void SetCutoff(float)

Sets the cutoff value of the OEFPDatabase object. The cutoff value influences the behavior of both the OEFPDatabase.GetScores and the OEFPDatabase.GetSortedScores methods.
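
Judging from the examples on this page, the cutoff keeps scores equal to or larger than the cutoff when the order is descending, and scores equal to or smaller than the cutoff when the order is ascending. A plain-Python sketch of that reading follows; apply_cutoff is a hypothetical helper, the scores are made up, and using None to model "no cutoff set" is an assumption of this sketch.

```python
def apply_cutoff(scored, cutoff=None, descending=True):
    """Filter (index, score) pairs by a cutoff whose direction follows the order.

    cutoff=None models a database on which no cutoff has been set (or it was cleared).
    """
    if cutoff is None:
        return list(scored)
    if descending:
        return [(i, s) for i, s in scored if s >= cutoff]
    return [(i, s) for i, s in scored if s <= cutoff]


# Made-up (index, score) pairs in database order.
scored = [(0, 0.8), (1, 0.3), (2, 0.5)]

print(apply_cutoff(scored, 0.5, descending=True))   # keeps scores >= 0.5
print(apply_cutoff(scored, 0.5, descending=False))  # keeps scores <= 0.5
print(apply_cutoff(scored))                          # no cutoff: everything is kept
```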

SetDescendingOrder

void SetDescendingOrder(bool descending)

Sets the order in which the calculated scores are returned by the OEFPDatabase.GetSortedScores method.

SetSimFunc

Sets the method used to evaluate fingerprint similarity when calling either the OEFPDatabase.GetScores or the OEFPDatabase.GetSortedScores methods.

void SetSimFunc(unsigned int simtype, bool descending=true)

Sets the similarity calculation by specifying a similarity method with a constant from the OESimMeasure namespace. The second argument defines the order in which the calculated scores are returned by the OEFPDatabase.GetSortedScores method.

void SetSimFunc(const OEGraphSim::OESimFuncBase &, bool descending=true)

Creates a copy of the OESimFuncBase object and uses its OESimFuncBase.operator() method to evaluate similarity between two OEFingerPrint objects. The second argument defines the order in which the calculated scores are returned by the OEFPDatabase.GetSortedScores method.

Note

By default, both the OEFPDatabase.GetScores and the OEFPDatabase.GetSortedScores methods calculate Tanimoto similarity scores. The OEFPDatabase.GetScores method always returns these scores in the order in which the corresponding OEFingerPrint objects were added to the database, whereas the OEFPDatabase.GetSortedScores method returns them in descending order by default.