OEFastFPDatabase

class OEFastFPDatabase

The OEFastFPDatabase class is designed to perform rapid CUDA-accelerated, in-memory or memory-mapped fingerprint searches using the popcount method. Each OEFastFPDatabase object is associated with a fingerprint type (OEFPTypeBase) that is set when the database is initialized from a pre-generated binary fingerprint file.

Note

GraphSim TK currently supports the popcount search method only for fingerprints whose size is a multiple of 256 bits. This means that the OEFastFPDatabase class currently does not support:

For the fingerprint types listed above, the original OEFPDatabase class can be utilized.
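
To illustrate what a popcount-based similarity looks like, the following is a minimal sketch in standard C++. It assumes a 256-bit fingerprint represented as a std::bitset, which is only a stand-in for illustration; Fingerprint256, MakeFingerprint and TanimotoPopcount are hypothetical names and not part of GraphSim TK.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

// Stand-in for a 256-bit fingerprint (hypothetical type, not a GraphSim TK class).
using Fingerprint256 = std::bitset<256>;

// Builds a fingerprint with the given bit positions switched on.
inline Fingerprint256 MakeFingerprint(std::initializer_list<std::size_t> onbits)
{
  Fingerprint256 fp;
  for (std::size_t bit : onbits)
    fp.set(bit);  // throws std::out_of_range if bit >= 256
  return fp;
}

// Tanimoto similarity from bit counts only: |A and B| / |A or B|.
// Two empty fingerprints are given similarity 0 here (an assumption).
inline double TanimotoPopcount(const Fingerprint256 &a, const Fingerprint256 &b)
{
  const std::size_t both = (a & b).count();    // bits set in both fingerprints
  const std::size_t either = (a | b).count();  // bits set in at least one
  return either == 0 ? 0.0
                     : static_cast<double>(both) / static_cast<double>(either);
}
```

For example, fingerprints with on-bits {0, 1, 2, 3} and {2, 3, 4, 5} share two bits out of six set in total, so the sketch yields a similarity of one third.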

Note

OEFastFPDatabase gives the same results as OEFPDatabase, apart from precision: OEFPDatabase calculates similarity scores in single precision (float), while OEFastFPDatabase uses double precision. As a result, small differences in similarity scores can be observed.
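
The size of such differences can be gauged with a small sketch that evaluates the same bit-count ratio in both precisions. RatioSingle and RatioDouble are hypothetical helper names used only for this illustration.

```cpp
// The same ratio of bit counts, evaluated in single precision.
inline float RatioSingle(unsigned both, unsigned either)
{
  return static_cast<float>(both) / static_cast<float>(either);
}

// The same ratio of bit counts, evaluated in double precision.
inline double RatioDouble(unsigned both, unsigned either)
{
  return static_cast<double>(both) / static_cast<double>(either);
}
```

Ratios that are exactly representable, such as 1/2, agree in both precisions, while a ratio such as 1/3 differs in roughly the eighth decimal place. Comparisons between scores from the two classes are therefore better made with a small tolerance than with exact equality.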

Figure (searchfastfp.png): Schematic representation of the fast fingerprint search process

Constructors

OEFastFPDatabase(const std::string &dbfile,
                 unsigned int memtype=OEFastFPDatabaseMemoryType::Default)

Constructs an OEFastFPDatabase object.

dbfile
The name of the file which contains the fingerprint data. The file has to be generated with the OECreateFastFPDatabaseFile function.
memtype
Defines whether the fingerprints are pre-loaded into GPU-memory, CPU-memory or memory-mapped during the search process. This value has to be from the OEFastFPDatabaseMemoryType namespace.

Note

If the OEFastFPDatabase object cannot be initialized with the OEFastFPDatabaseMemoryType::CUDA option, the following warning message is issued:

Warning: OEFastFPDatabase::OEFastFPDatabase() : no CUDA-enabled device available falling back to memory-mapped type!

As a rule of thumb, 1 million fingerprints require 0.6 GB of GPU memory. GPU memory can be queried with the nvidia-smi command from a terminal.
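
The rule of thumb above can be turned into a quick estimate before choosing the CUDA memory type. This is a minimal sketch; EstimateGpuMemoryGB and LikelyFitsOnGpu are hypothetical helpers, and the free-memory figure is assumed to come from elsewhere (for example, read off nvidia-smi by hand).

```cpp
#include <cstddef>

// Rule of thumb quoted above: about 0.6 GB of GPU memory per million fingerprints.
constexpr double kGigabytesPerMillionFingerprints = 0.6;

// Rough GPU memory estimate, in GB, for a database of the given size.
inline double EstimateGpuMemoryGB(std::size_t numFingerprints)
{
  return kGigabytesPerMillionFingerprints *
         static_cast<double>(numFingerprints) / 1.0e6;
}

// True if the estimate does not exceed the free GPU memory supplied by the caller.
inline bool LikelyFitsOnGpu(std::size_t numFingerprints, double freeGpuMemoryGB)
{
  return EstimateGpuMemoryGB(numFingerprints) <= freeGpuMemoryGB;
}
```

Under this estimate, 10 million fingerprints need about 6 GB, so they would likely fit on a device with 8 GB free, whereas 20 million would not.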

GetAllScores

OESystem::OEIterBase<float> *GetAllScores(const OEFPDatabaseOptions &opts) const

Performs \(N \times N\) similarity calculations between all pairs of fingerprints stored in the OEFastFPDatabase object. It returns an iterator over the calculated similarity scores. The scores are not sorted, but returned as a flattened square matrix.

opts
The OEFPDatabaseOptions object controls all the parameters that determine the search i.e. similarity measure parameters. Cutoff and order parameters are ignored as the results are not filtered or sorted.

Note

This operation scales with \(N^2\) in memory and can easily exhaust system memory for larger databases. When called with an integer index, OEFastFPDatabase::GetRawScores returns a single row of the same matrix returned by this method and might be more efficient, depending on the application.

Hint

For all similarity measures other than Tversky, the returned matrix will be symmetrical.
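
A flattened square matrix can be addressed as sketched below, once the scores have been copied from the iterator into a vector. Row-major order is an assumption here (it is consistent with GetRawScores returning a single row), and ScoreAt, IsSymmetric and AllScoresBytes are hypothetical helpers rather than toolkit functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element (i, j) of an N x N score matrix stored as one flat sequence.
// Assumes row-major order: row i occupies positions [i*N, (i+1)*N).
inline float ScoreAt(const std::vector<float> &flat, std::size_t n,
                     std::size_t i, std::size_t j)
{
  assert(flat.size() == n * n && i < n && j < n);
  return flat[i * n + j];
}

// Checks that score(i, j) equals score(j, i) for every pair.
inline bool IsSymmetric(const std::vector<float> &flat, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (ScoreAt(flat, n, i, j) != ScoreAt(flat, n, j, i))
        return false;
  return true;
}

// Bytes needed to hold all N*N single-precision scores at once.
inline std::uint64_t AllScoresBytes(std::uint64_t n)
{
  return n * n * sizeof(float);
}
```

With 4-byte floats, a database of one million fingerprints would need about 4 TB for the full matrix, which is why a row-by-row approach is often preferable.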

GetFPTypeBase

const OEFPTypeBase *GetFPTypeBase() const

Returns the fingerprint type of the OEFastFPDatabase object. An OEFastFPDatabase object can only store fingerprints with identical type.

GetHistogram

Attention

This is a preliminary API until 2018.Oct and may be improved based on user feedback. It is currently available in C++ and Python.

OEFPHistogram *GetHistogram(const OEFPDatabaseOptions &opts,
                            size_t nrbins=200) const

Performs similarity calculations between all pairs of fingerprints stored in the OEFastFPDatabase object. It returns the histogram over the calculated similarity scores in an OEFPHistogram object.

opts
The OEFPDatabaseOptions object controls all the parameters that determine the search i.e. similarity measure, \(\alpha\) and \(\beta\) parameters for Tversky similarity. Cutoff and order parameters are ignored as the results are not filtered or sorted.
nrbins
Number of bins in the returned OEFPHistogram object.

Note

For all similarity measures other than Tversky, the histogram will only contain upper-triangular similarity scores (excluding the diagonal). In case of the asymmetric Tversky similarity measure, the histogram for the whole \(N \times N\) matrix is returned.

Note

When the OEFastFPDatabase object is initialized with OEFastFPDatabaseMemoryType::CUDA, nrbins is limited to at most 1024.

Hint

This method calculates similarities identically to OEFastFPDatabase::GetAllScores but is not bound by system memory. It can be used to quickly obtain statistics on larger databases.
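
The idea of accumulating a histogram without storing the full matrix can be sketched as follows for a symmetric measure. This is an illustration only: equal-width bins over [0, 1] are an assumption about the binning, and UpperTriangleHistogram with its score callback is a hypothetical helper, not the toolkit's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the scores strictly above the diagonal into nrbins equal-width bins
// over [0, 1]. score(i, j) is any callable returning the similarity of
// fingerprints i and j; scores are consumed one at a time, never stored.
template <typename ScoreFn>
std::vector<std::size_t> UpperTriangleHistogram(std::size_t n, std::size_t nrbins,
                                                ScoreFn score)
{
  std::vector<std::size_t> bins(nrbins, 0);
  if (nrbins == 0)
    return bins;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      // Clamp to [0, 1] so the bin index is always valid.
      const double s = std::min(1.0, std::max(0.0, static_cast<double>(score(i, j))));
      std::size_t b = static_cast<std::size_t>(s * static_cast<double>(nrbins));
      b = std::min(b, nrbins - 1);  // a score of exactly 1.0 goes into the last bin
      ++bins[b];
    }
  }
  return bins;
}
```

Memory use in this sketch grows with nrbins rather than with the square of the database size, and the bin counts add up to N(N-1)/2 pairs.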

GetMemoryType

unsigned int GetMemoryType() const

Returns the memory type of the fingerprint database. The return value is taken from the OEFastFPDatabaseMemoryType namespace.

GetMemoryTypeString

std::string GetMemoryTypeString() const

Returns the string representation of the memory type of the fingerprint database.

GetRawScores

OESystem::OEIterBase<double> *GetRawScores(const size_t fpidx,
                                           const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<double> *GetRawScores(const OEFingerPrint &fp,
                                           const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<double> *GetRawScores(const OEChem::OEMolBase &mol,
                                           const OEFPDatabaseOptions &opts) const

Performs similarity calculations between a molecule or a fingerprint and the fingerprints stored in the OEFastFPDatabase object. It returns an iterator over the calculated similarity scores. The scores are not sorted, but returned in the same order as the database. The number of elements in the returned iterator is equal to the number of fingerprints in the database.

fpidx
If the method is called with an integer index, the query fingerprint is taken from the OEFastFPDatabase object with the given index.
mol
If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.
fp
If the method is called with an OEFingerPrint object, then its type has to match with the type of the OEFastFPDatabase.
opts
The OEFPDatabaseOptions object controls all the parameters that determine the search i.e. similarity measure parameters. Cutoff and order parameters are ignored as the results are not filtered or sorted.

GetScores

OESystem::OEIterBase<OESimScore> *GetScores(const size_t idx,
                                            const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<OESimScore> *GetScores(const OEFingerPrint &fp,
                                            const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<OESimScore> *GetScores(const OEChem::OEMolBase &mol,
                                            const OEFPDatabaseOptions &opts) const

Performs similarity calculations between a molecule or fingerprint and the fingerprints stored in the OEFastFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore). The results are filtered according to the cutoff and order specified in opts, but not sorted.

idx
If the method is called with an integer index, the query fingerprint is taken from the OEFastFPDatabase object with the given index.
mol
If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.
fp
If the method is called with an OEFingerPrint object, then its type has to match with the type of the OEFastFPDatabase.
opts
The OEFPDatabaseOptions object controls all the parameters that determine the search i.e. similarity measure parameters, score cutoff and order.

Hint

If the order is descending (the default), scores greater than the cutoff are returned; if the order is ascending, scores less than the cutoff are returned.
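
The filtering rule in the hint above can be sketched in standard C++ as follows. Hit is a hypothetical stand-in for OESimScore (a score plus a database index), SearchOrder is a hypothetical enum, and the strict comparisons follow the wording of the hint rather than a verified detail of the toolkit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for OESimScore: a database index and its score.
struct Hit {
  std::size_t idx;
  float score;
};

enum class SearchOrder { Descending, Ascending };

// Keeps scores above the cutoff for descending order and below it for
// ascending order. Hits stay in database order; nothing is sorted.
inline std::vector<Hit> FilterByCutoff(const std::vector<float> &raw, float cutoff,
                                       SearchOrder order)
{
  std::vector<Hit> hits;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    const bool keep = (order == SearchOrder::Descending) ? raw[i] > cutoff
                                                         : raw[i] < cutoff;
    if (keep)
      hits.push_back({i, raw[i]});
  }
  return hits;
}
```

Because each hit carries its index, the caller can still map a filtered score back to the corresponding database entry.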

GetSortedScores

OESystem::OEIterBase<OESimScore> *GetSortedScores(const size_t idx,
                                                  const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<OESimScore> *GetSortedScores(const OEFingerPrint &fp,
                                                  const OEFPDatabaseOptions &opts) const
OESystem::OEIterBase<OESimScore> *GetSortedScores(const OEChem::OEMolBase &mol,
                                                  const OEFPDatabaseOptions &opts) const

Performs similarity calculations between a molecule or fingerprint and the fingerprints stored in the OEFastFPDatabase object. It returns an iterator over the calculated similarity scores (OESimScore) in sorted order. Each OESimScore holds a similarity score and the index of the corresponding fingerprint in the database.

idx
If the method is called with an integer index, the query fingerprint is taken from the OEFastFPDatabase object with the given index.
mol
If the method is called with an OEMolBase object, then a fingerprint is generated from this molecule before looping over the fingerprints of the database and calculating similarities.
fp
If the method is called with an OEFingerPrint object, then its type has to match with the type of the OEFastFPDatabase.
opts
The OEFPDatabaseOptions object controls all the parameters that determine the search i.e. similarity measure parameters, score cutoff, order and limit.
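
How order and limit interact in a sorted search can be sketched as below for the descending case. RankedHit is a hypothetical stand-in for OESimScore, TopHits is a hypothetical helper, and breaking ties by database index is an assumption made so that the output is deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for OESimScore: a database index and its score.
struct RankedHit {
  std::size_t idx;
  float score;
};

// Returns at most `limit` hits with the highest scores, best first.
// Ties are broken by the lower database index (an assumption).
inline std::vector<RankedHit> TopHits(const std::vector<float> &raw, std::size_t limit)
{
  std::vector<RankedHit> hits;
  hits.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i)
    hits.push_back({i, raw[i]});

  const std::size_t keep = std::min(limit, hits.size());
  // Only the first `keep` positions need to be in sorted order.
  std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(keep),
                    hits.end(), [](const RankedHit &a, const RankedHit &b) {
                      if (a.score != b.score)
                        return a.score > b.score;
                      return a.idx < b.idx;
                    });
  hits.resize(keep);
  return hits;
}
```

Using a partial sort avoids ordering the whole database when only a handful of top hits is requested.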

GetVariogram

Attention

This is a preliminary API until 2018.Oct and may be improved based on user feedback. It is currently available in C++ and Python.

OEGraphSim::OEFPVariogram *GetVariogram(const std::vector<float>& obsdata,
                                        const OEFPDatabaseOptions &opts,
                                        const size_t nrbins=200) const

Performs similarity calculations between all pairs of fingerprints stored in the OEFastFPDatabase object. It returns the empirical variogram over the calculated similarity scores with respect to the measurements provided in the obsdata parameter, in an OEFPVariogram object.

obsdata
User-provided empirical measurements for each fingerprint in the database.
opts
The OEFPDatabaseOptions object controls all the parameters that determine the scoring i.e. similarity measure parameters. Cutoff and order parameters are ignored as the results are not filtered or sorted.
nrbins
Number of bins in the returned OEFPVariogram object.

Note

The empirical variogram is defined over distances rather than similarities. It is therefore not possible to calculate a variogram using Tversky similarity. For all other similarity measures, the empirical variogram is calculated using \(\text{distance} = 1 - \text{similarity}\).

Hint

This method calculates similarities identically to OEFastFPDatabase::GetAllScores but is not bound by system memory. It can be used to quickly obtain statistics on larger databases.

Note

When the OEFastFPDatabase object is initialized with OEFastFPDatabaseMemoryType::CUDA, nrbins is limited to at most 1024.

Hint

In the context of a variogram, the measurement can be any user-provided floating-point value per fingerprint.

Hint

The returned OEFPVariogram object also contains a histogram, but note that this histogram is over distances rather than similarities.
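
For readers unfamiliar with empirical variograms, the following sketch shows one conventional way to compute one over pairwise distances. It assumes the classical estimator (half the mean squared difference of the measurements within each distance bin) and equal-width bins over [0, 1]; whether OEFPVariogram uses exactly this definition is not stated here. VariogramBin and EmpiricalVariogram are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One distance bin: number of pairs and the semivariance estimate.
struct VariogramBin {
  std::size_t count = 0;
  double gamma = 0.0;
};

// similarity(i, j) is any callable returning a symmetric similarity in [0, 1].
// Each pair i < j is binned by distance = 1 - similarity.
template <typename SimFn>
std::vector<VariogramBin> EmpiricalVariogram(const std::vector<float> &obsdata,
                                             std::size_t nrbins, SimFn similarity)
{
  std::vector<VariogramBin> bins(nrbins);
  if (nrbins == 0)
    return bins;
  const std::size_t n = obsdata.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      double dist = 1.0 - static_cast<double>(similarity(i, j));
      dist = std::min(1.0, std::max(0.0, dist));  // keep the bin index valid
      std::size_t b = static_cast<std::size_t>(dist * static_cast<double>(nrbins));
      b = std::min(b, nrbins - 1);
      const double diff = static_cast<double>(obsdata[i]) - static_cast<double>(obsdata[j]);
      bins[b].gamma += diff * diff;  // accumulate squared differences
      ++bins[b].count;
    }
  }
  for (VariogramBin &bin : bins)
    if (bin.count > 0)
      bin.gamma /= 2.0 * static_cast<double>(bin.count);  // half the mean
  return bins;
}
```

The per-bin pair counts in this sketch form a histogram over distances, which mirrors the hint that the returned object also carries such a histogram.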

IsValid

bool IsValid() const

Returns whether the database was initialized correctly.

NumFingerPrints

size_t NumFingerPrints() const

Returns the number of OEFingerPrint objects stored in the database.