
# OEShapeDatabase

class OEShapeDatabase


This is the primary class for performing FastROCS TK calculations. It is a very heavyweight object that:

• consumes many gigabytes of memory
• manages all GPU interaction

The goal is to abstract away as much of this complexity as possible behind a single API, so that the underlying compute engine can be improved over time.

Ideally, this class is initialized once per dataset. A fair amount of pre-calculation is done on each molecule and conformer as it is loaded into memory. Some of this cost can be alleviated by pre-calculating and caching, but not all of it, as the balance between caching and recalculation is always being tuned.

## Constructors

OEShapeDatabase(const OEShape::OEColorForceField &cff)
OEShapeDatabase(unsigned int dbtype=OEShapeDatabaseType::Default,
unsigned int cfftype=OEShape::OEColorFFType::OEDefault)


Create a new OEShapeDatabase for managing conformers and performing FastROCS TK calculations.

Whether the OEShapeDatabase can perform color calculations must be chosen at construction. If “shape only” is chosen by passing OEShapeDatabaseType.Shape, then there can be significant memory and load-time performance improvements. Color atom assignment can be a significant cost during load and increase memory usage by roughly 2x.

A custom OEColorForceField can also be passed to the constructor to allow color scoring to be completely customized. Note that when a custom color force field is used, OEShapeDatabaseType.Default is assumed. It is currently not possible to perform “color only” scoring.

## AddMol

unsigned int AddMol(const OEChem::OEMCMolBase &mol)


Add a new collection of conformers to this database and return the index used to identify this molecule. This index starts at 0 and increases by 1 for every multi-conformer molecule added. OEShapeDatabaseScore.GetMolIdx returns this index so that FastROCS scores can be mapped back to molecules added through this mechanism.

Note

Even though this method is not const, it has been made thread-safe so that it can be called from multiple threads. Furthermore, it has been optimized as much as possible to parallelize the pre-calculation this method performs. This makes it very efficient to use multiple threads to load a database file into memory.
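
Because AddMol may be called from several threads, the order in which calls complete is not fixed; the returned index is the reliable key. The following is a minimal sketch of that bookkeeping. `FakeShapeDatabase` and `load_titles` are hypothetical names, and the fake class mimics only the index contract described above so that the snippet runs without the toolkit or a GPU.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeShapeDatabase:
    """Hypothetical stand-in that mimics only the documented AddMol contract."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_idx = itertools.count()

    def AddMol(self, mol):
        # Indices start at 0 and increase by 1 for every molecule added.
        with self._lock:
            return next(self._next_idx)


def load_titles(db, titled_mols, num_threads=4):
    """Add molecules from several threads and return {AddMol index: title}."""

    def add(item):
        title, mol = item
        # Key on the index AddMol returns, not on the submission order.
        return db.AddMol(mol), title

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(add, titled_mols))


# Placeholder objects stand in for multi-conformer molecules.
mols = [(f"mol-{i}", object()) for i in range(100)]
idx_to_title = load_titles(FakeShapeDatabase(), mols)
```

With a real database, the same dictionary could be used to look up the title for the index reported by OEShapeDatabaseScore.GetMolIdx.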

## GetColorForceFieldType

unsigned int GetColorForceFieldType() const


Return a constant from the OEColorFFType namespace to indicate the color force field used to construct this database. Returns OEColorFFType.Custom if a custom OEColorForceField object was passed to the constructor.

## GetColorGridSpacing

float GetColorGridSpacing() const


Return the grid spacing used to calculate color scores. This defaults to 0.5 for good performance. Lower values will yield answers that agree more closely with the Exact analytical calculation, at the expense of performance. Higher values can yield better performance. The default was chosen as a good balance of virtual screening statistical analysis (AUCs) versus raw throughput performance.
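
To give a feel for why smaller spacings cost performance, the sketch below counts the points in a regular cubic grid at several spacings. This is plain geometry, not a description of how the toolkit lays out its grids; the box edge length and the `grid_points` helper are assumptions made for illustration.

```python
def grid_points(extent, spacing):
    """Number of points in a regular cubic grid spanning `extent` per axis."""
    per_axis = int(round(extent / spacing)) + 1
    return per_axis ** 3


box = 20.0  # hypothetical box edge length, in the same units as the spacing

# Halving the spacing multiplies the number of grid points by roughly eight.
counts = {spacing: grid_points(box, spacing) for spacing in (1.0, 0.5, 0.25)}
for spacing, n in counts.items():
    print(f"spacing {spacing}: {n} points")
```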

## GetDatabaseType

unsigned int GetDatabaseType() const


Returns a constant from the OEShapeDatabaseType namespace indicating what type of calculations this database can perform.

## GetMaxNumDevices

unsigned int GetMaxNumDevices() const


Returns the maximum number of GPU devices this database will use for calculations. The only way to restrict the GPUs seen by the database is to use the CUDA_VISIBLE_DEVICES environment variable before starting the process.
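
Since the variable has to be set before the process starts, one option is to launch the calculation as a child process with a modified environment. The sketch below shows the pattern; `run_with_gpus` is a hypothetical helper, and the child here only echoes the variable where a real screening script would go.

```python
import os
import subprocess
import sys


def run_with_gpus(argv, gpu_ids):
    """Run argv in a child process that can only see the given GPU ids."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Placeholder child: prints the variable it inherited from the parent.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_gpus(child, [0, 2])
visible = result.stdout.strip()
```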

## GetMaxOptIterations

unsigned int GetMaxOptIterations() const


Return the number of optimizer iterations the FastROCS algorithm uses when optimizing the alignment of the database conformer and the query conformer. This currently defaults to 10, a value chosen by analysis to produce good virtual screening statistics (AUCs) without spending performance on excessive iterations.

## GetNumDevices

unsigned int GetNumDevices() const


Returns the number of GPU devices this database will use for calculation. This will default to all the GPUs that are visible, i.e., the value returned from OEShapeDatabase.GetMaxNumDevices.

## GetNumOpenThreads

unsigned int GetNumOpenThreads() const


Return how many CPU threads will be used to read an OEMolDatabase from disk into memory using the OEShapeDatabase.Open method. The default, a value of 0, is to use as many CPUs as OEGetNumProcessors finds on the system.

## GetScores

OESystem::OEIterBase<OEShapeDatabaseScore> *
GetScores(const OEChem::OEMolBase &query,
const OEShapeDatabaseOptions &options=OEShapeDatabaseOptions()) const
OESystem::OEIterBase<OEShapeDatabaseScore> *
GetScores(const OEShape::OEShapeQueryPublic &shapeQry,
const OEShapeDatabaseOptions &options=OEShapeDatabaseOptions()) const


Return all scores of the query against the entire database, subject to the options specified in the OEShapeDatabaseOptions passed to this method. This is useful for larger-scale N×N clustering-style calculations where all pairs of scores need to be processed.

The query can be either a single-conformer OEMolBase or an OEShapeQueryPublic object read from a .sq file.

Warning

The order of the OEShapeDatabaseScore objects returned by the iterator is non-deterministic and will vary between executions due to the multi-threaded nature of this method. However, the values calculated in each OEShapeDatabaseScore will be the same. Therefore, users should rely on the return values of OEShapeDatabaseScore.GetMolIdx and OEShapeDatabaseScore.GetConfIdx for further processing, not on the position within the iterator.

The OEShapeDatabaseOptions class controls many of the parameters of this method, for example how many conformers per molecule to return.
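
One way to make downstream processing independent of iteration order is to key each result on its molecule and conformer index. The sketch below assumes score objects that expose GetMolIdx and GetConfIdx as described in the warning; `FakeScore` is a hypothetical stand-in, and its `value` field is made up so that the snippet runs without the toolkit.

```python
import random
from collections import namedtuple


class FakeScore(namedtuple("FakeScore", "mol conf value")):
    """Hypothetical stand-in exposing only the two index accessors."""

    __slots__ = ()

    def GetMolIdx(self):
        return self.mol

    def GetConfIdx(self):
        return self.conf


def index_scores(scores):
    """Key every score by (molecule index, conformer index)."""
    return {(s.GetMolIdx(), s.GetConfIdx()): s for s in scores}


records = [FakeScore(m, c, round(1.0 / (1 + m + c), 3))
           for m in range(3) for c in range(2)]

# Two "executions" that deliver the same scores in different orders.
run_a = index_scores(random.sample(records, len(records)))
run_b = index_scores(random.sample(records, len(records)))
```

Both dictionaries compare equal regardless of the order in which the scores arrived.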

## GetShapeGridSpacing

float GetShapeGridSpacing() const


Return the grid spacing used to calculate shape scores and drive the alignment. This defaults to 0.5 for good performance. Lower values will yield answers that agree more closely with the Exact analytical calculation, at the expense of performance. Higher values can yield better performance. The default was chosen as a good balance of virtual screening statistical analysis (AUCs) versus raw throughput performance.

## GetSortedScores

OESystem::OEIterBase<OEShapeDatabaseScore> *
GetSortedScores(const OEChem::OEMolBase &query, unsigned int limit=0) const
OESystem::OEIterBase<OEShapeDatabaseScore> *
GetSortedScores(const OEChem::OEMolBase &query,
const OEShapeDatabaseOptions &options) const
OESystem::OEIterBase<OEShapeDatabaseScore> *
GetSortedScores(const OEShape::OEShapeQueryPublic &shapeQry,
const OEShapeDatabaseOptions &options=OEShapeDatabaseOptions()) const


Return a hitlist of the query against the database based upon the scoring options of the database and the OEShapeDatabaseOptions passed to this method. The OEShapeDatabaseScore objects are returned in descending order, i.e., the better ‘hits’ come first in the iterator.

The query can be either a single-conformer OEMolBase or an OEShapeQueryPublic object read from a .sq file.

Note

This method is typically used to select only a subset of the results based upon the limit argument or OEShapeDatabaseOptions.SetLimit. It is optimized for rapidly constructing relatively small hitlists. If the entire set of scores for the database is desired, it can be faster to use OEShapeDatabase.GetScores to avoid the sorting operation.

The OEShapeDatabaseOptions class controls many of the parameters of this method, for example how many conformers per molecule to return.
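
The general trade-off between a short hitlist and a full sort can be illustrated with the standard library. This sketch is not the toolkit's implementation; the `(score, molecule index)` pairs and both helper names are made up for the example.

```python
import heapq


def top_hits(scores, limit):
    """Return the `limit` best (score, mol_idx) pairs, best first."""
    # Selecting a small top-N avoids ordering the whole collection.
    return heapq.nlargest(limit, scores)


def all_hits_sorted(scores):
    """Return every pair in descending order; this sorts the whole collection."""
    return sorted(scores, reverse=True)


scores = [(0.42, 0), (1.37, 1), (0.95, 2), (1.80, 3), (0.10, 4)]
best_two = top_hits(scores, 2)
everything = all_hits_sorted(scores)
```

Both routes agree on the leading entries; the difference lies only in how much ordering work is done to produce them.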

## NumConfs

unsigned int NumConfs() const


Return the number of conformers the database is currently managing. This is useful for getting a ballpark idea of the underlying memory usage.

Note

This value has no relation to the indexes returned by OEShapeDatabase.AddMol, except that this value will always be larger than the last index returned.

## Open

bool Open(const OEChem::OEMolDatabase &moldb,
const unsigned int orient=OEFastROCSOrientation::Default)


Initialize the database using an OEMolDatabase. This is the most efficient way to initialize an OEShapeDatabase, as this method launches a thread for each available CPU core and parallelizes all the parsing and pre-calculation. The progress of the loading operation can be tracked through a thread-safe OEThreadedDots object.

To use the alternative start method OEFastROCSOrientation.AsIs, pass this constant as the final argument to Open so that the database is loaded without adjusting conformer coordinates. The final argument can be omitted in all other use cases.

Warning

Databases must be re-opened when using OEFastROCSOrientation.AsIs.

This method blocks until loading is complete and returns true if the database has been successfully loaded into memory.

Note

The indices returned by OEShapeDatabaseScore.GetMolIdx are guaranteed to map directly into the OEMolDatabase index space. OEMolDatabase.GetMolecule can fail and return no molecule for good reason, e.g., an empty molecule from an SD file. Therefore, the index space used by OEShapeDatabase can have “holes” when initialized from an OEMolDatabase.
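
A practical consequence of these holes is that per-molecule bookkeeping is safer when keyed by database index than when packed into a dense list. The sketch below uses made-up record titles, with `None` marking a record that yielded no molecule; `build_lookup` is a hypothetical helper.

```python
def build_lookup(records):
    """Map database index to record, leaving holes where reading failed."""
    return {idx: rec for idx, rec in enumerate(records) if rec is not None}


# Hypothetical file contents: the record at index 2 was empty.
records = ["aspirin", "caffeine", None, "ibuprofen"]

lookup = build_lookup(records)

# A densely packed list drifts out of step after the first hole.
dense = [rec for rec in records if rec is not None]

hit_index = 3  # an index as it would be reported for the last record
title = lookup[hit_index]
```

Here `lookup[3]` finds the intended record, whereas the dense list has only three entries and already holds that record at position 2.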

## PrintMemoryUsage

void PrintMemoryUsage(OEPlatform::oeostream &os) const
void PrintMemoryUsage() const


Print out memory usage statistics for this object. This will break down how much memory is being used to pre-cache various parts of the calculation. By default, the output will be written to OEPlatform.oeerr. The output stream can also be passed as an argument. The diagnostic output is meant for human consumption and may change format in future releases.

## SetColorGridSpacing

bool SetColorGridSpacing(float gridSpacing)


Set the grid spacing to use for static color scoring.

## SetMaxOptIterations

void SetMaxOptIterations(unsigned int maxIter)


Set the number of optimizer iterations to use when optimizing the alignment by shape.

## SetNumDevices

void SetNumDevices(unsigned int ndevices)


Set the number of GPU devices this calculation should use. This number should be between 1 and OEShapeDatabase.GetMaxNumDevices, inclusive. This method is mainly useful for efficiently collecting FastROCS scalability data across multiple GPUs. To restrict OEShapeDatabase to a subset of the GPUs on the machine, set the CUDA_VISIBLE_DEVICES environment variable before the process is launched instead.

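## SetNumOpenThreads

void SetNumOpenThreads(unsigned int numThrds)


Set how many CPU threads will be used to read an OEMolDatabase from disk into memory using the OEShapeDatabase.Open method. A value of 0, the default, uses as many CPUs as OEGetNumProcessors finds on the system.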

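## SetShapeGridSpacing

bool SetShapeGridSpacing(float gridSpacing)


Set the grid spacing to use for calculating shape scores and driving the alignment.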