Tutorial 2: How to prepare a database for faster load speeds
In this tutorial, you will learn how to pre-process a conformer database
file for FastROCS TK, allowing for faster database load times with
OEShapeDatabase::Open. Load times could be up to 10x faster. See the
figure below for an eMolecules dataset of 14 million conformers.
To gain this extra loading performance, you need to use the following functions:
- OEPreserveRotCompress: this function works on the input molecule stream to ensure that rotor-offset compression is preserved during the preparation process. Rotor-offset compression is a way of storing conformers as a set of torsions instead of storing the coordinates for every single conformer of a molecule. This optimization reduces the memory footprint of a multi-conformer molecule.
- OEPRECompress: this function works on the output molecule stream object, allowing the molecules to be stored in a 'pre-compressed' format:
  - Writes rotor-offset-compressed molecules in the perfect-rotor-encoding format.
  - There is no need to gzip the output, which means a faster OEMolDatabase::Open.
- OEPrepareFastROCSMol: this function works on each OEMol record of the input.oeb:
  - Sets the energy of each conformer to 0.0 to avoid writing it to OEB.
  - Suppresses hydrogens and reorders reference conformers for compression.
  - Pre-calculates color atoms.
  - Pre-calculates self-color and self-shape terms for all conformers.
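To make the storage argument behind rotor-offset compression more concrete, the sketch below counts how many numbers must be stored for a multi-conformer molecule under two schemes: full Cartesian coordinates for every conformer, and a simplified torsion-based scheme. This is only back-of-the-envelope accounting. Both helper functions are hypothetical and are not part of any OpenEye toolkit, and the actual perfect-rotor-encoding format may store additional data that this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenEye function): number of stored values
// when every conformer keeps its own full set of Cartesian coordinates.
std::size_t FullCoordinateValues(std::size_t atoms, std::size_t conformers)
{
  return conformers * atoms * 3; // x, y, z per atom, per conformer
}

// Hypothetical helper (not an OpenEye function): number of stored values
// under a simplified torsion-based scheme. ASSUMPTION: `references` full
// coordinate sets are kept, plus one torsion value per rotor for every
// conformer. The real on-disk format may differ.
std::size_t TorsionEncodedValues(std::size_t atoms, std::size_t conformers,
                                 std::size_t rotors, std::size_t references = 1)
{
  assert(references >= 1 && references <= conformers);
  return references * atoms * 3 + conformers * rotors;
}
```

Under these assumptions, a molecule with 30 atoms, 200 conformers, and 6 rotors needs `FullCoordinateValues(30, 200)` = 18000 values as full coordinates, but only `TorsionEncodedValues(30, 200, 6)` = 1290 values in the torsion-based scheme, roughly a 14-fold reduction.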
Note
The color terms cached by OEPrepareFastROCSMol are from the
OEColorFFType::ImplicitMillsDean color force field. A different color
force field can be given as the second argument to override
ImplicitMillsDean.
In general, calling OEPrepareFastROCSMol and OEPRECompress will result
in a smaller OEB file than the default OEB.GZ output from OMEGA.
Further reduction in file size can be achieved by using an
OEMCMolType::HalfFloatCartesian molecule to store reference coordinates
and torsions as 16-bit floating point.
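The trade-off of 16-bit storage is precision. The sketch below estimates the rounding that a value undergoes when stored with an 11-bit significand, as in IEEE 754 binary16. It assumes that the half-float molecule type uses this standard layout, which the text above does not confirm. `QuantizeToHalf` is a hypothetical helper for illustration, not an OpenEye function, and it only handles values in the normal binary16 range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an OpenEye function): round `x` to the nearest
// value representable with an 11-bit significand (1 implicit + 10 stored
// bits), as in IEEE 754 binary16. ASSUMPTION: only the normal range is
// handled; subnormals, infinities, and NaN are out of scope.
double QuantizeToHalf(double x)
{
  if (x == 0.0)
    return 0.0;

  // Normal binary16 magnitudes lie in [2^-14, 65504].
  assert(std::fabs(x) >= std::ldexp(1.0, -14) && std::fabs(x) <= 65504.0);

  int exponent = 0;
  (void)std::frexp(x, &exponent); // |x| = m * 2^exponent with m in [0.5, 1)

  // Spacing between neighbouring half-precision values near x.
  const double step = std::ldexp(1.0, exponent - 11);

  // std::round breaks ties away from zero; IEEE hardware conversion would
  // normally round ties to even, so exact midpoints may differ by one step.
  return std::round(x / step) * step;
}
```

For example, `QuantizeToHalf(12.345)` returns 12.34375, an error of about 0.001. For magnitudes between 8 and 16 the spacing is 2^-7, so the worst-case rounding error in that range is about 0.004 in the units of the stored value.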
Here is some example code showing how to pre-process a database with
OEPrepareFastROCSMol, save to a precompressed format, and reduce the
file size by using half precision:
For added convenience, a shapedatabaseprep.cpp example script is available for download and can be modified to meet your exact needs.