Tutorial 2: How to prepare a database for faster load speeds

In this tutorial, you will learn how to pre-process a conformer database file for FastROCS TK so that it loads faster with OEShapeDatabase::Open. Load times can be up to 10x faster. The figure below shows the effect for an eMolecules dataset of 14 million conformers.

Figure: Pre-Processing Performance Impact

To gain this extra loading performance, you need to use the following functions:

  1. OEPreserveRotCompress – this function works on the input molecule stream to ensure that rotor-offset compression is preserved during the preparation process. Rotor-offset compression stores the conformers of a molecule as a set of torsions instead of storing the coordinates of every conformer. This optimization reduces the memory footprint of a multi-conformer molecule.

  2. OEPRECompress – this function works on the output molecule stream object allowing the molecules to be stored in a ‘pre-compressed’ format:

    • Writes rotor-offset-compressed molecules in the perfect-rotor-encoding format

    • There is no need to gzip the output, which means a faster OEMolDatabase::Open.

  3. OEPrepareFastROCSMol – this function works on each OEMol record of the input.oeb:

    • Sets the energy of each conformer to 0.0 to avoid writing it to OEB.

    • Suppresses hydrogens and reorders reference conformers for compression.

    • Pre-calculates color atoms.

    • Pre-calculates self-color and self-shape terms for all conformers.

    Note

    The color terms cached by OEPrepareFastROCSMol are from the OEColorFFType::ImplicitMillsDean color force field. A different color force field can be given as the second argument to override ImplicitMillsDean.
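To make the memory argument behind rotor-offset compression (item 1 above) more concrete, here is a rough back-of-the-envelope sketch. It is not the actual OEB layout: the function names CartesianBytes and TorsionBytes are hypothetical, and the model assumes a single reference conformer plus one torsion value per rotatable bond per conformer, with every value taking the same number of bytes.

```cpp
#include <cassert>
#include <cstddef>

// Simplified storage model, NOT the actual OEB layout: every value
// (coordinate or torsion) is assumed to occupy bytesPerValue bytes.

// Bytes needed when every conformer stores full x, y, z coordinates.
std::size_t CartesianBytes(std::size_t atoms, std::size_t conformers,
                           std::size_t bytesPerValue) {
  return conformers * atoms * 3 * bytesPerValue;
}

// Bytes needed when one reference conformer stores coordinates and each
// conformer adds only one torsion value per rotatable bond (assumption).
std::size_t TorsionBytes(std::size_t atoms, std::size_t conformers,
                         std::size_t rotors, std::size_t bytesPerValue) {
  assert(atoms > 0);
  const std::size_t reference = atoms * 3 * bytesPerValue;
  const std::size_t torsions = conformers * rotors * bytesPerValue;
  return reference + torsions;
}
```

Under these assumptions, a molecule with 30 heavy atoms, 200 conformers, 6 rotatable bonds, and 4-byte values needs 72,000 bytes as full coordinates but only 5,160 bytes as a reference conformer plus torsions.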

In general, calling OEPrepareFastROCSMol and OEPRECompress will result in a smaller OEB file than the default OEB.GZ output from OMEGA.

A further reduction in file size can be achieved by using an OEMCMolType::HalfFloatCartesian molecule, which stores reference coordinates and torsions as 16-bit floating-point values.
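Storing values in 16 bits halves the space per value at the cost of some precision. The sketch below estimates that cost by rounding a 32-bit float to the nearest value with 11 significant bits. It assumes the 16-bit format behaves like IEEE 754 binary16 in its normal range, which the text above does not state explicitly. The helper name RoundToHalfPrecision is hypothetical and is not part of any OpenEye toolkit.

```cpp
#include <cassert>
#include <cmath>

// Round x to the nearest value with 11 significant bits (10 stored bits
// plus the implicit leading 1), as in IEEE 754 binary16. Assumption: x lies
// in the normal range of binary16; subnormals and overflow are not handled.
float RoundToHalfPrecision(float x) {
  assert(std::isfinite(x));
  if (x == 0.0f) return 0.0f;
  int exponent = 0;
  // x == mantissa * 2^exponent with 0.5 <= |mantissa| < 1.
  const float mantissa = std::frexp(x, &exponent);
  // Scale so that the 11 significant bits form the integer part.
  const float scaled = std::ldexp(mantissa, 11);
  // Uses the current rounding mode (round-to-nearest-even by default).
  const float rounded = std::nearbyint(scaled);
  return std::ldexp(rounded, exponent - 11);
}
```

For example, a coordinate of 12.3456 rounds to 12.34375, an error of less than 0.002. Values between 8 and 16 are spaced 2^-7 (about 0.0078) apart in this format, so the spacing grows with the magnitude of the value.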

Here is some example code showing how to pre-process a database with OEPrepareFastROCSMol, save to a precompressed format, and reduce the file size by using half precision:

For added convenience, we have created a shapedatabaseprep.cpp example script which can be modified to meet your exact needs:

Download code

simpleprepscript.cpp