Tutorial 2: How to prepare a database for faster load speeds

In this tutorial, you will learn how to pre-process a conformer database file for FastROCS TK so that it loads faster with OEShapeDatabase::Open. Load times can be up to 10x faster; the figure below shows results for an eMolecules dataset of 14 million conformers.


Figure: Pre-Processing Performance Impact

To gain this extra loading performance, use the following functions:

  1. OEPRECompress – this function operates on the output molecule stream object so that molecules are stored in a ‘pre-compressed’ format:

    • Writes rotor-offset-compressed molecules in the perfect-rotor-encoding format
    • No gzip compression is needed, which means a faster OEMolDatabase::Open.
  2. OEPrepareFastROCSMol – this function works on each OEMol record read from input.oeb:

    • Sets the energy of each conformer to 0.0 so that it is not written to the OEB file.
    • Suppresses hydrogens and reorders reference conformers for compression.
    • Pre-calculates color atoms.
    • Pre-calculates self-color and self-shape terms for all conformers.


    The color terms cached by OEPrepareFastROCSMol come from the OEColorFFType::ImplicitMillsDean color force field. To use a different color force field, pass it as the second argument to OEPrepareFastROCSMol.

In general, calling OEPrepareFastROCSMol and OEPRECompress will result in a smaller OEB file than the default OEB.GZ output from OMEGA.

A further reduction in file size can be achieved by using an OEMCMolType::HalfFloatCartesian molecule, which stores reference coordinates and torsions as 16-bit floating-point values.
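The precision cost of 16-bit storage can be estimated with simple arithmetic. The C++ sketch below simulates IEEE 754 half-precision (binary16) rounding of a single value. It is a generic illustration of the number format only, not OpenEye's HalfFloatCartesian implementation, and the helper names halfStep and roundToHalf are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent IEEE 754 binary16 values near x.
// binary16 has 10 fraction bits, so within the binade [2^e, 2^(e+1))
// the spacing is 2^(e - 10). Only normal half values are handled:
// 2^-14 <= |x| <= 65504 (no subnormals, infinities, or NaN).
double halfStep(double x)
{
    assert(std::fabs(x) >= std::ldexp(1.0, -14) && std::fabs(x) <= 65504.0);
    return std::ldexp(1.0, std::ilogb(x) - 10);
}

// Round x to the nearest binary16 value. Dividing and multiplying by a
// power of two is exact in double precision, and std::nearbyint rounds
// ties to even under the default FE_TONEAREST mode, matching binary16.
double roundToHalf(double x)
{
    const double step = halfStep(x);
    return std::nearbyint(x / step) * step;
}
```

For example, a coordinate of 12.3456 Å rounds to 12.34375 Å (step 0.0078125 Å), so coordinates between 8 and 16 Å move by at most about 0.004 Å, while each x, y, z triple shrinks from 12 bytes as 32-bit floats to 6 bytes.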

Here is some example code showing how to pre-process a database with OEPrepareFastROCSMol, save to a precompressed format, and reduce the file size by using half precision:
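A minimal C++ sketch of that workflow follows, assuming the OEChem and OEFastROCS C++ toolkits are installed and licensed. The calls OEPRECompress, OEPrepareFastROCSMol, and the OEMCMolType::HalfFloatCartesian molecule type come from the text above; the header names, namespaces, error handling, and two-argument command line are assumptions modeled on typical toolkit examples and should be checked against shapedatabaseprep.cpp.

```cpp
// Sketch: pre-process a conformer database for faster OEShapeDatabase loading.
// Assumes the OpenEye OEChem and OEFastROCS C++ toolkits (headers and licence).
#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oefastrocs.h"

using namespace OESystem;
using namespace OEChem;
using namespace OEFastROCS;

int main(int argc, char* argv[])
{
  if (argc != 3)
    OEThrow.Usage("%s <input.oeb> <prepped.oeb>", argv[0]);

  oemolistream ifs;
  if (!ifs.open(argv[1]))
    OEThrow.Fatal("Unable to open %s for reading", argv[1]);

  oemolostream ofs;
  if (!ofs.open(argv[2]))  // assumed: output written as uncompressed .oeb
    OEThrow.Fatal("Unable to open %s for writing", argv[2]);

  // Store molecules in the pre-compressed (perfect-rotor-encoding) format.
  OEPRECompress(ofs);

  // Half-precision storage of reference coordinates and torsions.
  OEMol mol(OEMCMolType::HalfFloatCartesian);
  while (OEReadMolecule(ifs, mol))
  {
    // Zero conformer energies, suppress hydrogens, and cache color atoms
    // plus self-color/self-shape terms (ImplicitMillsDean by default).
    OEPrepareFastROCSMol(mol);
    OEWriteMolecule(ofs, mol);
  }
  return 0;
}
```

Run it as, for example: prep input.oeb prepped.oeb (the executable name prep is hypothetical). The prepared file can then be opened with OEShapeDatabase::Open as before.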

For added convenience, we have created a shapedatabaseprep.cpp example program that can be modified to meet your exact needs.
