In this tutorial, you will learn how to pre-process a conformer database file for FastROCS TK so that it loads faster with OEShapeDatabase.Open. Load times can be up to 10x faster; see the figure below for an eMolecules dataset of 14 million conformers.
To gain this extra loading performance, you need to use the following functions:
OEPRECompress – this function works on the output molecule stream object, allowing the molecules to be stored in a ‘pre-compressed’ format:
- Writes rotor-offset-compressed molecules in the perfect-rotor-encoding format
- Removes the need to gzip the output file, which means a faster OEMolDatabase.Open.
- Sets the energy of each conformer to 0.0 to avoid writing it to OEB.
- Suppresses hydrogens and reorders reference conformers for compression.
- Pre-calculates color atoms.
- Pre-calculates self-color and self-shape terms for all conformers.
A further reduction in file size can be achieved by using an OEMCMolType_HalfFloatCartesian molecule, which stores reference coordinates and torsions as 16-bit floating-point values.
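To give a feel for the trade-off behind 16-bit storage, the sketch below packs a few coordinate values with Python's standard `struct` module, once as 32-bit floats and once as 16-bit floats. It does not reproduce the OEB on-disk layout; the 32-bit baseline, the sample values, and the helper names `pack_values` and `unpack_values` are assumptions made purely for illustration.

```python
import struct


def pack_values(values, code):
    """Pack floats little-endian; code is 'f' (32-bit) or 'e' (16-bit)."""
    return struct.pack(f"<{len(values)}{code}", *values)


def unpack_values(blob, code):
    """Inverse of pack_values for the same format code."""
    count = len(blob) // struct.calcsize(f"<{code}")
    return list(struct.unpack(f"<{count}{code}", blob))


# Hypothetical Cartesian coordinates, in Angstroms, for a single atom.
coords = [1.234, -5.678, 12.345]

single = pack_values(coords, "f")  # 4 bytes per value
half = pack_values(coords, "e")    # 2 bytes per value

# Round-trip through half precision to measure what is lost.
restored = unpack_values(half, "e")
max_error = max(abs(a - b) for a, b in zip(coords, restored))

print(len(single), "bytes as 32-bit floats")
print(len(half), "bytes as 16-bit floats")
print("largest round-trip error:", max_error)
```

The packed size halves, while each value is rounded to about three significant decimal digits. Whether that precision is acceptable depends on the shape comparison being run, so it is worth validating results on a small subset before converting a whole database.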
Here is some example code showing how to pre-process a database with OEPrepareFastROCSMol, save it in a pre-compressed format, and reduce the file size by using half precision:
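A minimal sketch of such a workflow might look like the following. It assumes the `openeye` Python package (with its `oechem` and `oefastrocs` modules) is installed and licensed. The function name `prepare_fastrocs_database` is hypothetical, and both the call order (enabling OEPRECompress before opening the output stream) and the two-argument `OEMol` copy constructor are assumptions that should be checked against the toolkit documentation.

```python
def prepare_fastrocs_database(in_path, out_path):
    """Copy conformers from in_path to out_path in a pre-processed form.

    Returns the number of molecules written. Hypothetical helper, not
    part of the OpenEye toolkits.
    """
    # Imported lazily so this module can be loaded without the toolkits.
    from openeye import oechem, oefastrocs

    ifs = oechem.oemolistream()
    if not ifs.open(in_path):
        raise IOError(f"Unable to open {in_path} for reading")

    ofs = oechem.oemolostream()
    # Assumption: pre-compression is enabled on the stream before it is opened.
    oechem.OEPRECompress(ofs)
    if not ofs.open(out_path):
        raise IOError(f"Unable to open {out_path} for writing")

    count = 0
    for mol in ifs.GetOEMols():
        # Assumption: copy-constructing with this type yields a molecule
        # whose coordinates and torsions are held as 16-bit floats.
        half_mol = oechem.OEMol(mol, oechem.OEMCMolType_HalfFloatCartesian)
        # Molecule-level preparation for FastROCS.
        oefastrocs.OEPrepareFastROCSMol(half_mol)
        oechem.OEWriteMolecule(ofs, half_mol)
        count += 1

    ofs.close()
    ifs.close()
    return count
```

It could be called as `prepare_fastrocs_database("conformers.oeb", "conformers_prep.oeb")`, where both file names are placeholders. The output is written as plain `.oeb` rather than `.oeb.gz`, since pre-compressed files do not need gzip.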
For added convenience, we have created a ShapeDatabasePrep.py example script which can be modified to meet your exact needs: