FastROCS Architecture

FastROCS vs ROCS

OpenEye makes available two separate shape based virtual screening technologies:

ROCS:Thousands of conformers per second
FastROCS:Millions of conformers per second

The ROCS technology is also embodied in the Shape TK. The Shape TK is a very flexible toolkit for experimenting with molecular shape and different ways that it can be applied. FastROCS is also deployed as a toolkit, but with a very different goal in mind: raw performance. In order to achieve this raw performance GPU hardware is utilized. The use of GPUs required the complete re-write of the ROCS algorithm to work well on GPUs. Furthermore, the opportunity was taken to completely rethink all trade-offs the algorithm makes in preference to speed. Therefore, it is expected that ROCS and FastROCS will output slightly different results.

In order for FastROCS to achieve a throughput of millions of conformers per second the entire informatics base of shape comparison had to be reimagined. FastROCS operates on millions of conformers in memory, in parallel, all at once. Whereas, the Shape TK will typically only operate on a single molecule at a time. Therefore, FastROCS is designed as a web service to serve shape queries from a database pre-loaded into memory: ShapeDatabaseServer.py example. The ShapeDatabaseServer.py is the best place for new users to start.

The ShapeDatabaseServer.py example is built around the most important API in FastROCS TK, the OEShapeDatabase. The OEShapeDatabase class is built to be a natural extension of the OEMolDatabase class in OEChem TK. The OEShapeDatabase object will manage the storage of conformers in memory and interacting with any GPU in the system. Theoretically, the user of OEShapeDatabase should not have to know that their calculation is being performed on a GPU.

Note

For batch processing operations where databases are brought into memory temporarily, the BestShapeOverlay.py is a better first script to read and alter.

Memory Usage

A general “rule of thumb” is 1 GB of main memory for every 4 million conformers. The primary variable FastROCS TK users need to think about is how many conformers per molecule is needed. The OMEGA default of 200 conformers per molecule is much higher than it needs to be due to OEDocking vs ROCS virtual screening performance. OEDocking needs a higher number of conformers than ROCS does. ROCS can perform quite well with 10-50 conformers per molecule. The number of conformers chosen has a direct linear relationship to FastROCS query performance and the amount of memory needed.

Note

The “rule of thumb” is based upon the number of heavy atoms being normally distributed around 25-30 heavy atoms. The number of heavy atoms also has a linear relationship to the memory consumption. So fragment databases will be significantly smaller, but large macro-cycle peptide databases will be significantly larger.

Color scoring does significantly increase virtual screening performance. As much color information, centroids and self terms, is cached in memory as well. For doing pure OEShapeDatabaseType::Shape analysis, color information can be elided allowing for a small percentage in memory size reduction, around 5-10%. For a more complete breakdown of memory usage, use the OEShapeDatabase::PrintMemoryUsage method to print statitics on memory usage.

Note

Color atom and self term calculation is a dominating factor in the how fast FastROCS can load molecules from disk. Therefore, it is heavily recommended to pre-process OEB files with OEPrepareFastROCSMol.

It is important to note that FastROCS is most sensitive to host main memory, not GPU memory. On board GPU memory size does not heavily affect FastROCS performance, so buying the cheaper GPUs with less memory is just fine for FastROCS.

Note

This memory overhead will be improved over time. Significant advances were made in version 1.4.0, memory consumption per conformer was cut in half. Then cut in half again for the 1.7.0 release.

Multi-GPU Scaling

FastROCS will automatically scale across as many GPUs as can be found on the current machine. It is not uncommon or overly expensive to get a machine with 4-8 GPUs in it. As with any parallel program, truely optimal linear scaling is difficult to achieve, but FastROCS gets fairly close as can be seen in the below graph.

../_images/FastROCSScalingOnTeslaGenerations.png

This graph is a comparison of the various generations of Tesla compute cards produced by NVidia for high performance computing purposes.

Note

Static color calculations are still performed on the CPU. For best performance, a decent mid-range CPU with 2-cores per GPU is recommended.

The OEShapeDatabase class will by default automatically scale across all GPUs and CPUs in the system. In this way it can be a very large resource hog for CPU, GPU, and memory. There are some methods provided on the class to control resource utilization, but the current best control is through the CUDA_VISIBLE_DEVICES environment variable.

Multi-Node Scaling

Scaling FastROCS across multiple nodes is possible with the included ShapeDatabaseProxy.py example.