FastROCS TK 1.3.0

Major performance improvement in how FastROCS scales to multiple GPUs. The NVidia 275, 285, and 295 drivers introduced a severe performance regression that hampered FastROCS's ability to scale across multiple GPUs. The solution was to use a different strategy for transferring data between the host and the device, and to scale back the number of host threads launching jobs on the GPUs to only 1 per GPU.
Many additions to the Python interface to allow tweaking how FastROCS is parallelized and how memory is moved around in the system: SetNumDevices, SetHostToDeviceStrategy, SetDeviceToHostStrategy, SetNumThreadsPerDevice. The defaults should be reasonable for everyone using the 295 NVidia driver. These functions are exposed to allow easy tweaking on future drivers, though they may be removed once a toolkit version is released, so don’t rely on them.
Fixed a timer restart bug in OEShapeDatabaseOptions that caused a reused OEShapeDatabaseOptions object to include time spent on previous searches.
Only 4 decimal places are stored in the SD data fields for Shape Tanimoto, Color Tanimoto, and Tanimoto Combo.
Made the NVidia driver install portion of the README easier to follow.
Removed the restriction on using molecules with fewer than 3 heavy atoms.
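
As an illustration of the 4-decimal-place change to the SD data fields noted above, the following minimal Python sketch shows what a score looks like when written with 4 decimal places. The helper name format_score is hypothetical and is not part of the FastROCS API, and the sketch assumes the value is rounded rather than truncated, which these notes do not specify.

```python
def format_score(value: float) -> str:
    """Return a similarity score as text with exactly 4 decimal places.

    Hypothetical helper for illustration only; assumes rounding,
    which the release notes do not confirm.
    """
    return f"{value:.4f}"


# Example values are made up for illustration.
shape_tanimoto = format_score(0.73456789)
padded_score = format_score(1.5)
```

Scripts that parse these fields should therefore not expect more than 4 digits after the decimal point.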