FastROCS CLI Quick Start
Requirements
Python 3.7 or higher. We recommend starting with a clean conda environment.
Access to OpenEye’s Python package server, Magpie. If you are a licensed MaaS user and don’t have access, please contact OpenEye Support.
Installing FastROCS Client
First create a Python 3 environment, then install FastROCS Client from our private PyPI server:
(myvirtualenv) > pip install -i https://magpie.eyesopen.com/pypi/ openeye-fastrocs-client
Below are examples of how to quickly get started using FastROCS Client.
Authenticating with FastROCS
To get started, use the command line interface to configure a FastROCS profile:
(myvirtualenv) > frcli --profile default config profile
Note
The --profile flag defaults to the profile named default; use it to define multiple profiles.
To see the settings that your profile currently contains, run:
(myvirtualenv) > frcli --profile default config info
You can also list the different profiles that you have locally.
(myvirtualenv) > frcli config list
Note
All commands shown below omit the --profile option, presuming the default profile is configured as above.
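When driving frcli from a Python script, it can help to send every call through one helper so that the profile is always explicit rather than implied. The following is a minimal sketch, assuming frcli is on your PATH. The helper names frcli_command and run_frcli are hypothetical, and the sketch uses only the --profile flag and the sub-commands shown on this page.

```python
import subprocess
from typing import List


def frcli_command(*args: str, profile: str = "default") -> List[str]:
    """Build an frcli argument list that always names the profile explicitly."""
    return ["frcli", "--profile", profile, *args]


def run_frcli(*args: str, profile: str = "default") -> str:
    """Run frcli with the given profile and return its stdout as text.

    Raises subprocess.CalledProcessError if frcli exits with a non-zero status.
    """
    completed = subprocess.run(
        frcli_command(*args, profile=profile),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text (works on Python 3.7+)
    )
    return completed.stdout


if __name__ == "__main__":
    # Same as: frcli --profile default config info
    print(run_frcli("config", "info"))
```

To target another profile you have configured, pass profile="<name>" to either helper.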
Check server status
$ frcli status
{
"datasets": {
"count": 22,
"status": {
"downloading": 0,
"failed": 0,
"loaded": 22,
"loading": 0,
"queued": 0
}
},
"gpus": [
{
"driver": "410.93",
"id": 0,
"load": 0.0,
"memoryUsed": 52.0,
"name": "GeForce GTX 1050",
"totalMemory": 1991.0
},
{
"driver": "410.93",
"id": 1,
"load": 0.0,
"memoryUsed": 123.0,
"name": "GeForce GTX 1080",
"totalMemory": 8119.0
}
],
"queries": {
"count": 6,
"queue": {
"queued": [],
"running": null
},
"queued": 0
},
"system": {
"disk": {
"free": 619748732928,
"percent": 33.6,
"total": 982840827904
},
"memory": {
"free": 29836935168,
"percent": 11.6,
"total": 33744871424
}
},
"version": "1.0.10",
"oe_license_expires": "2019-03-02"
}
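Because frcli status prints JSON, a monitoring script can reduce it to a few health checks. The following is a minimal sketch, assuming the output has the same shape as the example above. The function name summarize_status and the choice of checks are illustrative and not part of frcli. GPU memory is reported in whatever units the server uses.

```python
import json
from typing import Any, Dict


def summarize_status(status: Dict[str, Any]) -> Dict[str, Any]:
    """Condense the JSON printed by `frcli status` into a few health checks."""
    datasets = status["datasets"]
    counts = datasets["status"]
    return {
        # True only when every dataset on the server has finished loading
        "all_datasets_loaded": counts.get("loaded", 0) == datasets["count"],
        "failed_datasets": counts.get("failed", 0),
        # Free memory per GPU id, in the same units the server reports
        "gpu_free_memory": {
            gpu["id"]: gpu["totalMemory"] - gpu["memoryUsed"]
            for gpu in status.get("gpus", [])
        },
        "queued_queries": status["queries"]["queued"],
        "license_expires": status.get("oe_license_expires"),
    }


if __name__ == "__main__":
    import subprocess

    raw = subprocess.run(
        ["frcli", "status"], check=True,
        stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    print(json.dumps(summarize_status(json.loads(raw)), indent=2))
```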
FastROCS CLI Examples
FastROCS has two basic data types: datasets and queries. Before running a search, at least one dataset must be added.
Managing Datasets
Currently, only users marked as FastROCS staff/administrators can add/modify/delete datasets.
Datasets should be multi-conformer files, created first by running Omega, then by processing with the ShapeDatabasePrep.py script (also installed in the current virtualenv).
All options for the dataset sub-command:
$ frcli dataset
Usage: frcli dataset [OPTIONS] COMMAND [ARGS]...
Manage fastrocs datasets
Options:
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
Commands:
add Add a new dataset to fastrocs
delete Delete a fastrocs dataset
info Get info about FastROCS dataset
list List fastrocs datasets
serve Add a new dataset to FastROCS by using a...
update Update a fastrocs dataset
Assuming we’ve created a file called emolecules.oeb.gz with Omega, we convert it to a FastROCS dataset with at most 10 conformers per molecule:
$ ShapeDatabasePrep.py emolecules.oeb.gz emolecules_fastrocs.oeb 10
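The prep and add steps can be chained in a small script. The following is a minimal sketch, assuming both ShapeDatabasePrep.py and frcli are on your PATH, as they are when installed into the virtualenv. The third ShapeDatabasePrep.py argument is the per-molecule conformer limit, as in the command above. The function names are hypothetical. By default the script only prints the commands, so you can review them before running anything.

```python
import shlex
import subprocess
from typing import List


def prep_command(omega_output: str, prepped: str, max_confs: int) -> List[str]:
    """ShapeDatabasePrep.py <input> <output> <max conformers per molecule>."""
    if max_confs < 1:
        raise ValueError("max_confs must be a positive integer")
    return ["ShapeDatabasePrep.py", omega_output, prepped, str(max_confs)]


def prep_and_add(omega_output: str, prepped: str, max_confs: int,
                 dry_run: bool = True) -> List[List[str]]:
    """Prep an Omega output file, then add the result as a FastROCS dataset.

    With dry_run=True (the default) the commands are printed but not run.
    """
    commands = [
        prep_command(omega_output, prepped, max_confs),
        ["frcli", "dataset", "add", prepped],
    ]
    for cmd in commands:
        print("$", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True stops the pipeline if the prep step fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    prep_and_add("emolecules.oeb.gz", "emolecules_fastrocs.oeb", 10, dry_run=False)
```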
With the prepped database in hand, we can add it to FastROCS. There are several ways to do this, depending on the size of the file.
For files of roughly 1-2M molecules or fewer, we can pass the filename directly to the add command:
$ frcli dataset add -h
Usage: frcli dataset add [OPTIONS] <dataset_file>
Add a new dataset to fastrocs
Options:
--name NAME Unique name for this dataset
--color-force-field CFF Color force field to use to prep this dataset
--public Mark this dataset as public, for all users to see.
[default: False]
--system Mark this dataset as a system database. Requires
administrator permissions. [default: False]
--shape-only Use shape only, no color for scoring. [default:
False]
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
Unless the dataset is for the current user only, consider passing --public.
An administrator can also mark any dataset --system (which also requires --public). Datasets marked as system are sorted to the top of the list in the UI.
By default, a dataset’s name is derived from its filename, but a more human-readable name can be passed via --name NAME.
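When datasets are added from a script, the --system/--public rule above can be checked before the command ever reaches the server. The following is a minimal sketch that only assembles an argument list from the dataset add options shown in the help. The function name dataset_add_args is hypothetical, and the sketch does not check administrator permissions.

```python
from typing import List, Optional


def dataset_add_args(source: str, name: Optional[str] = None,
                     public: bool = False, system: bool = False,
                     shape_only: bool = False,
                     color_force_field: Optional[str] = None) -> List[str]:
    """Build `frcli dataset add` arguments, enforcing --system => --public."""
    if system and not public:
        raise ValueError("--system also requires --public")
    args = ["frcli", "dataset", "add", source]
    if name:
        args += ["--name", name]
    if color_force_field:
        args += ["--color-force-field", color_force_field]
    if public:
        args.append("--public")
    if system:
        args.append("--system")  # still needs administrator permissions server-side
    if shape_only:
        args.append("--shape-only")
    return args
```

The list can be passed straight to subprocess.run(..., check=True).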
$ frcli dataset add emolecules_fastrocs.oeb
id name user status system public color_force_field shape_only
---- ------------------- ------ -------- -------- -------- ------------------- ------------
152 emolecules_fastrocs 2 QUEUED False False ImplicitMillsDean False
Note that both system and public are False, indicating that this dataset will be visible to and searchable by the current user only.
To see the list of currently loaded datasets:
$ frcli dataset list
id name user status system public num_mols num_confs color_force_field shape_only
---- ------------------- ------ -------- -------- -------- ---------- ----------- ------------------- ------------
152 emolecules_fastrocs 2 LOADED False False 1001 3698 ImplicitMillsDean False
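For scripting, the --json option is usually the most robust way to read frcli output. If you need to work from the default text tables instead, the dashed rule under the header marks each column's extent. The following is a minimal sketch, assuming the columns are aligned under that rule as in the real terminal output (the listings on this page have lost their spacing). The function name is hypothetical. Taking the column boundaries from the rule keeps values that contain spaces, such as "Enamine 25M", intact.

```python
import re
from typing import Dict, List, Optional


def parse_frcli_table(text: str) -> List[Dict[str, str]]:
    """Parse an aligned frcli text table into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header, rule, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes marks one column; a column spans from its own start
    # to the start of the next column (the last one runs to end of line).
    starts = [m.start() for m in re.finditer(r"-+", rule)]
    ends: List[Optional[int]] = [*starts[1:], None]
    bounds = list(zip(starts, ends))

    def split(line: str) -> List[str]:
        return [line[a:b].strip() for a, b in bounds]

    names = split(header)
    return [dict(zip(names, split(row))) for row in rows]
```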
Instead of a local filename, add can also take a URL from which the server downloads the dataset. For larger files this is preferable, because the server pulls the dataset itself instead of the client pushing the large file to the server.
If the file is available from a web server that the FastROCS server can reach, you can use:
$ frcli dataset add http://example.org/files/emolecules_fastrocs.oeb
id name user status system public color_force_field shape_only
---- ------------------- ------ -------- -------- -------- ------------------- ------------
152 emolecules_fastrocs 2 QUEUED False False ImplicitMillsDean False
This server can be a permanent web server, a temporary server created with, for example, Python’s http.server module, or one started by the serve command of frcli dataset.
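The http.server route can be scripted with only the standard library. The following is a minimal sketch that serves a directory in a background thread and prints the URL to hand to frcli dataset add. The function names and the script's argument order are hypothetical. The advertised host must be an address the FastROCS server can reach, for example your machine's IP address. Port 8887 simply mirrors the serve default.

```python
import functools
import http.server
import threading


def start_file_server(directory: str, host: str = "0.0.0.0",
                      port: int = 8887) -> http.server.ThreadingHTTPServer:
    """Serve files from `directory` over HTTP in a background thread.

    Pass port=0 to let the OS pick a free port. Call server.shutdown()
    once FastROCS has finished downloading.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def dataset_url(advertised_host: str, port: int, filename: str) -> str:
    """URL to pass to `frcli dataset add`; the host must be reachable by FastROCS."""
    return f"http://{advertised_host}:{port}/{filename}"


if __name__ == "__main__":
    import sys
    import time

    # Usage: python serve_file.py <directory> <your-ip> <file>  (hypothetical)
    directory, advertised_host, filename = sys.argv[1:4]
    server = start_file_server(directory)
    print("Now run: frcli dataset add", dataset_url(advertised_host, 8887, filename))
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        server.shutdown()
```

Unlike frcli dataset serve, this server keeps running until you stop it with Ctrl-C.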
The serve command creates a temporary web server on the local machine and then sends the appropriate URL to FastROCS to download from.
The help shows that frcli dataset serve has the same dataset creation options as frcli dataset add, plus --host, --port, and --timeout.
$ frcli dataset serve -h
Usage: frcli dataset serve [OPTIONS] <dataset_file>
Add a new dataset to FastROCS by using a local http server
Options:
--host HOST Hostname or URL the fastrocs server will use. If
not sure, use your IP address. [required]
--port PORT Port to run local server on. [default: 8887]
--name NAME Unique name for this dataset
--color-force-field CFF Color force field to use to prep this dataset
--public Mark this dataset as public, for all users to see.
[default: False]
--system Mark this dataset as a system database. Requires
administrator permissions. [default: False]
--shape-only Use shape only, no color for scoring. [default:
False]
--timeout INTEGER Time in seconds that the server will wait for a
request from the fastrocs server [default: 60]
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
Note
If the local machine has a firewall in place, the port used for frcli dataset serve must be open.
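Before running serve, it can also help to confirm that the chosen --port (default 8887) is not already in use on the local machine. The following is a minimal sketch, and port_is_free is a hypothetical helper. It only tests whether a local socket can bind to the port. It cannot tell whether a firewall lets the FastROCS server reach that port.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port).

    This checks local availability only; firewall rules are not tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    print("8887 free:", port_is_free(8887))
```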
$ frcli dataset serve --host 10.44.20.43 emolecules_fastrocs.oeb --name emols --public --system
uploading[################################] 598683/598683 - 00:00:00
id name user status system public color_force_field shape_only
---- ------ ------ ----------- -------- -------- ------------------- ------------
153 emols 2 DOWNLOADING True True ImplicitMillsDean False
To toggle a dataset’s visibility or give it a new name, use update. In the example below, 153 is the ID of the dataset we created earlier.
$ frcli dataset update --public --name "eMolecules" 153
id name user status system public num_mols num_confs color_force_field shape_only
---- ---------- ------ -------- -------- -------- ---------- ----------- ------------------- ------------
153 eMolecules 2 LOADED False True 1001 3698 ImplicitMillsDean False
To get info on an existing dataset, use info:
$ frcli dataset info 153 --json
{
"color_force_field": "ImplicitMillsDean",
"id": 153,
"name": "eMolecules",
"num_confs": 3698,
"num_mols": 1001,
"public": true,
"shape_only": false,
"status": "LOADED",
"system": false,
"user": "2"
}
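Because adding a dataset returns while it is still QUEUED or DOWNLOADING, scripts often need to wait until it is usable. The following is a minimal polling sketch built on frcli dataset info <id> --json, whose output includes a status field as shown above. The function names are hypothetical. Treating FAILED as a final state is an assumption based on the "failed" count in frcli status. The fetch function can be swapped out, which also makes the loop easy to test.

```python
import json
import subprocess
import time
from typing import Callable, Dict

# LOADED appears above; FAILED is assumed from the `frcli status` counts.
TERMINAL_STATUSES = {"LOADED", "FAILED"}


def fetch_dataset_info(dataset_id: int) -> Dict:
    """Return the JSON printed by `frcli dataset info <id> --json` as a dict."""
    out = subprocess.run(
        ["frcli", "dataset", "info", str(dataset_id), "--json"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return json.loads(out)


def wait_for_dataset(dataset_id: int,
                     fetch: Callable[[int], Dict] = fetch_dataset_info,
                     interval: float = 10.0, timeout: float = 3600.0) -> Dict:
    """Poll until the dataset reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = fetch(dataset_id)
        if info["status"] in TERMINAL_STATUSES:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"dataset {dataset_id} still {info['status']} after {timeout}s"
            )
        time.sleep(interval)


if __name__ == "__main__":
    print(wait_for_dataset(153)["status"])
```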
Finally, datasets can be deleted:
$ frcli dataset delete 153
Are you sure you want to delete this dataset? (y/N) y
Deleted Dataset 153
Adding Orion files as Datasets
When FastROCS is installed in Orion, it can pull files from Orion using a special URI scheme.
First, use the Orion Client command line to get the ID of an Orion file to add to FastROCS.
Note that these should be files produced by ShapeDatabasePrep.py and stored as .oeb, not .oeb.gz.
$ ocli files list
id owner name state created multipart project size (MB)
----- ------- -------------------------------------------- ------- ---------------- ----------- --------- -----------
24344 70 enamine_25M.oeb ready 2018-09-18 12:15 True 4993 34710.9
24307 70 enamine_10M.oeb ready 2018-09-18 11:27 True 4993 14003.2
24306 70 enamine_5M.oeb ready 2018-09-18 10:30 True 4993 6979.66
To add the enamine_25M.oeb file, build a URI from its Orion ID:
$ frcli dataset add "orion://files/24344" --name "Enamine 25M" --system --public
id name user status system public color_force_field shape_only
---- ----------- ------ ----------- -------- -------- ------------------- ------------
1995 Enamine 25M 2 DOWNLOADING True True ImplicitMillsDean False
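When adding many Orion files, the URI can be built in code, along with a check of the earlier requirement that files be stored as .oeb rather than .oeb.gz. The following is a minimal sketch, and the function name is hypothetical. The filename is used only for that extension check and is not part of the URI.

```python
def orion_dataset_uri(file_id: int, filename: str) -> str:
    """Return the orion://files/<id> URI for an Orion file to add to FastROCS."""
    if filename.endswith(".oeb.gz"):
        raise ValueError(f"{filename}: store ShapeDatabasePrep output as .oeb, not .oeb.gz")
    if not filename.endswith(".oeb"):
        raise ValueError(f"{filename}: expected a .oeb file from ShapeDatabasePrep")
    return f"orion://files/{int(file_id)}"


if __name__ == "__main__":
    uri = orion_dataset_uri(24344, "enamine_25M.oeb")
    print("frcli dataset add", uri, '--name "Enamine 25M" --system --public')
```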
Running Queries
Queries are managed in a similar fashion, through the top-level query sub-command:
$ frcli query
Usage: frcli query [OPTIONS] COMMAND [ARGS]...
Manage FastROCS queries
Options:
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
Commands:
run Run a new fastrocs query.
add Add a new fastrocs query
delete Delete a fastrocs query
download Retrieve FastROCS query to local file
info Get info about FastROCS query
list List FastROCS queries
results Retrieve results of a FastROCS query
To do a search there are two main commands: run and add. run is a synchronous search: it submits the query, monitors its status, and downloads the results all in one step. add is used to submit several queries asynchronously; the results sub-command then downloads their results later.
In all the cases below, the query is a local molecule file and the dataset ID is the ID returned by frcli dataset list.
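The asynchronous add, info, and results workflow can be scripted for a batch of query files. The following is a minimal sketch that rests on stated assumptions. It assumes query add --json returns an object with an "id" key and query info --json returns one with a "status" key, by analogy with the dataset info --json output shown earlier. Only COMPLETED is confirmed as a status on this page, so treating FAILED as final is also an assumption. The function names and the hits_<id>.sdf output naming are hypothetical.

```python
import json
import os
import subprocess
import time
from typing import Callable, Dict, Iterable, List

JsonRunner = Callable[..., Dict]

# Only COMPLETED appears on this page; treating FAILED as final is an assumption.
DONE_STATUSES = {"COMPLETED", "FAILED"}


def frcli_json(*args: str) -> Dict:
    """Run `frcli <args> --json` and parse its stdout as JSON."""
    out = subprocess.run(["frcli", *args, "--json"], check=True,
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    return json.loads(out)


def submit_queries(query_files: Iterable[str], dataset_id: int,
                   run_json: JsonRunner = frcli_json) -> List[int]:
    """Queue one FastROCS query per file and return the new query IDs."""
    ids = []
    for path in query_files:
        # Assumes the JSON reply carries the new query's "id".
        reply = run_json("query", "add", path, str(dataset_id), "--output-format", "sdf")
        ids.append(int(reply["id"]))
    return ids


def wait_for_queries(query_ids: Iterable[int], run_json: JsonRunner = frcli_json,
                     interval: float = 5.0, timeout: float = 3600.0) -> Dict[int, str]:
    """Poll `query info` until every query reaches one of DONE_STATUSES."""
    pending = set(query_ids)
    statuses: Dict[int, str] = {}
    deadline = time.monotonic() + timeout
    while pending:
        for qid in sorted(pending):
            statuses[qid] = run_json("query", "info", str(qid))["status"]
            if statuses[qid] in DONE_STATUSES:
                pending.discard(qid)
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"queries still running: {sorted(pending)}")
            time.sleep(interval)
    return statuses


def download_results(query_ids: Iterable[int], out_dir: str,
                     run: Callable = subprocess.run) -> List[str]:
    """Fetch each query's hits with `frcli query results <id> <output>`."""
    paths = []
    for qid in query_ids:
        out = os.path.join(out_dir, f"hits_{qid}.sdf")
        run(["frcli", "query", "results", str(qid), out], check=True)
        paths.append(out)
    return paths


if __name__ == "__main__":
    ids = submit_queries(["query1.sdf", "query2.sdf"], 155)  # hypothetical files
    statuses = wait_for_queries(ids)
    done = [qid for qid, status in statuses.items() if status == "COMPLETED"]
    print(download_results(done, "."))
```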
$ frcli query run -h
Usage: frcli query run [OPTIONS] <query> <dataset_id> <output>
Run a new fastrocs query.
Upload a new query, poll continuously while it is running and when
complete, download the results into <output>
Options:
--num-hits N Number of hits to return [default: 250]
--shape-only Use shape only, no color for scoring.
--tversky-alpha ALPHA Alpha coefficient to use for Tversky scoring.
[default: 0.95]
--sim-type SIM Similarity score. [default: tanimoto]
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
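The --sim-type and --tversky-alpha options control how shape overlap is turned into a score. The following sketch shows how the two scores differ, assuming FastROCS follows the conventional ROCS definitions. Under that assumption, Tanimoto = O_AB / (O_AA + O_BB - O_AB) and Tversky = O_AB / (alpha * O_AA + (1 - alpha) * O_BB), where A is the query, B is the hit, and O denotes overlap volume. The weighting convention and the numbers used are assumptions for illustration, not values taken from FastROCS.

```python
def tanimoto(o_aa: float, o_bb: float, o_ab: float) -> float:
    """Shape Tanimoto from self-overlaps O_AA, O_BB and cross-overlap O_AB."""
    return o_ab / (o_aa + o_bb - o_ab)


def tversky(o_aa: float, o_bb: float, o_ab: float, alpha: float = 0.95) -> float:
    """Tversky with weight alpha on the query (A) and 1 - alpha on the hit (B).

    A large alpha rewards hits that cover the query, largely ignoring how
    much extra volume the hit itself has.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return o_ab / (alpha * o_aa + (1.0 - alpha) * o_bb)


if __name__ == "__main__":
    # Hypothetical overlap volumes: a small query (A) inside a larger hit (B).
    o_aa, o_bb, o_ab = 10.0, 14.0, 9.0
    print(f"Tanimoto: {tanimoto(o_aa, o_bb, o_ab):.3f}")
    print(f"Tversky (alpha=0.95): {tversky(o_aa, o_bb, o_ab):.3f}")
```

With these made-up volumes, the larger hit scores 0.6 by Tanimoto but about 0.88 by Tversky, because Tversky with a high alpha mostly asks how well the hit covers the query.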
Running a query is as easy as the following (here we first check that dataset 155 is LOADED):
$ frcli dataset info 155
id name user status system public num_mols num_confs color_force_field shape_only
---- ------------ ------ -------- -------- -------- ---------- ----------- ------------------- ------------
155 eMolecules1M 2 LOADED True True 1000001 4791117 ImplicitMillsDean False
$ frcli query run query.sdf 155 hits.sdf
Running[################################] 4791117/4791117 - 00:00:01
id name status user dataset_id num_hits sim_type tversky_alpha shape_only output_format
---- --------- --------- ------ ------------ ---------- ---------- --------------- ------------ ---------------
113 4cox-ligD COMPLETED 2 155 250 Tanimoto 0 False sdf
The alternative is to manage the add -> info -> results steps manually:
$ frcli query add -h
Usage: frcli query add [OPTIONS] <query> <dataset_id>
Add a new fastrocs query
Options:
--output-format FORMAT Format for output file. [default: oeb]
--num-hits N Number of hits to return [default: 250]
--shape-only Use shape only, no color for scoring.
--tversky-alpha ALPHA Alpha coefficient to use for Tversky scoring.
[default: 0.95]
--sim-type SIM Similarity score. [default: tanimoto]
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
$ frcli query results -h
Usage: frcli query results [OPTIONS] <query_id> <output>
Retrieve results of a FastROCS query
Options:
-y, --yes Say yes to any yes/no questions. [default: False]
--json Return JSON instead of simple text on stdout.
-q, --quiet Minimal output
-v, --verbose Verbose output
-h, --help Show this message and exit.
Here is a complete example:
$ frcli query add query.sdf 155 --output-format sdf
id name user dataset_id num_hits sim_type tversky_alpha shape_only output_format
---- --------- ------ ------------ ---------- ---------- --------------- ------------ ---------------
114 4cox-ligD 2 155 250 Tanimoto 0 False sdf
$ frcli query info 114
id name status user dataset_id num_hits sim_type tversky_alpha shape_only output_format
---- --------- --------- ------ ------------ ---------- ---------- --------------- ------------ ---------------
114 4cox-ligD COMPLETED 2 155 250 Tanimoto 0 False sdf
# once status == COMPLETED, get the results
$ frcli query results 114 hits.sdf
$ molcount.py hits.sdf
hits.sdf contains 250 molecule(s).
===========================================================
Total 250 molecules
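molcount.py is an OpenEye example script. If it is not on your PATH, you can get a rough count of the hits in an SDF result file by counting its "$$$$" record terminators. The following is a minimal sketch, and count_sdf_records is a hypothetical helper. It counts records, not distinct molecules, and it only handles plain-text SDF (for example, from --output-format sdf), not .oeb output.

```python
def count_sdf_records(path: str) -> int:
    """Count records in an SD file by counting its "$$$$" terminator lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(f"{path} contains {count_sdf_records(path)} record(s).")
```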
All queries and results are stored on the server, but they can be deleted once they are no longer needed:
$ frcli query delete 114
Are you sure you want to delete this query? (y/N) y
Deleted Query 114