FastROCS CLI Quick Start

Requirements

Python 3.7 or higher. We recommend starting with a clean conda environment.

Access to OpenEye’s Python package server, Magpie. If you are a licensed MaaS user and don’t have access, please contact OpenEye Support.

Installing FastROCS Client

First create a Python 3 environment, then install FastROCS Client from our private PyPI server:

(myvirtualenv) > pip install -i https://magpie.eyesopen.com/pypi/ openeye-fastrocs-client

Below are examples of how to quickly get started using FastROCS Client.

Authenticating with FastROCS

To get started, use the command line interface to configure a FastROCS profile:

(myvirtualenv) > frcli --profile default config profile

Note

The --profile flag defaults to default. Pass a different name to define additional profiles.

To see the current settings in your profile, run:

(myvirtualenv) > frcli --profile default config info

You can also list the profiles that exist locally:

(myvirtualenv) > frcli config list

Note

All commands shown below will omit the --profile option, presuming the default profile is configured as above.

Check server status

$ frcli status
{
    "datasets": {
        "count": 22,
        "status": {
            "downloading": 0,
            "failed": 0,
            "loaded": 22,
            "loading": 0,
            "queued": 0
        }
    },
    "gpus": [
        {
            "driver": "410.93",
            "id": 0,
            "load": 0.0,
            "memoryUsed": 52.0,
            "name": "GeForce GTX 1050",
            "totalMemory": 1991.0
        },
        {
            "driver": "410.93",
            "id": 1,
            "load": 0.0,
            "memoryUsed": 123.0,
            "name": "GeForce GTX 1080",
            "totalMemory": 8119.0
        }
    ],
    "queries": {
        "count": 6,
        "queue": {
            "queued": [],
            "running": null
        },
        "queued": 0
    },
    "system": {
        "disk": {
            "free": 619748732928,
            "percent": 33.6,
            "total": 982840827904
        },
        "memory": {
            "free": 29836935168,
            "percent": 11.6,
            "total": 33744871424
        }
    },
    "version": "1.0.10",
    "oe_license_expires": "2019-03-02"
}
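The status report is plain JSON, so it can be checked from a script. The sketch below assumes the field names shown in the output above and reads the report from a string. The function name summarize_status and the trimmed sample are illustrative and are not part of FastROCS Client.

```python
import json

# Trimmed copy of the report above; a real report would come from `frcli status`.
SAMPLE_STATUS = """
{
    "datasets": {"count": 22,
                 "status": {"downloading": 0, "failed": 0, "loaded": 22,
                            "loading": 0, "queued": 0}},
    "gpus": [
        {"id": 0, "name": "GeForce GTX 1050", "memoryUsed": 52.0, "totalMemory": 1991.0},
        {"id": 1, "name": "GeForce GTX 1080", "memoryUsed": 123.0, "totalMemory": 8119.0}
    ],
    "queries": {"count": 6, "queued": 0}
}
"""


def summarize_status(report_text):
    """Reduce a status report to a few health indicators."""
    report = json.loads(report_text)
    counts = report["datasets"]["status"]
    return {
        "all_datasets_loaded": counts["loaded"] == report["datasets"]["count"],
        "failed_datasets": counts["failed"],
        "queued_queries": report["queries"]["queued"],
        # Free GPU memory per device, in the same units the server reports.
        "gpu_free_memory": {
            gpu["id"]: gpu["totalMemory"] - gpu["memoryUsed"]
            for gpu in report["gpus"]
        },
    }


summary = summarize_status(SAMPLE_STATUS)
print(summary)
```

A monitoring job could run a check like this on a schedule and alert when failed_datasets is non-zero.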

FastROCS CLI Examples

FastROCS has two basic data types: datasets and queries. Before running a search, we must add one or more datasets.

Managing Datasets

Currently, only users marked as FastROCS staff/administrators can add/modify/delete datasets.

Datasets should be multi-conformer files, created first by running Omega, then by processing with the ShapeDatabasePrep.py script (also installed in the current virtualenv).

All options for the dataset sub-command:

$ frcli dataset
Usage: frcli dataset [OPTIONS] COMMAND [ARGS]...

  Manage fastrocs datasets

Options:
  -y, --yes      Say yes to any yes/no questions.  [default: False]
  --json         Return JSON instead of simple text on stdout.
  -q, --quiet    Minimal output
  -v, --verbose  Verbose output
  -h, --help     Show this message and exit.

Commands:
  add     Add a new dataset to fastrocs
  delete  Delete a fastrocs dataset
  info    Get info about FastROCS dataset
  list    List fastrocs datasets
  serve   Add a new dataset to FastROCS by using a...
  update  Update a fastrocs dataset

Assuming we’ve created a file called emolecules.oeb.gz with Omega, we convert to a FastROCS dataset with no more than 10 conformers per molecule:

$ ShapeDatabasePrep.py emolecules.oeb.gz emolecules_fastrocs.oeb 10

With the prepped database in hand, we can add it to FastROCS. There are a couple of ways to add, depending on the size of the file.

For files of 1-2M molecules, we can use the add command directly with the filename.

$ frcli dataset add -h
Usage: frcli dataset add [OPTIONS] <dataset_file>

  Add a new dataset to fastrocs

Options:
  --name NAME              Unique name for this dataset
  --color-force-field CFF  Color force field to use to prep this dataset
  --public                 Mark this dataset as public, for all users to see.
                           [default: False]
  --system                 Mark this dataset as a system database. Requires
                           administrator permissions.  [default: False]
  --shape-only             Use shape only, no color for scoring.  [default:
                           False]
  -y, --yes                Say yes to any yes/no questions.  [default: False]
  --json                   Return JSON instead of simple text on stdout.
  -q, --quiet              Minimal output
  -v, --verbose            Verbose output
  -h, --help               Show this message and exit.

Unless the dataset is for the current user only, consider passing --public.

Any dataset can also be marked --system (which also requires --public). Datasets marked as system will be sorted to the top of the list in the UI.

Datasets derive their name by default from the filename, but a more human-readable name can be passed via --name NAME.

$ frcli dataset add emolecules_fastrocs.oeb
  id  name                   user  status    system    public    color_force_field    shape_only
----  -------------------  ------  --------  --------  --------  -------------------  ------------
 152  emolecules_fastrocs       2  QUEUED    False     False     ImplicitMillsDean    False

Note that both system and public are False, indicating that this dataset is only going to be visible/searchable by the current user.
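When datasets are added from a script, the flag rules described above can be encoded once. The following is a minimal sketch that only builds the argument list. It uses the options listed in the help output, while the function name dataset_add_args is a hypothetical helper and not part of FastROCS Client.

```python
def dataset_add_args(source, name=None, public=False, system=False, shape_only=False):
    """Build the argument list for `frcli dataset add`.

    `source` is a local filename or a URL, as accepted by the add command.
    """
    # The text above states that --system also requires --public.
    if system and not public:
        raise ValueError("--system also requires --public")
    args = ["frcli", "dataset", "add", source]
    if name:
        args += ["--name", name]
    if public:
        args.append("--public")
    if system:
        args.append("--system")
    if shape_only:
        args.append("--shape-only")
    return args


print(dataset_add_args("emolecules_fastrocs.oeb", name="eMolecules", public=True))
```

The resulting list can be passed to subprocess.run(args, check=True) on a machine where frcli is installed and a profile is configured.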

To see the list of currently loaded datasets:

$ frcli dataset list
  id  name                   user  status    system    public      num_mols    num_confs  color_force_field    shape_only
----  -------------------  ------  --------  --------  --------  ----------  -----------  -------------------  ------------
 152  emolecules_fastrocs       2  LOADED    False     False           1001         3698  ImplicitMillsDean    False

Instead of passing a local filename, add can also take a URL from which to download the dataset. For larger files, prefer a URL: the server then pulls the dataset itself instead of the client pushing the large file to the server.

If you have a local server that can serve the file, you can also use:

$ frcli dataset add http://example.org/files/emolecules_fastrocs.oeb
  id  name                   user  status    system    public    color_force_field    shape_only
----  -------------------  ------  --------  --------  --------  -------------------  ------------
 152  emolecules_fastrocs       2  QUEUED    False     False     ImplicitMillsDean    False

This server can be a permanent web server, a temporary server created for example with Python’s http.server module, or via an alternate command in frcli dataset.
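As an illustration of the temporary-server option, the sketch below uses only Python's standard library to serve a directory in a background thread and to build the URL to hand to frcli dataset add. The helper names serve_directory and dataset_url are hypothetical, and the sketch assumes the FastROCS server can reach the chosen host and port.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory, bind="0.0.0.0", port=0):
    """Serve `directory` over HTTP in a background thread.

    bind="0.0.0.0" listens on all interfaces so a remote server can connect;
    port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def dataset_url(server, filename, host):
    """URL to download `filename` from; `host` must be reachable by FastROCS."""
    return f"http://{host}:{server.server_address[1]}/{filename}"


if __name__ == "__main__":
    # Self-check against a throwaway file on the loopback interface.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo.oeb").write_bytes(b"not a real dataset")
        server = serve_directory(tmp, bind="127.0.0.1")
        url = dataset_url(server, "demo.oeb", host="127.0.0.1")
        # Bypass any configured proxy for the local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            print(url, len(response.read()), "bytes")
        server.shutdown()
        server.server_close()
```

Everything in the served directory becomes readable to anyone who can reach the port, so serve a directory that contains only the dataset and shut the server down once the download has finished.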

The serve command allows the local machine to create a temporary web server and then send the appropriate URL to FastROCS to download from.

In the help, you can see that frcli dataset serve has the same dataset creation options as frcli dataset add, but adds --host and --port.

$ frcli dataset serve -h
Usage: frcli dataset serve [OPTIONS] <dataset_file>

  Add a new dataset to FastROCS by using a local http server

Options:
  --host HOST              Hostname or URL the fastrocs server will use. If
                           not sure, use your IP address.  [required]
  --port PORT              Port to run local server on.  [default: 8887]
  --name NAME              Unique name for this dataset
  --color-force-field CFF  Color force field to use to prep this dataset
  --public                 Mark this dataset as public, for all users to see.
                           [default: False]
  --system                 Mark this dataset as a system database. Requires
                           administrator permissions.  [default: False]
  --shape-only             Use shape only, no color for scoring.  [default:
                           False]
  --timeout INTEGER        Time in seconds that the server will wait for a
                           request from the fastrocs server  [default: 60]
  -y, --yes                Say yes to any yes/no questions.  [default: False]
  --json                   Return JSON instead of simple text on stdout.
  -q, --quiet              Minimal output
  -v, --verbose            Verbose output
  -h, --help               Show this message and exit.

Note

If the local machine has a firewall in place, the port used for frcli dataset serve must be open.

$ frcli dataset serve --host 10.44.20.43 emolecules_fastrocs.oeb --name emols --public --system
uploading[################################] 598683/598683 - 00:00:00
  id  name      user  status       system    public    color_force_field    shape_only
----  ------  ------  -----------  --------  --------  -------------------  ------------
 153  emols        2  DOWNLOADING  True      True      ImplicitMillsDean    False

To toggle a dataset’s visibility or give it a new name, we can use update. In the example below, 153 is the ID of the dataset we created earlier.

$ frcli dataset update --public --name "eMolecules" 153
  id  name          user  status    system    public      num_mols    num_confs  color_force_field    shape_only
----  ----------  ------  --------  --------  --------  ----------  -----------  -------------------  ------------
 153  eMolecules       2  LOADED    False     True            1001         3698  ImplicitMillsDean    False

To get info on an existing dataset, use info.

$ frcli dataset info 153 --json
{
    "color_force_field": "ImplicitMillsDean",
    "id": 153,
    "name": "eMolecules",
    "num_confs": 3698,
    "num_mols": 1001,
    "public": true,
    "shape_only": false,
    "status": "LOADED",
    "system": false,
    "user": "2"
}
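The --json flag makes frcli output convenient to consume from a script. Below is a minimal sketch of a wrapper that appends --json to a command and decodes the result. It assumes that frcli is on the PATH and that stdout contains only the JSON document. The function name frcli_json is hypothetical.

```python
import json
import subprocess


def frcli_json(*args, run=subprocess.run):
    """Run `frcli <args> --json` and return the decoded JSON document.

    `run` is injectable so the wrapper can be exercised without a server.
    """
    completed = run(
        ["frcli", *args, "--json"],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
    )
    return json.loads(completed.stdout)
```

For example, frcli_json("dataset", "info", "153")["status"] would return "LOADED" for the dataset shown above.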

And finally, datasets can be deleted.

$ frcli dataset delete 153
Are you sure you want to delete this dataset? (y/N) y
Deleted Dataset 153

Adding Orion files as Datasets

When FastROCS is installed in Orion, it can pull files from Orion using a special URI scheme.

First, use the Orion Client command line to get the ID of an Orion file to add to FastROCS.

Note that these should be files produced by ShapeDatabasePrep and stored as .oeb, not .oeb.gz.

$ ocli files list
  id    owner  name                                          state    created           multipart      project    size (MB)
-----  -------  --------------------------------------------  -------  ----------------  -----------  ---------  -----------
24344       70  enamine_25M.oeb                               ready    2018-09-18 12:15  True              4993     34710.9
24307       70  enamine_10M.oeb                               ready    2018-09-18 11:27  True              4993     14003.2
24306       70  enamine_5M.oeb                                ready    2018-09-18 10:30  True              4993      6979.66

To add the enamine_25M.oeb file, make a URI including its Orion id:

$ frcli dataset add "orion://files/24344" --name "Enamine 25M" --system --public
  id  name           user  status       system    public    color_force_field    shape_only
----  -----------  ------  -----------  --------  --------  -------------------  ------------
1995  Enamine 25M       2  DOWNLOADING  True      True      ImplicitMillsDean    False
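When several Orion files need to be added, the URI can be generated from the file ID. This small sketch follows the orion://files/<id> form used in the command above. The function name orion_file_uri is hypothetical, and the check for positive integer IDs is an assumption based on the listing shown.

```python
def orion_file_uri(file_id):
    """Return the URI form used above for an Orion file ID."""
    file_id = int(file_id)  # accepts 24344 or "24344"; rejects non-numeric input
    if file_id <= 0:
        # Assumption: Orion file IDs are positive integers, as in the listing.
        raise ValueError("Orion file IDs are expected to be positive integers")
    return f"orion://files/{file_id}"


for file_id in (24344, 24307, 24306):
    print(orion_file_uri(file_id))
```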

Running Queries

Queries are managed in a similar fashion. There is a top-level sub-command.

$ frcli query
Usage: frcli query [OPTIONS] COMMAND [ARGS]...

  Manage FastROCS queries

Options:
  -y, --yes      Say yes to any yes/no questions.  [default: False]
  --json         Return JSON instead of simple text on stdout.
  -q, --quiet    Minimal output
  -v, --verbose  Verbose output
  -h, --help     Show this message and exit.

Commands:
  run       Run a new fastrocs query.
  add       Add a new fastrocs query
  delete    Delete a fastrocs query
  download  Retrieve FastROCS query to local file
  info      Get info about FastROCS query
  list      List FastROCS queries
  results   Retrieve results of a FastROCS query

There are two main commands for searching: run and add. The run command performs a synchronous search: it submits the query, monitors its status, and downloads the results in one step. The add command submits queries asynchronously, so several can be added at once. Their results are downloaded later with the results sub-command.

In all the cases below, the query is a local molecule file and the dataset ID is the ID returned by frcli dataset list.

$ frcli query run -h
Usage: frcli query run [OPTIONS] <query> <dataset_id> <output>

  Run a new fastrocs query.

  Upload a new query, poll continuously while it is running and when
  complete, download the results into <output>

Options:
  --num-hits N           Number of hits to return  [default: 250]
  --shape-only           Use shape only, no color for scoring.
  --tversky-alpha ALPHA  Alpha coefficient to use for Tversky scoring.
                         [default: 0.95]
  --sim-type SIM         Similarity score.  [default: tanimoto]
  -y, --yes              Say yes to any yes/no questions.  [default: False]
  --json                 Return JSON instead of simple text on stdout.
  -q, --quiet            Minimal output
  -v, --verbose          Verbose output
  -h, --help             Show this message and exit.

Running a query is as easy as:

$ frcli dataset info 155
  id  name            user  status    system    public      num_mols    num_confs  color_force_field    shape_only
----  ------------  ------  --------  --------  --------  ----------  -----------  -------------------  ------------
 155  eMolecules1M       2  LOADED    True      True         1000001      4791117  ImplicitMillsDean    False

$ frcli query run query.sdf 155 hits.sdf
Running[################################] 4791117/4791117 - 00:00:01
  id  name       status       user    dataset_id    num_hits  sim_type      tversky_alpha  shape_only    output_format
----  ---------  ---------  ------  ------------  ----------  ----------  ---------------  ------------  ---------------
 113  4cox-ligD  COMPLETED       2           155         250  Tanimoto                  0  False         sdf
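To give a feel for what the similarity options control, here is a short sketch of the usual overlap-based Tanimoto and Tversky definitions. It assumes the second Tversky coefficient is 1 - alpha, which is an assumption about FastROCS rather than something stated in the help text, and the overlap values are made up for illustration.

```python
def tanimoto(overlap, self_query, self_target):
    """Tanimoto similarity from the mutual overlap and the two self-overlaps."""
    return overlap / (self_query + self_target - overlap)


def tversky(overlap, self_query, self_target, alpha=0.95):
    """Tversky similarity weighting the query self-overlap by `alpha`."""
    beta = 1.0 - alpha  # assumption: beta is derived from alpha
    return overlap / (alpha * self_query + beta * self_target)


# Hypothetical case: a small query almost fully contained in a larger target.
overlap, self_query, self_target = 95.0, 100.0, 200.0
print(round(tanimoto(overlap, self_query, self_target), 3))
print(round(tversky(overlap, self_query, self_target), 3))
```

With an alpha close to 1, the Tversky score is dominated by how much of the query is covered, so a large target that contains the query scores higher than it does with Tanimoto.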

The alternative is to manage the add -> info -> results steps manually:

$ frcli query add -h
Usage: frcli query add [OPTIONS] <query> <dataset_id>

  Add a new fastrocs query

Options:
  --output-format FORMAT  Format for output file.  [default: oeb]
  --num-hits N            Number of hits to return  [default: 250]
  --shape-only            Use shape only, no color for scoring.
  --tversky-alpha ALPHA   Alpha coefficient to use for Tversky scoring.
                          [default: 0.95]
  --sim-type SIM          Similarity score.  [default: tanimoto]
  -y, --yes               Say yes to any yes/no questions.  [default: False]
  --json                  Return JSON instead of simple text on stdout.
  -q, --quiet             Minimal output
  -v, --verbose           Verbose output
  -h, --help              Show this message and exit.

$ frcli query results -h
Usage: frcli query results [OPTIONS] <query_id> <output>

  Retrieve results of a FastROCS query

Options:
  -y, --yes      Say yes to any yes/no questions.  [default: False]
  --json         Return JSON instead of simple text on stdout.
  -q, --quiet    Minimal output
  -v, --verbose  Verbose output
  -h, --help     Show this message and exit.

Here is a complete example:

$ frcli query add query.sdf 155 --output-format sdf
  id  name         user    dataset_id    num_hits  sim_type      tversky_alpha  shape_only    output_format
----  ---------  ------  ------------  ----------  ----------  ---------------  ------------  ---------------
 114  4cox-ligD       2           155         250  Tanimoto                  0  False         sdf

$ frcli query info 114
  id  name       status       user    dataset_id    num_hits  sim_type      tversky_alpha  shape_only    output_format
----  ---------  ---------  ------  ------------  ----------  ----------  ---------------  ------------  ---------------
 114  4cox-ligD  COMPLETED       2           155         250  Tanimoto                  0  False         sdf

# once status == COMPLETED, get the results
$ frcli query results 114 hits.sdf

$ molcount.py hits.sdf
hits.sdf contains 250 molecule(s).
===========================================================
Total 250 molecules
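The wait between add and results in the example above can be automated with a polling loop. The sketch below is generic: it takes any callable that returns the current status string and waits for COMPLETED, the status shown in the output above. The function name wait_for_completion and the default timings are illustrative choices.

```python
import time


def wait_for_completion(get_status, timeout=300.0, interval=2.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns "COMPLETED" or `timeout` seconds pass.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "COMPLETED":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"query still {status!r} after {timeout} s")
        sleep(interval)


# Simulated status sequence standing in for repeated `frcli query info` calls.
statuses = iter(["QUEUED", "QUEUED", "COMPLETED"])
print(wait_for_completion(lambda: next(statuses), sleep=lambda seconds: None))
```

In practice get_status could run frcli query info <query_id> --json and read the status from the result. That the JSON output includes a status field for queries is an assumption based on the table columns shown above.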

All queries and results are stored on the server, but they can be cleaned up.

$ frcli query delete 114
Are you sure you want to delete this query? (y/N) y
Deleted Query 114