Prepare Vendor Database for Giga Docking and FastROCS

In this tutorial, compounds from the Mcule Purchasable (In Stock) database are prepared for Giga Docking and FastROCS. Running the floes in this tutorial will generate approximately $45 in Orion compute charges (the cost will vary somewhat depending on current AWS pricing).

This tutorial uses the following floes:

  • Prepare Giga Collections

  • Collection Info

  • Sample Collection

Note

OpenEye already prepares several vendor collections for use in Orion (as of this writing, Enamine Real, Wuxi Galaxy and Mcule Ultimate). This tutorial is for those who wish to prepare their own in-house databases or other vendor databases for Giga Docking and FastROCS.

Create a Tutorial Project and Working Directory

Note

If you have already created a Tutorial project while doing another tutorial, you can reuse it and skip this step.

  1. Log in to Orion.

  2. Click the Home button at the top of the left menu bar.

  3. Click the ‘Create New Project’ button, enter Tutorial as the project name in the pop-up dialog, and click ‘Save’.

Import Mcule Database into Orion

Download the Mcule Purchasable (In Stock) database in .smi.gz format from the Mcule database website to your local machine. Then upload the file from your local machine to Orion as follows:

  1. Click on the ‘Data’ button in the left menu bar

  2. Select ‘My Data’ from the list of options to the left of the page.

  3. Create an ‘Input Data’ folder if one does not already exist by clicking the blue folder icon with the + symbol.

  4. Open the ‘Input Data’ folder by clicking on it.

  5. Set the ‘Type’ drop down menu to Files if it is not already.

  6. Open the ‘Add Data’ menu and click ‘Upload’

  7. In the modal that appears, select the Mcule Purchasable (In Stock) database file on your local machine.

  8. Select ‘Show advanced options’

  9. Select ‘None’ for the processing method of the file.

    Warning

    If ‘None’ is not selected here the uploading of the file may fail.

  10. Click ‘Upload’
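Before uploading, it can be worth confirming that the download is an intact gzip file containing SMILES lines. The following is a minimal local sketch and is not part of the Orion workflow. The helper name `peek_smiles` is hypothetical, and the layout it assumes (one molecule per line, a SMILES string optionally followed by whitespace and an identifier) is an assumption about the file rather than something stated by Mcule.

```python
import gzip


def peek_smiles(path, n=5):
    """Return the first n non-empty lines of a gzipped SMILES file.

    Each line is split into (smiles, rest), where `rest` is whatever
    follows the first whitespace (typically an identifier) or "".
    Assumes one molecule per line; this layout is an assumption.
    """
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(None, 1)
            rows.append((parts[0], parts[1] if len(parts) > 1 else ""))
            if len(rows) >= n:
                break
    return rows


# Example call (the file name may differ for your download):
# for smiles, name in peek_smiles("mcule_purchasable_in_stock_210615.smi.gz"):
#     print(smiles, name)
```

If the file is truncated or is not gzip-compressed, `gzip.open` will raise an error while reading, which is a cheap signal to download the file again before spending time on the upload.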

Important

As of this writing, the Mcule Purchasable (In Stock) database contains ~10 million molecules, and preparing them will cost approximately $50 in Orion compute charges. If the database is larger than ~10 million molecules when you download it, the Orion compute charges will be higher (the cost is roughly linear in the number of molecules).
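Because the cost scales roughly linearly with the number of molecules, a rough estimate can be made locally before launching anything. The sketch below is only an approximation: it takes the ~10 million molecules for ~$50 figure quoted above as its reference point, assumes the file has one molecule per line with no header line, and uses hypothetical helper names (`count_molecules`, `estimate_cost_usd`). Actual charges depend on current pricing.

```python
import gzip

# Reference point taken from the text above: ~10 million molecules cost ~$50.
REFERENCE_MOLECULES = 10_000_000
REFERENCE_COST_USD = 50.0


def count_molecules(path):
    """Count non-empty lines in a gzipped SMILES file.

    Assumes one molecule per line and no header line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def estimate_cost_usd(n_molecules):
    """Scale the reference cost linearly with the number of molecules."""
    return REFERENCE_COST_USD * n_molecules / REFERENCE_MOLECULES


# Example call (the file name may differ for your download):
# n = count_molecules("mcule_purchasable_in_stock_210615.smi.gz")
# print(f"{n} molecules, roughly ${estimate_cost_usd(n):.0f}")
```

For example, under this linear assumption a 15 million molecule file would come out at about $75.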

Create GigaDocking and FastROCS collections

Locate the Prepare Giga Collections floe as follows

  1. Click on the ‘Floes’ button in the left menu bar

  2. Click on the ‘Floes’ tab in the upper left of the main window.

  3. Click ‘All Floes’ in the left pane

  4. In the search bar at the top of the right pane enter Prepare Giga Collections

The Prepare Giga Collections floe will be visible to the right. Click it and enter the following parameter settings in the floe launch UI that pops up.

  • Jobs Properties

    • Output Folder : Tutorial/My Data/Mcule In Stock Collections

  • Promoted Parameters

    • Inputs

      • Input File(s) (Semi-Optional) : Tutorial/My Data/Input Data/mcule_purchasable_in_stock_210615.smi.gz

        This is the file you downloaded from the Mcule site and then uploaded to Orion. Select it using the steps below.

        Note

        The file you downloaded from Mcule may have a slightly different name than mcule_purchasable_in_stock_210615.smi.gz.

        1. Click the ‘Choose Input’ button.

        2. Click on ‘Project Data’ under collections in the Select Input File(s) modal.

        3. If you do not see the mcule_purchasable_in_stock_210615.smi.gz file, type ‘mcule’ in the search bar and click ‘search for resource mcule’ in the drop-down menu.

        4. Select the mcule_purchasable_in_stock_210615.smi.gz file.

        5. Click ‘Use File as Input’

    • Outputs

      • Giga Docking Collection Name : Mcule In Stock GigaDock Collection

      • FastROCS Collection Name : Mcule In Stock FastROCS Collection

Now launch the floe using the ‘Start Job’ button at the bottom of the floe launch UI. This job may take up to 10 hours to complete and will incur an Orion compute charge of approximately $40.

The output of this floe will be the following two collections:

  1. Mcule In Stock GigaDock Collection : Molecules ready for use with the Giga Docking floe.

  2. Mcule In Stock FastROCS Collection : Molecules ready for use with FastROCS.

Note

If you are using this tutorial as a guide to prepare your own set of molecules and the input files are in CSV format, see Prepare Molecules in a CSV File for GigaDocking and FastROCS in addition to this tutorial.

View Collection Properties

Collections cannot be inspected in the Orion UI the way datasets can. To view basic information (e.g., molecular properties) about the prepared collections, first locate the Collection Info floe as follows

  1. Click on the ‘Floes’ button in the left menu bar

  2. Click on the ‘Floes’ tab in the upper left of the main window.

  3. Click ‘All Floes’ in the left pane

  4. In the search bar at the top of the right pane enter Collection Info

The Collection Info floe will be visible to the right. Click it and enter the following parameter settings in the floe launch UI that pops up.

  • Jobs Properties

    • Output Folder : Tutorial/My Data/Mcule In Stock Collections

  • Promoted Parameters

    • Inputs

      • Input Collection : Tutorial/My Data/Mcule In Stock Collections/Mcule In Stock GigaDock Collection

        Select the collection generated in the previous step of the tutorial as follows.

        1. Click the ‘Choose Input’ button for Input Collection to open the Select Collection modal.

        2. Select the ‘My Data’ folder.

        3. Select the ‘Mcule In Stock Collections’ folder.

        4. Select the Mcule In Stock GigaDock Collection.

        5. Click ‘Use dataset as Input’.

        Note

        You could also select the ‘Mcule In Stock FastROCS Collection’ instead of the GigaDock collection. It will have nearly identical molecular properties to the GigaDock collection, but will have a maximum of 10 conformers per molecule rather than 200.
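To put the two per-molecule conformer caps in perspective, the short calculation below works out the upper bound on the total number of conformers in each collection. It is only an illustration: the molecule count of 10 million is the approximate figure quoted earlier in this tutorial, and the results are ceilings rather than expected sizes, since many molecules will have fewer conformers than the cap.

```python
# Per-molecule conformer caps stated in the note above.
MAX_CONFS_FASTROCS = 10
MAX_CONFS_GIGADOCK = 200


def max_conformers(n_molecules, cap):
    """Upper bound on total conformers: every molecule at its cap."""
    return n_molecules * cap


n = 10_000_000  # approximate database size quoted in this tutorial

print(max_conformers(n, MAX_CONFS_FASTROCS))  # 100000000
print(max_conformers(n, MAX_CONFS_GIGADOCK))  # 2000000000
```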

The floe will take about 15 minutes to run, with an Orion compute charge of about $1. Once the floe completes, go to the job page and click on the ‘Floe Report’ tab above the window with the floe diagram. A set of histograms showing the basic molecular properties of the collection will then be visible.

Figure: Image of the floe report on the Mcule collection.

View of Random Sample of the Collection

The previous step in the tutorial examined the distribution of basic molecular properties of the prepared molecules. In this step, a random sample of the collection will be converted into a dataset and examined in the 3D page.
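As an illustration of what drawing a uniform random sample from a very large set of records involves, here is a minimal sketch of reservoir sampling (Algorithm R), which picks k items in a single pass without knowing the total count in advance. This is not a description of how the Sample Collection floe works internally, which this tutorial does not specify; the function name `reservoir_sample` is hypothetical.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Single pass, O(k) memory. If the iterable has fewer than k items,
    all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Example: sample 5 record indices out of one million.
picked = reservoir_sample(range(1_000_000), 5, seed=0)
print(len(picked))  # 5
```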

Note

Collections are not viewable in the 3D or Analyze page directly.

Locate the Sample Collection floe as follows

  1. Click on the ‘Floes’ button in the left menu bar

  2. Click on the ‘Floes’ tab in the upper left of the main window.

  3. Set the ‘Browse Workfloes’ drop down menu to ‘Show all packages’

  4. Select ‘All’ under Browse Workfloes

  5. In the search bar enter Sample Collection

The Sample Collection floe will be visible to the right. Click it and enter the following parameter settings in the floe launch UI that pops up.

  • Jobs Properties

    • Output Folder : Tutorial/My Data/Mcule In Stock Collections

  • Promoted Parameters

    • Inputs

      • Input Collection : Tutorial/My Data/Mcule In Stock Collections/Mcule In Stock FastROCS Collection

        Select the collection generated in the previous step of the tutorial as follows.

        1. Click the ‘Choose Input’ button for Input Collection to open the Select Collection modal.

        2. Select the ‘My Data’ folder.

        3. Select the ‘Mcule In Stock Collections’ folder.

        4. Select the Mcule In Stock FastROCS Collection.

        5. Click ‘Use dataset as Input’.

        Note

        You could also select the ‘Mcule In Stock GigaDock Collection’ instead of the FastROCS collection. It will have nearly identical molecular properties to the FastROCS collection, but will have a maximum of 200 conformers per molecule rather than 10.

The floe will take about 15 minutes to run, with an Orion compute charge of about $1. Once the job finishes, view the sample in the 3D view as follows

  1. Navigate to the floe job page

  2. Under Results in the top left click the ‘Show in Project Data’ link next to ‘Sample Dataset’

  3. Clear any currently active datasets by opening the ‘Active Datasets’ drop down in the upper left and clicking ‘Clear All’

  4. Make the Sample Dataset active by clicking on the + symbol to the left of the name to make it green.

  5. Switch to the 3D view by clicking the ‘3D’ button on the blue menu to the left of the interface.

  6. Open the layout menu in the upper right and select ‘3D viewer with spreadsheet’

You can now browse individual records using the mouse or the up and down arrow keys.