Collection to Hitlist Dataset

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/FastROCS

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/Analysis

  • Task-based/Data Science/Conversion

Description

Creates a dataset of the top-scoring compounds from a collection, ranked by the value of a float field in the collection.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Collection (input_collection): Collection from which to extract the top-scoring records.

  • Required

  • Type: collection_source

Sort Field (sort_field): Field in the collection to sort on. If you are processing the Raw Results collection from a giga docking run, the sort field is ‘Chemgauss4’.

  • Required

  • Type: field_parameter::float

Outputs

Output Dataset (output_dataset): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: Collection Hit List

Options

Descending (descending): If ‘On’, scores are sorted in descending order (i.e., high scores appear at the top of the hit list). If ‘Off’, scores are sorted in ascending order (i.e., low scores appear at the top of the hit list). Hint: Set this to ‘On’ when processing ROCS/FastROCS results and ‘Off’ when processing docking results.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Sort Hit List (sort_switch): If turned off, the output dataset still contains the top N molecules from the input collection, but the molecules are not sorted within the dataset. Turning sorting off reduces the memory needed for the hit list cube.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Hit List Size (hit_list_size): Size of the output hit list. If this value is greater than 100K and ‘Sort Hit List’ is true, the memory allocated to the serial cubes may need to be increased (see the ‘Serial Cube Memory’ parameter).

  • Required

  • Type: integer

  • Default: 10000
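
The way ‘Descending’, ‘Sort Hit List’, and ‘Hit List Size’ interact can be illustrated with a minimal Python sketch. This is not the floe's implementation: the function name `hit_list` and the use of plain dictionaries as records are assumptions made for illustration, and returning the unsorted hits in input order is only one possible reading of "not sorted".

```python
import heapq


def hit_list(records, sort_field, hit_list_size=10000,
             descending=False, sort_switch=True):
    """Return the top `hit_list_size` records ranked by `sort_field`.

    Hypothetical helper: `records` is any iterable of dict-like records.
    """
    # descending=True keeps the highest scores (e.g. ROCS/FastROCS);
    # descending=False keeps the lowest scores (e.g. docking).
    pick = heapq.nlargest if descending else heapq.nsmallest

    # Pair each record with its input position so that the unsorted
    # variant can hand the hits back in their original order.
    best = pick(hit_list_size, enumerate(records),
                key=lambda pair: pair[1][sort_field])

    if sort_switch:
        # heapq returns the selection best-first already.
        return [record for _, record in best]

    # Same membership, but no ranking applied (assumption: input order).
    return [record for _, record in sorted(best, key=lambda pair: pair[0])]


if __name__ == "__main__":
    docking = [
        {"name": "a", "Chemgauss4": -7.5},
        {"name": "d", "Chemgauss4": -9.8},
        {"name": "c", "Chemgauss4": -3.1},
        {"name": "b", "Chemgauss4": -12.0},
    ]
    top = hit_list(docking, "Chemgauss4", hit_list_size=2, descending=False)
    print([r["name"] for r in top])
```

In this sketch the set of selected records is the same whether or not sorting is enabled; only the order within the result changes.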

Serial Cube Memory (serial_memory_mb): Memory (in MB) allocated to both the ‘Hit List’ and ‘Find Score Cutoff’ cubes.

  • Type: decimal

  • Default: 30720

Development

Min Shard Download Timeout (min_shard_download_timeout): Minimum timeout for the smart shard to records cubes.

  • Required

  • Type: integer

  • Default: 2

Max Shard Download Timeout (max_shard_download_timeout): Maximum timeout for the smart shard to records cubes.

  • Required

  • Type: integer

  • Default: 21600.0

Session Retry Dict for Shard Download (session_retry_dict_for_shard_download_): Session retry dictionary for the smart shard to records cubes.

  • Type: string

  • Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
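
Each default entry has the form `number:number`. The following sketch shows one way to split such entries into a Python dictionary, for example when inspecting or editing the value. The function name `parse_retry_entries` is hypothetical. This page does not say what the two numbers mean; the left-hand values resemble HTTP status codes, but that reading is an assumption.

```python
def parse_retry_entries(entries):
    """Split 'left:right' strings into an {int: int} dictionary.

    Hypothetical helper; the meaning of each number is not documented here.
    """
    parsed = {}
    for entry in entries:
        left, right = entry.split(":", 1)
        parsed[int(left)] = int(right)
    return parsed


# Default entries as listed for this parameter.
DEFAULT_ENTRIES = ["429:1000", "460:1000", "500:1000",
                   "502:1000", "503:1000", "504:1000"]

if __name__ == "__main__":
    print(parse_retry_entries(DEFAULT_ENTRIES))
```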

Shard Download Attempts (shard_download_attempts): Number of attempts to make when downloading a shard.

  • Type: integer

  • Default: 1