Prepare Collection for Fast Similarity or Substructure Search from File

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Medicinal Chemist

  • Task-based/Library Prep & Design/Substructure & Similarity Search

  • Solution-based/Virtual-screening/DB Search/2D Similarity and SubSearch

Description

Prepares a collection for fast similarity or substructure search from an input file. This floe screens molecules for filtering characteristics so that they can subsequently be searched more quickly in the Fast Substructure Search floe, and it generates a collection with multiple types of fingerprints so that the molecules can be searched in the Fast Similarity Search floe.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Molecule File (in): Input file containing molecules. Must be in an OpenEye-supported format.

  • Required

  • Type: file_in

Advanced Parallelism Settings

Maximum Parallel Cube Instances (max_parallel_central): Maximum number of cube instances running at any one time during the parallel part of the search preparation computation.

  • Type: integer

  • Default: 1000

Outputs

Make Fast Similarity Search Input Collection (make_sim_coll): Determines whether to make the prepared fast similarity search collection.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Fast Substructure Search Input Collection Choice (ss_coll_choice): Determines the query type of the prepared fast substructure search collection: Both (default), MDL, SMARTS, or None. This floe will fail if None is chosen for this parameter and Make Fast Similarity Search Input Collection is turned off.

  • Required

  • Type: string

  • Default: Both

  • Choices: ['Both', 'MDL', 'SMARTS', 'None']
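The failure condition described for ss_coll_choice can be checked before a job is submitted. The following is a minimal sketch in plain Python, not part of the floe itself: the function name check_output_choices and the error messages are hypothetical, and only the parameter names and choices come from this page.

```python
# Allowed values for ss_coll_choice, as listed in the parameter description.
VALID_SS_CHOICES = ("Both", "MDL", "SMARTS", "None")


def check_output_choices(make_sim_coll: bool, ss_coll_choice: str) -> None:
    """Raise ValueError if the settings would leave the floe with no output.

    Hypothetical pre-submission check; the floe performs its own validation.
    """
    if ss_coll_choice not in VALID_SS_CHOICES:
        raise ValueError(
            f"ss_coll_choice must be one of {VALID_SS_CHOICES}, got {ss_coll_choice!r}"
        )
    # The documented failure case: no substructure collection requested
    # and the similarity collection switched off.
    if ss_coll_choice == "None" and not make_sim_coll:
        raise ValueError(
            "ss_coll_choice is 'None' and make_sim_coll is False: "
            "no collection would be produced"
        )


# Similarity collection only: allowed.
check_output_choices(make_sim_coll=True, ss_coll_choice="None")
# Substructure collection only: allowed.
check_output_choices(make_sim_coll=False, ss_coll_choice="SMARTS")
```

Note that "None" here is the string choice offered by the parameter, not the Python None object.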

Substructure Search Input Collection Name (coll_name): Name of the prepared fast substructure search collection.

  • Required

  • Type: collection_sink

  • Default: Fast Substructure Search Input Collection

Similarity Search Input Collection Name (sim_coll_name): Name of the prepared fast similarity search collection.

  • Required

  • Type: collection_sink

  • Default: Fast Similarity Search Input Collection

Advanced: Large Inputs

Number of Molecules per Shard (mols_per_shard): Number of molecules per shard. Because each shard is searched in parallel, this controls the granularity of the subsequent fast similarity or substructure search floe run.

  • Type: integer

  • Default: 250000
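To get a feel for how mols_per_shard affects granularity, the number of shards can be estimated from the size of the input. This is a rough sketch that assumes every shard is filled to capacity before the next one is started; the floe may distribute molecules differently, and the function name shard_count is hypothetical.

```python
def shard_count(num_molecules: int, mols_per_shard: int = 250_000) -> int:
    """Estimate how many shards a collection would contain.

    Assumes shards are filled to capacity in order (an assumption,
    not documented floe behavior).
    """
    if num_molecules < 0:
        raise ValueError("num_molecules must be non-negative")
    if mols_per_shard <= 0:
        raise ValueError("mols_per_shard must be positive")
    # Integer ceiling division: a partially filled shard still counts.
    return (num_molecules + mols_per_shard - 1) // mols_per_shard


print(shard_count(1_000_000))           # 4 shards at the default size
print(shard_count(1_000_000, 100_000))  # 10 shards with a smaller shard size
```

Under this assumption, a smaller mols_per_shard yields more shards and therefore more units of work that a later search can process in parallel.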

File Reader Disk Space (MiB) (disk_space): Disk space, in MiB, for the file reader. This value must be larger than the total size of the file(s) provided as input.

  • Type: decimal

  • Default: 5120
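A suitable disk_space value can be estimated from local copies of the input file(s) before upload. The sketch below uses only the Python standard library; the function name required_disk_space_mib and the headroom factor are assumptions for illustration, since this page states only that the value must exceed the total input size.

```python
import math
import os

MIB = 1024 * 1024  # bytes per mebibyte


def required_disk_space_mib(paths, headroom: float = 1.2) -> int:
    """Return a whole-MiB value strictly larger than the total input size.

    headroom is a hypothetical safety factor (>= 1.0), not a documented setting.
    """
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    total_bytes = sum(os.path.getsize(p) for p in paths)
    # floor(...) + 1 guarantees the result exceeds the total size in MiB.
    return math.floor(total_bytes * headroom / MIB) + 1
```

If the returned value exceeds the default of 5120, disk_space would need to be raised accordingly.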