Prepare Collection for Fast Similarity or Substructure Search from File

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Medicinal Chemist

  • Task-based/Library Prep & Design/Substructure & Similarity Search

  • Solution-based/Virtual-screening/DB Search/2D Similarity and SubSearch


Prepares a collection for fast similarity or substructure search from an input file. This floe screens molecules for filtering characteristics so that they can subsequently be searched more quickly in the Fast Substructure Search floe. It also generates a collection with multiple types of fingerprints so that the molecules can be searched in the Fast Similarity Search floe.

Promoted Parameters

Title in user interface (promoted name)


Input Molecule File (in): Input file containing molecules. Must be in an OpenEye-supported format.

  • Required

  • Type: file_in

Advanced Parallelism Settings

Maximum Parallel Cube Instances (max_parallel_central): Maximum number of cubes running at any one time for the parallel part of the search preparation computation.

  • Type: integer

  • Default: 1000

Number of messages to distribute at a time (item_count_central): Maximum units of work sent to each parallel cube.

  • Type: integer

  • Default: 1


Substructure Search Input Collection Name (coll_name): Name of the collection prepared for fast substructure search.

  • Required

  • Type: collection_sink

  • Default: Fast Substructure Search Input Collection

Similarity Search Input Collection Name (sim_coll_name): Name of the collection prepared for fast similarity search.

  • Required

  • Type: collection_sink

  • Default: Fast Similarity Search Input Collection

Advanced: Large Inputs

Number of Molecules per Shard (mols_per_shard): Number of molecules per shard. Since each shard is searched in parallel, this controls the granularity of the subsequent fast similarity or substructure search floe run.

  • Type: integer

  • Default: 250000
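
To get a feel for how mols_per_shard affects search granularity, the sketch below estimates how many shards an input of a given size would produce. It assumes that every shard is filled to mols_per_shard molecules except possibly the last one. That is an illustrative assumption, not documented floe behavior, and the function name shard_count is hypothetical.

```python
def shard_count(total_molecules: int, mols_per_shard: int = 250_000) -> int:
    """Estimate the number of shards for an input of total_molecules.

    Assumption (not from the floe documentation): shards are filled to
    mols_per_shard molecules each, with one smaller shard for any remainder.
    """
    if total_molecules < 0:
        raise ValueError("total_molecules must be non-negative")
    if mols_per_shard <= 0:
        raise ValueError("mols_per_shard must be positive")
    # Integer ceiling division avoids floating-point rounding on large inputs.
    return (total_molecules + mols_per_shard - 1) // mols_per_shard


if __name__ == "__main__":
    # Compare the default shard size with a smaller one for the same input.
    for size in (250_000, 100_000):
        print(size, shard_count(10_000_000, size))
```

Under this assumption, a smaller mols_per_shard yields more shards and therefore more units of work that the subsequent search can run in parallel.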

File Reader Disk Space (MiB) (disk_space): This size (in MiB) must be larger than the total size of the file(s) provided as input.

  • Type: decimal

  • Default: 5120
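
The disk_space requirement can be checked locally before launching the floe. The sketch below totals the size of the input file(s) in MiB and compares it with the configured value, using a strict comparison because the setting must be larger than the total input size. The helper names total_size_mib and disk_space_is_sufficient are hypothetical and not part of any OpenEye or Orion API.

```python
import os
from typing import Iterable

MIB = 1024 * 1024  # bytes per mebibyte


def total_size_mib(paths: Iterable[str]) -> float:
    """Return the combined size of the given local files in MiB."""
    return sum(os.path.getsize(p) for p in paths) / MIB


def disk_space_is_sufficient(total_mib: float, disk_space_mib: float = 5120) -> bool:
    """Return True only if disk_space_mib is strictly larger than the input size."""
    return disk_space_mib > total_mib


if __name__ == "__main__":
    import sys

    # Usage: python check_disk_space.py input1.sdf input2.sdf ...
    size = total_size_mib(sys.argv[1:])
    print(f"Total input size: {size:.1f} MiB")
    print("Default disk_space is sufficient:", disk_space_is_sufficient(size))
```

If the check fails for the default of 5120 MiB, raise disk_space above the reported total input size.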