Collection to File

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Role-based/Computational Chemist

Solution-based/Virtual-screening/DB Preparation

Task-based/Data Science/Conversion

Description

Concatenates all shards of a collection into a single file. All shards of the input collection(s) must have the same format, and the floe validates that this format can be concatenated. The output file format always matches the collection.

The purpose of this floe is to convert a FastROCS collection into a file that can be used with a FastROCS server (rather than a floe).
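The same-format check and concatenation described above can be illustrated with a minimal sketch. It is not the floe's implementation: the Shard class and concatenate_shards function are hypothetical names, shard contents are modelled as plain bytes, and the check for whether a given format is concatenable is left out because the source does not say which formats qualify.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Shard:
    """Hypothetical stand-in for one shard of a collection."""
    fmt: str     # format of the shard payload, e.g. "oeb.gz"
    data: bytes  # raw shard contents


def concatenate_shards(shards: Iterable[Shard]) -> Tuple[str, bytes]:
    """Join shard payloads after checking that they all share one format."""
    shard_list = list(shards)
    if not shard_list:
        raise ValueError("collection has no shards")
    formats = {shard.fmt for shard in shard_list}
    if len(formats) != 1:
        raise ValueError(f"mixed shard formats: {sorted(formats)}")
    # The output format is taken from the shards themselves.
    return shard_list[0].fmt, b"".join(shard.data for shard in shard_list)
```

A collection whose shards mix formats is rejected before any data is joined, which mirrors the order in which the description lists validation and concatenation.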

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Collection (collection_in): An input collection to convert into a file.

Type: collection_source

Input Dataset (input_dataset): A dataset to convert into a file.

Type: data_source

Outputs

Output Base Filename (filename): Basename of the output file (without the format extension). The format extension will automatically be added to this name.

Type: string

Default: Collection Converted To File

Temporary Collection (temporary_collection): Name of a temporary collection the floe will create and automatically delete at the end of the floe run. Deleting this collection manually before the floe finishes can cause this floe to fail. There is generally no reason to ever adjust this parameter.

Required

Type: collection_sink

Default: Temporary Collection

Options

Output Format (output_format): The desired format of the output file. Note that, depending on the format of the input collection and the format of the output file, not all data in the input collection is guaranteed to be retained in the output file.

Required

Type: string

Default: sdf.gz

Choices: [‘can’, ‘can.gz’, ‘csv’, ‘csv.bz2’, ‘csv.gz’, ‘cxsmiles’, ‘cxsmiles.gz’, ‘ism’, ‘ism.gz’, ‘isosmi’, ‘isosmi.gz’, ‘mol2’, ‘mol2.gz’, ‘oeb’, ‘oeb.gz’, ‘oedb’, ‘oez’, ‘sd’, ‘sd.gz’, ‘sdf’, ‘sdf.gz’, ‘smi’, ‘smi.gz’, ‘syb’, ‘syb.gz’, ‘tsv’, ‘tsv.bz2’, ‘tsv.gz’, ‘usm’, ‘usm.gz’]

Clear Titles (clear_titles): If set to On, the titles of the molecules will be cleared before the file is written.

Required

Type: boolean

Default: False

Choices: [True, False]

Development

Verbose (verbose): If ‘On’, this floe will write to the log file each time a shard is processed.

Type: boolean

Default: False

Choices: [True, False]

Re-formated shard size (re_formated_shard_size): The target number of records in a shard.

A value of 0 fills each shard up to the max_shard_bytes limit.

Required

Type: integer

Default: 1000000

Dataset shard size (dataset_shard_size): The target number of records in a shard.

A value of 0 fills each shard up to the max_shard_bytes limit.

Required

Type: integer

Default: 100000
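One way to read the two shard size parameters is sketched below. This is an illustration under assumptions, not the floe's code: split_into_shards is a hypothetical helper, records are modelled as bytes, the value of max_shard_bytes is not given in this page and is passed in explicitly, and the sketch assumes the byte limit also applies when a non-zero record target is set.

```python
from typing import Iterable, Iterator, List


def split_into_shards(
    records: Iterable[bytes],
    shard_size: int,
    max_shard_bytes: int,
) -> Iterator[List[bytes]]:
    """Group records into shards by target record count and byte limit.

    A shard_size of 0 disables the record-count target, so a shard is
    closed only when adding another record would exceed max_shard_bytes.
    """
    shard: List[bytes] = []
    shard_bytes = 0
    for record in records:
        too_many = shard_size > 0 and len(shard) >= shard_size
        too_big = shard_bytes + len(record) > max_shard_bytes
        # Never emit an empty shard, even if one record exceeds the limit.
        if shard and (too_many or too_big):
            yield shard
            shard, shard_bytes = [], 0
        shard.append(record)
        shard_bytes += len(record)
    if shard:
        yield shard
```

With five two-byte records, a shard size of 2 and a generous byte limit gives shards of 2, 2, and 1 records, while a shard size of 0 and a limit of 6 bytes gives shards of 3 and 2 records.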

Number of Parallel Cubes (number_of_parallel_cubes): Maximum number of cubes in the parallel cube group that converts the format of the input collection.

Type: integer

Default: 25
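The promoted names, defaults, and choices listed on this page can be gathered into a plain dictionary and checked before a run. The sketch below is only a convenience illustration: build_parameters is a hypothetical helper, the validation rules are assumptions drawn from the descriptions above, and how the resulting values are handed to Orion is not covered here.

```python
# Output format choices, copied from the Output Format parameter.
OUTPUT_FORMATS = {
    "can", "can.gz", "csv", "csv.bz2", "csv.gz", "cxsmiles", "cxsmiles.gz",
    "ism", "ism.gz", "isosmi", "isosmi.gz", "mol2", "mol2.gz", "oeb",
    "oeb.gz", "oedb", "oez", "sd", "sd.gz", "sdf", "sdf.gz", "smi",
    "smi.gz", "syb", "syb.gz", "tsv", "tsv.bz2", "tsv.gz", "usm", "usm.gz",
}

# Defaults keyed by promoted name, as documented on this page.
DEFAULTS = {
    "filename": "Collection Converted To File",
    "temporary_collection": "Temporary Collection",
    "output_format": "sdf.gz",
    "clear_titles": False,
    "verbose": False,
    "re_formated_shard_size": 1000000,
    "dataset_shard_size": 100000,
    "number_of_parallel_cubes": 25,
}

# Inputs have no documented default, so they are accepted but optional here.
INPUT_NAMES = {"collection_in", "input_dataset"}


def build_parameters(**overrides):
    """Merge overrides into the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS) - INPUT_NAMES
    if unknown:
        raise ValueError(f"unknown promoted names: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["output_format"] not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {params['output_format']}")
    for name in ("re_formated_shard_size", "dataset_shard_size"):
        if params[name] < 0:  # 0 is allowed and means "use the byte limit"
            raise ValueError(f"{name} must be 0 or greater")
    return params
```

For example, build_parameters(collection_in="my collection", output_format="oeb.gz") keeps every other value at its documented default, while an output format outside the listed choices raises a ValueError.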