Collection to File
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Role-based/Computational Chemist
Solution-based/Virtual-screening/DB Preparation
Task-based/Data Science/Conversion
Description
Concatenates all shards of a collection into a single file. All shards of the input collection(s) must have the same format, and the floe validates that this format can be concatenated. The output file format always matches the collection.
The purpose of this floe is to convert a FastROCS collection into a file that can be used with a FastROCS server (rather than a floe).
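The concatenation step described above can be illustrated with a minimal sketch. This is not the floe's implementation: `concatenate_shards` and the sample shard payloads are hypothetical, and gzip is used only because a series of gzip streams joined end to end is itself a valid gzip stream, which makes it a convenient example of a concatenable format.

```python
import gzip


def concatenate_shards(shards):
    """Join (format, payload) shards into one byte string.

    Hypothetical helper: raises ValueError unless every shard
    reports the same format, mirroring the same-format requirement.
    """
    formats = {fmt for fmt, _ in shards}
    if len(formats) != 1:
        raise ValueError(f"expected exactly one shard format, got {sorted(formats)}")
    return b"".join(payload for _, payload in shards)


# Two made-up shards, each a gzip-compressed chunk of SMILES records.
shard_a = gzip.compress(b"CCO ethanol\n")
shard_b = gzip.compress(b"c1ccccc1 benzene\n")

combined = concatenate_shards([("smi.gz", shard_a), ("smi.gz", shard_b)])

# Decompressing the joined bytes yields the records of both shards in order.
records = gzip.decompress(combined).decode().splitlines()
```

Formats that carry a single header or footer per file would need more than byte-level joining, which is presumably why the floe validates the format first.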
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Collection (collection_in): An input collection to convert into a file.
Type: collection_source
Input Dataset (input_dataset): A dataset to convert into a file.
Type: data_source
Outputs
Output Base Filename (filename): Basename of the output file (without the format extension). The format extension will automatically be added to this name.
Type: string
Default: Collection Converted To File
Temporary Collection (temporary_collection): Name of a temporary collection the floe will create and automatically delete at the end of the floe run. Deleting this collection manually before the floe finishes can cause this floe to fail. There is generally no reason to ever adjust this parameter.
Required
Type: collection_sink
Default: Temporary Collection
Options
Output Format (output_format): The desired format of the output file. Note that, depending on the format of the input collection and the format of the output file, not all data in the input collection is guaranteed to be retained in the output file.
Required
Type: string
Default: sdf.gz
Choices: [‘can’, ‘can.gz’, ‘csv’, ‘csv.bz2’, ‘csv.gz’, ‘cxsmiles’, ‘cxsmiles.gz’, ‘ism’, ‘ism.gz’, ‘isosmi’, ‘isosmi.gz’, ‘mol2’, ‘mol2.gz’, ‘oeb’, ‘oeb.gz’, ‘oedb’, ‘oez’, ‘sd’, ‘sd.gz’, ‘sdf’, ‘sdf.gz’, ‘smi’, ‘smi.gz’, ‘syb’, ‘syb.gz’, ‘tsv’, ‘tsv.bz2’, ‘tsv.gz’, ‘usm’, ‘usm.gz’]
Clear Titles (clear_titles): If set to On, the titles of the molecules will be cleared before the file is written.
Required
Type: boolean
Default: False
Choices: [True, False]
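As a rough sketch of how the Output Base Filename and Output Format parameters combine, the snippet below checks a format against the documented choices and appends it to the basename. The helper `output_filename` is hypothetical, and joining the two parts with a single dot is an assumption about the naming scheme, not documented behavior.

```python
# Choices copied from the Output Format parameter above.
OUTPUT_FORMATS = {
    "can", "can.gz", "csv", "csv.bz2", "csv.gz", "cxsmiles", "cxsmiles.gz",
    "ism", "ism.gz", "isosmi", "isosmi.gz", "mol2", "mol2.gz", "oeb",
    "oeb.gz", "oedb", "oez", "sd", "sd.gz", "sdf", "sdf.gz", "smi",
    "smi.gz", "syb", "syb.gz", "tsv", "tsv.bz2", "tsv.gz", "usm", "usm.gz",
}


def output_filename(basename="Collection Converted To File", output_format="sdf.gz"):
    """Hypothetical helper: validate the format and append it as the extension.

    The defaults are the documented parameter defaults; the "." separator
    is an assumption.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return f"{basename}.{output_format}"


default_name = output_filename()
smiles_name = output_filename("screening_db", "smi.gz")
```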
Development
Verbose (verbose): If ‘On’, this floe will write to the log file each time a shard is processed.
Type: boolean
Default: False
Choices: [True, False]
Re-formated shard size (re_formated_shard_size): The target number of records in a shard.
A value of 0 fills each shard up to the max_shard_bytes limit.
Required
Type: integer
Default: 1000000
Dataset shard size (dataset_shard_size): The target number of records in a shard.
A value of 0 fills each shard up to the max_shard_bytes limit.
Required
Type: integer
Default: 100000
Number of Parallel Cubes (number_of_parallel_cubes): Maximum count for the parallel cube group that converts the format of the input collection.
Type: integer
Default: 25
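The interaction between a shard size parameter and max_shard_bytes can be sketched as follows. This is an illustration under stated assumptions, not the floe's actual logic: `plan_shards` is hypothetical, the byte limits used are made-up values, and the sketch assumes the byte limit also applies when the target record count is nonzero.

```python
def plan_shards(record_sizes, target_records, max_shard_bytes):
    """Group record indices into shards.

    target_records > 0: start a new shard once that many records are collected.
    target_records == 0: only the byte limit decides where a shard ends.
    A shard always holds at least one record, even an oversized one.
    """
    shards, current, current_bytes = [], [], 0
    for index, size in enumerate(record_sizes):
        full_by_count = target_records > 0 and len(current) >= target_records
        full_by_bytes = bool(current) and current_bytes + size > max_shard_bytes
        if full_by_count or full_by_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Five records of 10 bytes each.
by_count = plan_shards([10] * 5, target_records=2, max_shard_bytes=1000)
by_bytes = plan_shards([10] * 5, target_records=0, max_shard_bytes=35)
```

With a target of 2 records the split is driven by the count; with a target of 0 the shards grow until the next record would exceed the byte limit.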