Dataset Subsetting Based on String Keys
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Task-based/Data Science/Filtering
Description
Subset a dataset using string keys. This floe takes a dataset and two input parameters: a string field from that dataset and a string parameter. It splits the string field by line to create keys, and then emits the records from the input dataset whose value in the specified string field matches any of these keys.
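The matching described above can be sketched in plain Python. This is only an illustration, not the floe's implementation: records are modeled as dictionaries, and the function name subset_by_keys and the sample field values are hypothetical.

```python
def subset_by_keys(records, field, key_string):
    """Yield records whose `field` value equals one of the keys in `key_string`."""
    # Assumption: one key per line. A set gives fast membership tests.
    keys = set(key_string.splitlines())
    for record in records:
        # Records that lack the field never match.
        if record.get(field) in keys:
            yield record


# Hypothetical sample data; "Subset Field" is the documented default field name.
records = [
    {"Subset Field": "A", "id": 1},
    {"Subset Field": "B", "id": 2},
    {"Subset Field": "C", "id": 3},
]
matched = list(subset_by_keys(records, "Subset Field", "A\nC"))
```

Matching here is exact and case-sensitive, which is an assumption; the floe may normalize values differently.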
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Dataset (in): Dataset to subset.
Required
Type: data_source
Reference Input Dataset (ref_in): Reference dataset for file-based subset.
Type: data_source
Reference Identifier Field (String Reference Field): The name for the string data field in the reference records read from the ‘init’ port.
Type: field_parameter::string
Default: String Reference Field
Input String Field To Use For Subsetting (Field to Subset): The name for the string data field.
Required
Type: field_parameter::string
Default: Subset Field
Use String For Input (Use String For Input): If true, use the input string as the reference input. If false (default), use the reference input dataset.
Type: boolean
Default: False
Choices: [True, False]
Number of messages to distribute at a time (item_count): Number of records to process in each instance of the subset cube.
Required
Type: integer
Default: 5000
CPUs (cpu_count): Number of CPUs to use in each instance of the subset cube.
Required
Type: integer
Default: 4
Inputs If Using String For Reference Input
Input String (Input String): String to convert into records separated by line breaks.
Type: string
Separator (Input String Separator): The string used to separate the input string into records.
Type: string
Default: ,
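As a rough illustration of how an input string and separator could be turned into reference records, consider the sketch below. The function name string_to_records and the choice to skip empty pieces are assumptions made for this example; only the default separator and the default field name come from the parameter list.

```python
def string_to_records(input_string, separator=",", field="String Reference Field"):
    """Split `input_string` on `separator` and wrap each piece in a record."""
    # Assumption: empty pieces (for example from a trailing separator) are skipped.
    return [{field: piece} for piece in input_string.split(separator) if piece]


reference = string_to_records("A,C,D")
by_line = string_to_records("A\nC", separator="\n")
```

Passing "\n" as the separator gives one record per line, while the default "," splits a comma-separated string.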
Outputs
Output matched dataset (matched): Name of the output matched dataset.
Required
Type: dataset_out
Default: matched
Output unmatched dataset (unmatched): Name of the output unmatched dataset.
Type: dataset_out
Default: unmatched
Write unmatched dataset (switch_unmatched): If off, then the ‘unmatched’ dataset is not generated.
Required
Type: boolean
Default: False
Choices: [True, False]
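The relationship between the two outputs and the switch_unmatched option can be sketched as follows. This is a hedged illustration with records modeled as dictionaries; the function name partition and the sample data are hypothetical, and returning None for a skipped output is a choice made only for this example.

```python
def partition(records, field, keys, write_unmatched=False):
    """Split records into matched and (optionally) unmatched lists."""
    keys = set(keys)
    matched, unmatched = [], []
    for record in records:
        if record.get(field) in keys:
            matched.append(record)
        elif write_unmatched:
            # Only collected when the switch is on.
            unmatched.append(record)
    # None stands in for "the unmatched dataset is not generated".
    return matched, (unmatched if write_unmatched else None)


# Hypothetical sample data.
records = [
    {"Subset Field": "A", "id": 1},
    {"Subset Field": "B", "id": 2},
]
matched_only, skipped = partition(records, "Subset Field", ["A"])
matched, unmatched = partition(records, "Subset Field", ["A"], write_unmatched=True)
```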