Dataset Subsetting Based on String Keys

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Task-based/Data Science/Filtering

Description

Subset a dataset using string keys. This floe takes a dataset and two input parameters: the name of a string field in that dataset and a string of keys. It splits the key string by line to create keys, then emits the records from the input dataset whose value in the specified string field matches any of these keys.
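The matching step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the floe's implementation: records are modeled as dictionaries rather than Orion records, the function name subset_by_keys is hypothetical, and dropping blank lines and surrounding whitespace is an assumed behavior.

```python
def subset_by_keys(records, field, key_text):
    """Return the records whose `field` value matches any key in `key_text`.

    Hypothetical helper: plain dicts stand in for dataset records.
    """
    # One key per line; blank lines and surrounding whitespace are
    # dropped (an assumption, the floe may treat them differently).
    keys = {line.strip() for line in key_text.splitlines() if line.strip()}
    # Records that lack the field never match.
    return [rec for rec in records if rec.get(field) in keys]


records = [
    {"Subset Field": "mol-1", "value": 1.2},
    {"Subset Field": "mol-2", "value": 3.4},
    {"Subset Field": "mol-3", "value": 5.6},
]
matched = subset_by_keys(records, "Subset Field", "mol-1\nmol-3\n")
```

Holding the keys in a set keeps each lookup at constant time, which matters when the key list is long.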

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Dataset (in): Dataset to subset.

Required

Type: data_source

Reference Input Dataset (ref_in): Reference dataset for file-based subset.

Type: data_source

Reference Identifier Field (String Reference Field): The name for the string data field in the reference records read from the ‘init’ port.

Type: field_parameter::string

Default: String Reference Field

Input String Field To Use For Subsetting (Field to Subset): The name for the string data field.

Required

Type: field_parameter::string

Default: Subset Field

Use String For Input (Use String For Input): If true, use the input string as the reference input. If false (default), use the reference input dataset.

Type: boolean

Default: False

Choices: [True, False]

Number of messages to distribute at a time (item_count): Number of records to process in each instance of the subset cube.

Required

Type: integer

Default: 5000

CPUs (cpu_count): Number of CPUs to use in each instance of the subset cube.

Required

Type: integer

Default: 4
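As a rough illustration of what item_count controls, the sketch below groups a stream of records into batches of at most item_count items. It assumes simple sequential chunking; how Orion actually distributes messages to cube instances is not described here, and the batches helper is hypothetical.

```python
from itertools import islice


def batches(records, item_count=5000):
    """Yield lists of at most `item_count` records (hypothetical helper)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, item_count))
        if not chunk:
            return
        yield chunk


# 12000 placeholder records split with the default item_count of 5000.
sizes = [len(batch) for batch in batches(range(12000), item_count=5000)]
```

With 12000 records and the default of 5000, this yields two full batches and one batch of 2000.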

Inputs If Using String For Reference Input

Input String (Input String): String to convert into records separated by line breaks.

Type: string

Separator (Input String Separator): The string used to separate the input string into records.

Type: string

Default: ,
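A minimal sketch of turning an input string into reference records, assuming the string is split on the separator and each piece becomes one record holding the key in the reference field. The function name string_to_key_records, the dictionary record model, and the trimming of whitespace and empty pieces are assumptions made for illustration.

```python
def string_to_key_records(input_string, separator=",",
                          field="String Reference Field"):
    """Split `input_string` on `separator` into one record per key.

    Hypothetical helper: plain dicts stand in for reference records.
    """
    if not separator:
        raise ValueError("separator must be a non-empty string")
    # Trim whitespace and skip empty pieces (assumed behavior).
    keys = [part.strip() for part in input_string.split(separator)]
    return [{field: key} for key in keys if key]


key_records = string_to_key_records("mol-1, mol-3,,mol-7")
```

Passing separator="\n" to the same function handles a key list written one key per line.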

Outputs

Output matched dataset (matched): Name of the output matched dataset.

Required

Type: dataset_out

Default: matched

Output unmatched dataset (unmatched): Name of the output unmatched dataset.

Type: dataset_out

Default: unmatched

Write unmatched dataset (switch_unmatched): If off, then the ‘unmatched’ dataset is not generated.

Required

Type: boolean

Default: False

Choices: [True, False]
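The relationship between the two outputs and the switch_unmatched flag can be sketched as follows. This is an illustration only: records are plain dictionaries, the partition function is hypothetical, and returning None for the unmatched output when the switch is off is an assumed way of modeling "not generated".

```python
def partition(records, field, keys, switch_unmatched=False):
    """Split records into (matched, unmatched) by key membership.

    Hypothetical helper; `unmatched` is None when switch_unmatched is off.
    """
    key_set = set(keys)
    matched, unmatched = [], []
    for rec in records:
        if rec.get(field) in key_set:
            matched.append(rec)
        elif switch_unmatched:
            # Only collect non-matching records when the switch is on.
            unmatched.append(rec)
    return matched, (unmatched if switch_unmatched else None)


records = [{"Subset Field": "a"}, {"Subset Field": "b"}, {"Subset Field": "c"}]
matched_off, unmatched_off = partition(records, "Subset Field", ["a", "c"])
matched_on, unmatched_on = partition(
    records, "Subset Field", ["a", "c"], switch_unmatched=True
)
```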