Dataset Subsetting Based on String Keys

This flow takes a dataset and two input parameters: a string field from that dataset and a string parameter. It splits the string parameter by line to create keys, and then emits the records from the input dataset whose value in the specified string field matches any of these keys.
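
The matching step described above can be sketched in plain Python. This is an illustration only, not the actual implementation: records are modeled as dictionaries, and the names `subset_by_keys`, `field_name`, and `key_text` are hypothetical. It also assumes that blank lines are skipped and surrounding whitespace is stripped from each key, which the description does not specify.

```python
def subset_by_keys(records, field_name, key_text):
    """Yield records whose `field_name` value equals one of the keys."""
    # One key per line. Assumption: whitespace is stripped and blank lines are skipped.
    keys = {line.strip() for line in key_text.splitlines() if line.strip()}
    for record in records:
        # Records that lack the field are never matched.
        if record.get(field_name) in keys:
            yield record


# Hypothetical records, using the default field name "Subset Field".
records = [
    {"Subset Field": "alpha", "value": 1},
    {"Subset Field": "beta", "value": 2},
    {"Subset Field": "gamma", "value": 3},
]
matched = list(subset_by_keys(records, "Subset Field", "alpha\ngamma\n"))
```

Storing the keys in a set keeps each lookup constant-time, which matters when the key list is long.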

Extra Required Parameters

CPUs (integer) : Number of CPUs to use in each instance of the subset cube

Default: 4 Min: 1 Max: 128

Input String Field To Use For Subsetting (Field Type: String) : The name for the string data field.

Default: Subset Field

Number of messages to distribute at a time (integer) : Number of records to process in each instance of the subset cube

Default: 5000 Min: 1 Max: 65535

Output matched dataset (dataset_out) : Output dataset to write to

Default: matched

Write unmatched dataset (boolean) : If off, then the ‘unmatched’ dataset is not generated.

Default: False
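
How the "Write unmatched dataset" switch might affect the output can be sketched as follows. This is a hedged illustration rather than the real cube logic: `partition_by_keys` and `write_unmatched` are hypothetical names, and records are again modeled as dictionaries.

```python
def partition_by_keys(records, field_name, keys, write_unmatched=False):
    """Split records into matched and (optionally) unmatched lists."""
    matched, unmatched = [], []
    for record in records:
        if record.get(field_name) in keys:
            matched.append(record)
        elif write_unmatched:
            # Only collected when the switch is on.
            unmatched.append(record)
    # With the switch off, no unmatched dataset is produced at all.
    return matched, (unmatched if write_unmatched else None)


rows = [{"Subset Field": "alpha"}, {"Subset Field": "beta"}]
only_matched = partition_by_keys(rows, "Subset Field", {"alpha"})
both = partition_by_keys(rows, "Subset Field", {"alpha"}, write_unmatched=True)
```

With the default setting (`False`), the second element of the result is `None`, mirroring the case where the 'unmatched' dataset is not generated.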

Input Dataset (data_source) : Dataset to subset