Dataset Subsetting Based on String Keys

This flow takes a dataset and two input parameters: the name of a string field in that dataset
and a string parameter. It splits the string parameter by line to create keys, and then emits the records
from the input dataset whose value for the specified string field matches any of these keys.
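
The matching described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the flow's actual implementation: the function name `subset_by_keys`, the use of plain dicts as records, reading the keys from a newline-separated string, and stripping whitespace and skipping blank lines are all hypothetical choices made for the example.

```python
def subset_by_keys(records, field, key_text, write_unmatched=False):
    """Split key_text by line into keys and partition records by exact match on `field`."""
    # One key per line; stripping whitespace and dropping blank lines is an assumption.
    keys = {line.strip() for line in key_text.splitlines() if line.strip()}

    matched, unmatched = [], []
    for record in records:
        # A record missing the field yields None, which never matches a key.
        if record.get(field) in keys:
            matched.append(record)
        elif write_unmatched:
            unmatched.append(record)
    return matched, unmatched


# Hypothetical records using the default field name "Subset Field".
records = [
    {"Subset Field": "mol-1"},
    {"Subset Field": "mol-2"},
    {"Subset Field": "mol-3"},
]
matched, unmatched = subset_by_keys(
    records, "Subset Field", "mol-1\nmol-3\n", write_unmatched=True
)
```

Storing the keys in a set keeps each lookup at constant time on average, which matters when the key list is long relative to the batch size.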

Extra Required Parameters

  • CPUs (integer) : Number of CPUs to use in each instance of subset cube
    Default: 4 Min: 1 Max: 128
  • Input String Field To Use For Subsetting (Field Type: String) : Name of the string field whose values are compared against the keys.
    Default: Subset Field
  • Number of messages to distribute at a time (integer) : Number of records to process in each instance of subset cube
    Default: 5000 Min: 1 Max: 65535
  • Output matched dataset (dataset_out) : Output dataset to write to
    Default: matched
  • Write unmatched dataset (boolean) : If off, then the ‘unmatched’ dataset is not generated.
    Default: False
  • Input Dataset (data_source) : Dataset to subset
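
As a rough illustration of the "Number of messages to distribute at a time" parameter, the sketch below groups records into fixed-size batches before they would be handed to a worker. The helper name `batches` is hypothetical, and how the flow actually distributes work across subset cube instances is not described here; only the default (5000) and the bounds (1 to 65535) come from the parameter list.

```python
from itertools import islice


def batches(records, batch_size=5000):
    """Yield lists of at most batch_size records, preserving input order."""
    # Bounds mirror the documented Min/Max of the parameter.
    if not 1 <= batch_size <= 65535:
        raise ValueError("batch_size must be between 1 and 65535")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Twelve stand-in records split into batches of five.
example = list(batches(range(12), batch_size=5))
```

Larger batches reduce per-message overhead, while smaller ones spread work more evenly across instances; the right value depends on dataset size and record cost.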