Dataset Subsetting – Random Selection

This Floe randomly selects N records from the input dataset and emits them to the ‘selected’ output dataset. The remaining records are emitted to an ‘unselected’ dataset only if requested.

This Floe has to cache records; it is therefore not suitable for splitting large datasets.
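
The selection described above can be illustrated with a minimal Python sketch. It is not the Floe's actual implementation: the function name `split_random`, its arguments, and the `seed` option are hypothetical, and capping N at the number of available records is an assumption, since the behavior for N larger than the dataset is not documented here. The sketch does show why caching makes the approach a poor fit for large datasets: every record is held in memory before any are chosen.

```python
import random


def split_random(records, n_selected=100, write_unselected=False, seed=None):
    """Randomly pick n_selected records; optionally return the remainder too.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Cache every record in memory: this is the step that limits dataset size.
    cached = list(records)

    # Assumption: if fewer than n_selected records exist, select all of them.
    n = min(n_selected, len(cached))

    # Sample positions without replacement so input order is preserved below.
    chosen = set(rng.sample(range(len(cached)), n))

    selected = [rec for i, rec in enumerate(cached) if i in chosen]

    # Mirror the 'Write unselected dataset' switch: skip the remainder if off.
    unselected = None
    if write_unselected:
        unselected = [rec for i, rec in enumerate(cached) if i not in chosen]

    return selected, unselected
```

For example, `split_random(range(1000), n_selected=100, write_unselected=True)` returns 100 selected records and the 900 that remain. With `write_unselected=False`, the second return value is `None`.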

Extra Required Parameters

  • Number of selected (integer) : The number of randomly selected records that will be emitted to the ‘selected’ dataset.
    Default: 100
  • Write unselected dataset (boolean) : If off, then the ‘unselected’ dataset is not generated.
    Default: False
  • Input Dataset (data_source) : Dataset to randomly select from
  • Output selected dataset (dataset_out) : Output dataset to write to
    Default: selected