Dataset Subsetting – Random Splitting

This Floe randomly splits one input dataset into two output datasets. A specified percentage of records is sent to the output dataset called ‘selected’. The remaining records are written to the ‘unselected’ dataset only if that output is requested.

Records are chosen stochastically, so the specified splitting percentage may not be precisely achieved. By default, the cube randomly splits a set of records into two sets of approximately equal size. This Floe is suitable for splitting larger datasets.
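The behaviour described above can be illustrated with a minimal Python sketch. It assumes that each record is assigned independently with probability Percentage / 100. This document does not describe the Floe's actual implementation, so the function names random_split and expected_spread, their arguments, and the seed option are all hypothetical.

```python
import random
from math import sqrt


def random_split(records, percentage=50.0, write_unselected=False, seed=None):
    """Assign each record to 'selected' with probability percentage / 100.

    Assumption: every record gets its own independent random draw, so the
    final split is only approximately equal to the requested percentage.
    """
    rng = random.Random(seed)
    p = percentage / 100.0
    selected, unselected = [], []
    for record in records:
        if rng.random() < p:
            selected.append(record)
        elif write_unselected:
            # Unselected records are kept only when explicitly requested.
            unselected.append(record)
    return selected, unselected


def expected_spread(n_records, percentage):
    """Standard deviation of the 'selected' count under the assumption above.

    With independent draws the count follows a binomial distribution,
    whose standard deviation is sqrt(n * p * (1 - p)).
    """
    p = percentage / 100.0
    return sqrt(n_records * p * (1.0 - p))


if __name__ == "__main__":
    selected, unselected = random_split(range(10_000), 50, write_unselected=True, seed=0)
    print(len(selected), len(unselected))
    print(expected_spread(10_000, 50))
```

Under this assumption, a 50% split of 10,000 records has a standard deviation of 50 records (0.5% of the dataset), while the same split of 100 records has a standard deviation of 5 records (5% of the dataset). The relative deviation shrinks as the dataset grows, which is consistent with the Floe being suited to larger datasets.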

Extra Required Parameters

  • Output selected dataset (dataset_out) : Output dataset to write to
    Default: selected
  • Write unselected dataset (boolean) : If off, then the ‘unselected’ dataset is not generated.
    Default: False
  • Input Dataset (data_source) : Dataset to split randomly
  • Percentage (decimal) : The percentage of records randomly selected and emitted to the dataset called ‘selected’.
    Default: 50 Min: 1 Max: 99
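
The parameters listed above, with their defaults and limits, can be summarised in a small Python sketch. The dictionary keys and the resolve_parameters helper are hypothetical names chosen for illustration. They are not the Floe's real parameter identifiers, which this document does not give.

```python
# Hypothetical keys: the Floe's real parameter identifiers are not given here.
DEFAULTS = {
    "output_selected_dataset": "selected",  # Default: selected
    "write_unselected_dataset": False,      # Default: False
    "percentage": 50.0,                     # Default: 50, Min: 1, Max: 99
}


def resolve_parameters(input_dataset, **overrides):
    """Merge user overrides with the documented defaults and check limits."""
    if not input_dataset:
        # The Input Dataset has no documented default, so it must be supplied.
        raise ValueError("Input Dataset is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = {"input_dataset": input_dataset, **DEFAULTS, **overrides}
    if not 1 <= params["percentage"] <= 99:
        raise ValueError("Percentage must be between 1 and 99")
    return params


if __name__ == "__main__":
    print(resolve_parameters("my_dataset", percentage=20))
```

Because the limits are 1 and 99, neither output can be requested as 0% or 100% of the records.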