Dataset Subsetting – Random Splitting

This Floe randomly splits one input dataset into two output datasets. A specified percentage of the records is sent to the output dataset called selected; the remaining records are emitted to the unselected dataset, if requested.
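The routing described above can be pictured with a short Python sketch. It is not the Floe's implementation: `random_split` and its arguments are hypothetical, records are modeled as a plain Python sequence, and the sketch assumes each record is sent to selected independently with probability Percentage/100.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    records: Sequence[T],
    percentage: float = 50.0,
    write_unselected: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[T], Optional[List[T]]]:
    """Hypothetical sketch: route each record to 'selected' with
    probability percentage/100, otherwise to 'unselected'."""
    if not 1 <= percentage <= 99:
        raise ValueError("percentage must be between 1 and 99")
    rng = random.Random(seed)
    p = percentage / 100.0
    selected: List[T] = []
    unselected: List[T] = []
    for record in records:
        # Independent random draw per record, so the final split is approximate.
        if rng.random() < p:
            selected.append(record)
        else:
            unselected.append(record)
    # The 'unselected' output is only produced when requested.
    return selected, (unselected if write_unselected else None)


selected, unselected = random_split(range(1000), percentage=30, write_unselected=True, seed=7)
print(len(selected), len(unselected))  # counts vary around 300 / 700
```

With `write_unselected=False` (the documented default) the second return value is `None`, mirroring the case where the unselected dataset is not generated.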

Records are chosen stochastically, so the specified percentage may not be achieved exactly. By default (Percentage = 50), the cube randomly splits the records into two sets of approximately equal size. This Floe is suitable for splitting larger datasets.
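To see how far the achieved percentage can drift, assume (as a modeling assumption, not something this page states) that each record is selected independently with probability p = Percentage/100. The size of the selected dataset then follows a binomial distribution with mean n·p and standard deviation sqrt(n·p·(1 − p)). The helper `split_spread` below is illustrative only.

```python
import math
from typing import Tuple


def split_spread(n_records: int, percentage: float = 50.0) -> Tuple[float, float]:
    """Expected size of 'selected' and its standard deviation under a
    binomial model (hypothetical: each record chosen independently)."""
    p = percentage / 100.0
    expected = n_records * p
    std_dev = math.sqrt(n_records * p * (1.0 - p))
    return expected, std_dev


for n in (100, 10_000, 1_000_000):
    expected, std_dev = split_spread(n)
    # Express the spread as percentage points of the whole dataset.
    print(f"n={n}: expected {expected:.0f}, std {std_dev:.0f} "
          f"(about ±{100 * std_dev / n:.2f} percentage points)")
```

Under this model, 100 records at the default 50% give a standard deviation of 5 records (5 percentage points), while 1,000,000 records give 500 records (0.05 percentage points): the absolute spread grows with n, but the relative deviation from the target shrinks.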

Extra Required Parameters

Output selected dataset (dataset_out) : Output dataset that receives the selected records

Default: selected

Write unselected dataset (boolean) : If on, records that are not selected are written to a dataset called ‘unselected’. If off, the ‘unselected’ dataset is not generated.

Default: False

Input Dataset (data_source) : Dataset to split randomly

Percentage (decimal) : The percentage of records to select at random and emit to the dataset called ‘selected’.

Default: 50; Min: 1; Max: 99
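The parameters above, with their defaults and bounds, can be summarized as a small Python configuration object. This is a hypothetical sketch: the class and field names are illustrative and do not correspond to an Orion API; only the defaults (selected, False, 50) and the 1 to 99 range come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RandomSplitParams:
    """Hypothetical mirror of the Floe's parameters and documented defaults."""
    input_dataset: str               # dataset to split randomly (required)
    selected_name: str = "selected"  # output dataset for the selected records
    write_unselected: bool = False   # generate the 'unselected' dataset only if True
    percentage: float = 50.0         # share of records sent to 'selected'

    def __post_init__(self) -> None:
        # Documented bounds for Percentage: Min 1, Max 99.
        if not 1 <= self.percentage <= 99:
            raise ValueError(f"percentage must be in [1, 99], got {self.percentage}")


params = RandomSplitParams(input_dataset="my_dataset")
print(params.percentage, params.selected_name, params.write_unselected)  # 50.0 selected False
```

Validating the range when the object is constructed rejects an out-of-range Percentage before any records are processed.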