Dataset Subsetting – Random Splitting
This Floe randomly splits one input dataset into two output datasets. A specified percentage of the records is sent to the output dataset called selected. The remaining records are emitted to the unselected dataset (upon request).
Records are chosen stochastically, so the specified splitting percentage may not be precisely achieved. By default, the cube randomly splits a set of records into two sets of approximately equal size. This Floe is suitable for splitting larger datasets.
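This documentation does not show how the Floe is implemented. As a minimal sketch of why the achieved percentage is only approximate, the following Python example assumes each record is assigned by an independent random draw. The function name random_split and its seed argument are hypothetical and are not part of the Floe.

```python
import random


def random_split(records, percentage=50.0, seed=None):
    """Illustrative only: send each record to 'selected' with
    probability percentage / 100, otherwise to 'unselected'."""
    # Mirrors the documented bounds of the Percentage parameter (Min: 1, Max: 99).
    if not 1 <= percentage <= 99:
        raise ValueError("percentage must be between 1 and 99")

    rng = random.Random(seed)  # hypothetical seed, used here for repeatability
    selected, unselected = [], []
    for record in records:
        # One independent draw per record: the total number of records
        # does not need to be known in advance.
        if rng.random() < percentage / 100.0:
            selected.append(record)
        else:
            unselected.append(record)
    return selected, unselected


selected, unselected = random_split(range(10_000), percentage=30, seed=0)
print(len(selected), len(unselected))
```

Under this assumption the size of the selected set varies from run to run around the requested percentage, and the relative deviation shrinks as the number of records grows.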
Extra Required Parameters
Output selected dataset (dataset_out) : Output dataset to write to. Default: selected
Write unselected dataset (boolean) : If off, the ‘unselected’ dataset is not generated. Default: False
Input Dataset (data_source) : Dataset to split randomly.
Percentage (decimal) : The percentage of records randomly selected and emitted to the dataset called ‘selected’. Default: 50, Min: 1, Max: 99