Orion Platform Dataset Cubes
The following cubes provide utilities for interacting with Datasets.
Cubes
Dataset Reader
Import
from orionplatform.cubes import DatasetReaderCube
Description
A cube that reads records from one or more datasets.
Warning
Set the fast_read parameter to False when using this cube in a cube group.
Output Ports
success: RecordOutputPort
Ungrouped Parameters
data_in: DatasetInputParameter
The dataset(s) to read records from
fast_read: BooleanParameter
Directly sends bytes from the database to the port without constructing an OERecord. This improves read performance, but won’t work within cube groups.
limit: IntegerParameter
Maximum number of records to read with this cube
log_timer: BooleanParameter
Write the reader's timing information to the log
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
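The disk_space and memory_mb descriptions in the Hardware group above ask for the requirement in MiB plus some headroom. A minimal sketch of that arithmetic follows. The helper request_mib is hypothetical, and the 200 MiB default is only one reading of "a couple hundred MiB"; neither is part of the Orion Platform API.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def request_mib(required_bytes: int, overhead_mib: int = 200) -> int:
    """Round the requirement up to whole MiB and add headroom.

    overhead_mib=200 is an assumed value, not a documented default.
    """
    required_mib = -(-required_bytes // MIB)  # ceiling division
    return required_mib + overhead_mib


# A cube that needs about 3 GiB of memory (3072 MiB) plus the assumed headroom.
memory_mb = request_mib(3 * 1024 * MIB)
```

The resulting number is the kind of value that would be passed as memory_mb or disk_space.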
Dataset Writer
Import
from orionplatform.cubes import DatasetWriterCube
Description
A cube that takes records and writes them out to a dataset.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
data_out: DatasetOutputParameter
Output dataset to write to
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Options
create_if_empty: BooleanParameter
If true, the output dataset will be created even if no records are sent to this cube
log_timer: BooleanParameter
Write the writer's timing information to the log
output_tags: StringParameter
Tag to apply to the output dataset
Dataset Updater
Import
from orionplatform.cubes import DatasetUpdaterCube
Description
Updates records in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Base Dataset Field Add Or Replace Cube
Import
from orionplatform.cubes import BaseDatasetFieldAddOrReplaceCube
Description
Adds or replaces fields in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
batch_size: IntegerParameter
Number of records (per field) accumulated in a batch before they are uploaded to Orion. Note: if the dataset changes or the batch reaches 100 MB, it is uploaded regardless of the selected batch_size.
fields_to_update: StringParameter
Toggles which dataset fields will be added or updated. Fields are specified by the promoted names of their respective parameters (or by the parameter names if not promoted), as listed in the ``choices`` attribute of this parameter.
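The batch_size description above names three upload triggers: the record count reaches batch_size, the batch reaches 100 MB, or the dataset changes. The sketch below illustrates that flushing rule in plain Python. It is not the cube's implementation: BatchAccumulator, its upload callback, and the reading of "100 MB" as 10**8 bytes are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

MAX_BATCH_BYTES = 100 * 1000 * 1000  # assumes "100 MB" means 10**8 bytes


class BatchAccumulator:
    """Illustrative only; not the Orion Platform implementation."""

    def __init__(self, batch_size: int, upload: Callable[[str, List[bytes]], None]):
        self.batch_size = batch_size
        self.upload = upload  # hypothetical callback standing in for the upload
        self.dataset: Optional[str] = None
        self.items: List[bytes] = []
        self.nbytes = 0

    def add(self, dataset: str, payload: bytes) -> None:
        # A change of dataset flushes whatever is pending for the previous one.
        if self.dataset is not None and dataset != self.dataset:
            self.flush()
        self.dataset = dataset
        self.items.append(payload)
        self.nbytes += len(payload)
        # Flush on record count or on accumulated size, whichever comes first.
        if len(self.items) >= self.batch_size or self.nbytes >= MAX_BATCH_BYTES:
            self.flush()

    def flush(self) -> None:
        # Upload pending items (if any) and start an empty batch.
        if self.items:
            self.upload(self.dataset, self.items)
        self.items, self.nbytes = [], 0


uploads: List[Tuple[str, int]] = []
acc = BatchAccumulator(batch_size=3, upload=lambda ds, items: uploads.append((ds, len(items))))
for ds in ["a", "a", "a", "a", "b", "b"]:
    acc.add(ds, b"x")
acc.flush()  # upload the final partial batch
```

In this run the first three records for dataset "a" fill a batch, the fourth is flushed early when dataset "b" starts, and the last two are flushed at the end.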
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Dataset Appender
Import
from orionplatform.cubes import DatasetAppenderCube
Description
Appends records to datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
data_in: DatasetInputParameter
The dataset(s) to read records from
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batcher
Import
from orionplatform.cubes import DatasetBatcherCube
Description
A cube that splits a dataset into batches of offsets and limits. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch. The batches are read (possibly in parallel) by one or more downstream cubes.
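The offset/limit splitting described above can be sketched in plain Python. This is an illustration under stated assumptions, not the cube's code: the function make_batches and the {"offset", "limit"} dictionary layout are hypothetical, and the actual JSON emitted on the success port may differ (for example, it may also carry a dataset id).

```python
from typing import Iterator


def make_batches(total_records: int, batch_size: int) -> Iterator[dict]:
    """Yield one {"offset", "limit"} dict per batch (hypothetical layout)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, total_records, batch_size):
        # The last batch may hold fewer than batch_size records.
        yield {"offset": offset, "limit": min(batch_size, total_records - offset)}


batches = list(make_batches(total_records=10, batch_size=4))
```

Because each batch is described only by where it starts and how many records it covers, the batches are independent and can be handed to several downstream readers at once.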
Output Ports
success: JsonOutputPort
Ungrouped Parameters
batch_size: IntegerParameter
Maximum number of records in each batch
data_in: DatasetInputParameter
The data to read from
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batch Reader
Import
from orionplatform.cubes import DatasetBatchReaderCube
Description
A cube that reads a batch of records from a dataset, where a batch is defined by id, offset, and limit. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch.
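The skip-then-read behaviour described above corresponds to a simple slice over a record stream. The sketch below uses itertools.islice on an ordinary Python list; read_batch is a hypothetical helper and the string "records" stand in for real dataset records, so this only illustrates the offset/limit semantics.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def read_batch(records: Iterable[T], offset: int, limit: int) -> Iterator[T]:
    """Skip `offset` records, then yield at most `limit` records."""
    return islice(records, offset, offset + limit)


records = [f"rec{i}" for i in range(10)]
batch = list(read_batch(records, offset=4, limit=4))
last_batch = list(read_batch(records, offset=8, limit=4))  # only 2 records remain
```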
Input Ports
intake: JsonInputPort
Output Ports
success: RecordOutputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelDatasetBatchReaderCube
Boolean Switch
Import
from orionplatform.cubes import BooleanSwitch
Description
This cube sends records to the true or false port depending on the value of the switch parameter.
It is useful for developing Floes in the Orion UI, but is not recommended where performance is critical.
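The routing rule can be pictured with a small stand-alone function. This is only a sketch of the behaviour described above: route is a hypothetical helper, and the two lists stand in for the cube's true and false output ports.

```python
from typing import Iterable, List, Tuple


def route(records: Iterable[bytes], switch: bool) -> Tuple[List[bytes], List[bytes]]:
    """Return (true_port, false_port); all records go to the side chosen by switch."""
    true_port: List[bytes] = []
    false_port: List[bytes] = []
    target = true_port if switch else false_port
    for record in records:
        target.append(record)
    return true_port, false_port


on_true, on_false = route([b"r1", b"r2"], switch=True)
```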
Input Ports
intake: RecordBytesInputPort
Output Ports
false: RecordBytesOutputPort
true: RecordBytesOutputPort
Ungrouped Parameters
switch: BooleanParameter
This parameter controls whether records are sent to the ‘true’ or ‘false’ port
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
cpu_count: IntegerParameter
The number of CPUs to run this cube with
disk_space: DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
gpu_count: IntegerParameter
The number of GPUs to run this cube with
instance_tags: StringParameter
Only run on machines with matching tags (comma separated)
instance_type: StringParameter
The type of instance that this cube needs to be run on
memory_mb: DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
spot_policy: StringParameter
Control cube placement on spot market instances
Metrics
cube_metrics: StringParameter
Set of metrics to be collected
metric_period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelBooleanSwitch