Orion Platform Dataset Cubes¶
The following cubes provide utilities for interacting with datasets.
Cubes¶
Dataset Reader¶
Import Statement
from orionplatform.cubes import DatasetReaderCube
Description
A cube that reads records from one or more datasets and emits them on its success port.
Warning
Set the fast_read parameter to False when using this cube in a cube group.
Output Ports
success: RecordOutputPort
Ungrouped Parameters¶
Input Dataset: DatasetInputParameter
The dataset(s) to read records from
Fast Read: BooleanParameter
Directly sends bytes from the database to the port without constructing an OERecord. This improves read performance, but won’t work within cube groups.
limit: IntegerParameter
Maximum number of records to read with this cube
Enable timing log: BooleanParameter
Write the reader's timing information to the log
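The effect of the limit parameter can be pictured with a small pure-Python sketch. It does not use the Orion API: `read_records` is a hypothetical helper, datasets are modeled as plain lists, and it assumes the limit applies to the total number of records read across all input datasets rather than to each dataset separately.

```python
from itertools import chain, islice


def read_records(datasets, limit=None):
    """Return records from the datasets in order, stopping after `limit`.

    `datasets` is an iterable of iterables standing in for Orion datasets.
    A limit of None means no cap (assumed default behavior).
    """
    stream = chain.from_iterable(datasets)
    # islice with stop=None consumes the whole stream.
    return list(islice(stream, limit))


first = ["rec0", "rec1", "rec2"]
second = ["rec3", "rec4"]
capped = read_records([first, second], limit=4)
everything = read_records([first, second])
```

Here `capped` holds the first four records and `everything` holds all five.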
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
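The Memory and Temporary Disk Space descriptions above suggest requesting more than the raw requirement. A minimal sketch of that arithmetic follows, assuming a headroom of 200 MiB as one reading of "a couple hundred"; `mib_request` is a hypothetical helper, not part of orionplatform.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def mib_request(required_bytes, headroom_mib=200.0):
    """Convert a byte estimate to MiB and add headroom for overhead.

    The 200 MiB default is an assumption, not a documented value.
    """
    if required_bytes < 0:
        raise ValueError("required_bytes must be non-negative")
    return required_bytes / MIB + headroom_mib


# A cube expected to hold about 3 GiB in memory:
memory_mib = mib_request(3 * 1024 * MIB)
```

The resulting value could then be supplied as the Memory (MiB) parameter of the cube.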
Dataset Writer¶
Import Statement
from orionplatform.cubes import DatasetWriterCube
Description
A cube that takes records and writes them out to a dataset.
Input Ports
intake: RecordInputPort
Ungrouped Parameters¶
Output Dataset: DatasetOutputParameter
Output dataset to write to
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Options
Create If Empty: BooleanParameter
If true, the output dataset will be created even if no records are sent to this cube
Enable timing log: BooleanParameter
Write the writer's timing information to the log
Tag: StringParameter
Tag to apply to the output dataset
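The Create If Empty option can be illustrated with a toy model. This sketch is not the cube's implementation: `SketchWriter` is a hypothetical class, the dataset is modeled as a list, and it assumes the dataset is otherwise created lazily when the first record arrives.

```python
class SketchWriter:
    """Toy stand-in for a dataset writer; None means no dataset exists."""

    def __init__(self, create_if_empty=False):
        self.create_if_empty = create_if_empty
        self.dataset = None

    def write(self, record):
        # Assumed behavior: the dataset is created on the first record.
        if self.dataset is None:
            self.dataset = []
        self.dataset.append(record)

    def end(self):
        # With create_if_empty set, an empty dataset is still produced.
        if self.dataset is None and self.create_if_empty:
            self.dataset = []
        return self.dataset


no_output = SketchWriter(create_if_empty=False).end()
empty_output = SketchWriter(create_if_empty=True).end()
```

In this model `no_output` is `None` (no dataset was created) while `empty_output` is an empty list (a dataset with no records).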
Dataset Updater¶
Import Statement
from orionplatform.cubes import DatasetUpdaterCube
Description
Updates records in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters¶
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Base Dataset Field Add Or Replace Cube¶
Import Statement
from orionplatform.cubes import BaseDatasetFieldAddOrReplaceCube
Description
Adds or replaces fields in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters¶
Number of records (per field) stored in a batch: IntegerParameter
Number of records (per field) accumulated in a batch before they are uploaded to Orion. Note: if the dataset changes or the batch reaches 100 MB, the batch is uploaded regardless of the selected batch_size.
Dataset fields to add/update: StringParameter
Selects which dataset fields will be added or updated. Fields are specified by the promoted names of their respective parameters (or by the parameter names if not promoted) in the ``choices`` attribute of this parameter.
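The batching rule described for the batch size parameter (upload when the record count is reached, when the dataset changes, or when the batch reaches 100 MB) can be sketched in plain Python. Everything here is illustrative: `FieldBatcher` and its `upload` callback are hypothetical, and treating 100 MB as 100,000,000 bytes is an assumption.

```python
MAX_BATCH_BYTES = 100 * 1000 * 1000  # "100 MB"; exact byte count is assumed


class FieldBatcher:
    """Hypothetical accumulator; `upload` is a caller-supplied callable."""

    def __init__(self, batch_size, upload, max_bytes=MAX_BATCH_BYTES):
        self.batch_size = batch_size
        self.upload = upload
        self.max_bytes = max_bytes
        self._dataset_id = None
        self._values = []
        self._bytes = 0

    def add(self, dataset_id, value):
        # A change of dataset uploads whatever has accumulated so far.
        if self._values and dataset_id != self._dataset_id:
            self.flush()
        self._dataset_id = dataset_id
        self._values.append(value)
        self._bytes += len(value)
        # Upload on record count or accumulated size, whichever comes first.
        if len(self._values) >= self.batch_size or self._bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        if self._values:
            self.upload(self._dataset_id, self._values)
        self._values = []
        self._bytes = 0


uploads = []
batcher = FieldBatcher(batch_size=2, upload=lambda ds, vals: uploads.append((ds, vals)))
for ds, val in [(1, b"a"), (1, b"b"), (1, b"c"), (2, b"d")]:
    batcher.add(ds, val)
batcher.flush()  # upload the final partial batch
```

In this run the first upload is triggered by the batch size, the second by the switch from dataset 1 to dataset 2, and the third by the explicit final flush.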
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Appender¶
Import Statement
from orionplatform.cubes import DatasetAppenderCube
Description
Appends records to datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters¶
Datasets to append to: DatasetInputParameter
The dataset(s) to append records to
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batcher¶
Import Statement
from orionplatform.cubes import DatasetBatcherCube
Description
A cube that splits a dataset into batches of offsets and limits. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch. The batches are read (possibly in parallel) by one or more downstream cubes.
Output Ports
success: JsonOutputPort
Ungrouped Parameters¶
Number of records in a batch: IntegerParameter
Maximum number of records in each batch (the limit, i.e. batch_size)
Data to read from: DatasetInputParameter
The data to read from
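The split into offsets and limits can be sketched as follows. This is not the cube's code: `make_batches` is a hypothetical helper, the JSON key names `id`, `offset`, and `limit` are assumed from the batch definition used by the Dataset Batch Reader, and trimming the last limit to the remaining record count is also an assumption.

```python
import json


def make_batches(dataset_id, record_count, batch_size):
    """Yield one dict per batch covering records 0..record_count-1."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, record_count, batch_size):
        yield {
            "id": dataset_id,
            "offset": offset,  # records to skip
            "limit": min(batch_size, record_count - offset),  # records to read
        }


batches = list(make_batches(dataset_id=42, record_count=10, batch_size=4))
# Each batch could be serialized for a JSON port, for example:
messages = [json.dumps(batch) for batch in batches]
```

For 10 records and a batch size of 4 this yields offsets 0, 4, and 8 with limits 4, 4, and 2. Because each batch is self-describing, downstream readers can process them independently.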
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batch Reader¶
Import Statement
from orionplatform.cubes import DatasetBatchReaderCube
Description
A cube that reads a batch of records from a dataset, where a batch is defined by id, offset, and limit. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch.
Input Ports
intake: JsonInputPort
Output Ports
success: RecordOutputPort
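How a batch defined by id, offset, and limit maps onto records can be shown with an in-memory model. The sketch below does not touch Orion: `read_batch` is a hypothetical function, datasets are modeled as a dict of lists keyed by id, and the batch is a plain dict with assumed key names.

```python
from itertools import islice


def read_batch(datasets, batch):
    """Return the records of one batch: skip `offset`, then read `limit`."""
    records = datasets[batch["id"]]
    start = batch["offset"]
    return list(islice(records, start, start + batch["limit"]))


datasets = {42: [f"rec{i}" for i in range(10)]}
middle = read_batch(datasets, {"id": 42, "offset": 4, "limit": 4})
tail = read_batch(datasets, {"id": 42, "offset": 8, "limit": 4})
```

Here `middle` contains rec4 through rec7. `tail` contains only rec8 and rec9, since a limit that runs past the end of the dataset simply returns the remaining records in this model.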
Ungrouped Parameters¶
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version¶
ParallelDatasetBatchReaderCube
Boolean Switch¶
Import Statement
from orionplatform.cubes import BooleanSwitch
Description
This cube sends records to the true or false port depending on the value of the switch parameter.
Useful for developing floes in the Orion UI, but not recommended where performance is crucial.
Input Ports
intake: RecordBytesInputPort
Output Ports
false: RecordBytesOutputPort
true: RecordBytesOutputPort
Ungrouped Parameters¶
Switch: BooleanParameter
This parameter controls whether records are sent to the ‘true’ or ‘false’ port
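The routing behavior amounts to sending every record to a single port chosen by the switch. A minimal pure-Python model follows, with ports represented as lists and records as opaque bytes; `route` is a hypothetical function, not the cube's API.

```python
def route(records, switch):
    """Send every record to the 'true' or 'false' port based on `switch`."""
    ports = {"true": [], "false": []}
    target = ports["true"] if switch else ports["false"]
    for record in records:
        # Records are passed through unchanged, as opaque bytes.
        target.append(record)
    return ports


on = route([b"r0", b"r1"], switch=True)
off = route([b"r0", b"r1"], switch=False)
```

With the switch on, both records land on the 'true' port and the 'false' port stays empty; with it off, the reverse holds.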
Parameter Groups¶
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version¶
ParallelBooleanSwitch