Orion Platform Dataset Cubes
The following cubes provide utilities for interacting with Orion datasets.
Cubes
Dataset Reader
Import Statement
from orionplatform.cubes import DatasetReaderCube
Description
A cube that reads records from one or more Orion datasets.
Warning
Set the fast_read parameter to False if this cube is used inside a cube group.
Output Ports
success: RecordOutputPort
Ungrouped Parameters
Input Dataset: DatasetInputParameter
The dataset(s) to read records from
Fast Read: BooleanParameter
Sends bytes directly from the database to the port without constructing an OERecord. This improves read performance but does not work within cube groups.
Fields To Read: StringParameter
Comma-delimited list of field names or IDs to read from the dataset (leave blank to read all fields).
limit: IntegerParameter
Maximum number of records to read with this cube
Enable timing log: BooleanParameter
Write timing information for the reader to the log
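The Fields To Read semantics (a comma-delimited list, blank meaning all fields) can be illustrated with a small standalone sketch. This is not the cube's implementation: the helper name parse_fields_to_read is hypothetical, and treating all-digit entries as field IDs is an assumption about how names and IDs are told apart.

```python
from typing import List, Optional, Union

FieldRef = Union[str, int]


def parse_fields_to_read(value: str) -> Optional[List[FieldRef]]:
    """Split a comma-delimited "Fields To Read" value into field references.

    Hypothetical helper. Returns None for a blank value, meaning "read all fields".
    All-digit entries are treated as numeric field IDs (an assumption);
    every other entry is kept as a field name.
    """
    entries = [part.strip() for part in value.split(",")]
    entries = [e for e in entries if e]  # drop empty entries such as trailing commas
    if not entries:
        return None
    return [int(e) if e.isdigit() else e for e in entries]


print(parse_fields_to_read(""))                    # None -> read every field
print(parse_fields_to_read("Molecule, 3, Score"))  # ['Molecule', 3, 'Score']
```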
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
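The Memory (MiB) and Temporary Disk Space (MiB) parameters are expressed in MiB (1048576 B) and should include some overhead, so a request can be derived from an estimate in bytes. The sketch below is illustrative only: the helper name mib_request is hypothetical, and the fixed 200 MiB of headroom is an assumed reading of "a couple hundred MiB".

```python
import math

BYTES_PER_MIB = 1024 * 1024  # 1 MiB = 1048576 B, as in the parameter descriptions
HEADROOM_MIB = 200  # assumed value for "a couple hundred MiB" of overhead


def mib_request(estimated_bytes: int, headroom_mib: int = HEADROOM_MIB) -> int:
    """Round a byte estimate up to whole MiB, then add headroom (hypothetical helper)."""
    return math.ceil(estimated_bytes / BYTES_PER_MIB) + headroom_mib


# A cube expected to need about 1.5 GiB of memory:
print(mib_request(1_610_612_736))  # 1736
```

The returned value would be entered as the Memory (MiB) or Temporary Disk Space (MiB) parameter.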
Dataset Writer
Import Statement
from orionplatform.cubes import DatasetWriterCube
Description
A cube that takes records and writes them out to a dataset
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Output Dataset: DatasetOutputParameter
Output dataset to write to
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Options
Create If Empty: BooleanParameter
If true, the output dataset will be created even if no records are sent to this cube
Enable timing log: BooleanParameter
Write timing information for the writer to the log
Tag: StringParameter
Tag to apply to the output dataset
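The effect of Create If Empty can be modeled with a writer that creates its output dataset lazily. This is a hypothetical sketch: the class LazyDatasetWriter is invented for illustration, a Python list stands in for the Orion dataset, and lazy creation is an assumption about the mechanism, not a description of DatasetWriterCube internals.

```python
from typing import Any, List, Optional


class LazyDatasetWriter:
    """Hypothetical model of the Create If Empty option.

    The output "dataset" (a list here) is created when the first record arrives.
    If no records arrive, it is created at the end only when create_if_empty is True.
    """

    def __init__(self, create_if_empty: bool) -> None:
        self.create_if_empty = create_if_empty
        self.dataset: Optional[List[Any]] = None  # None means "not created yet"

    def write(self, record: Any) -> None:
        if self.dataset is None:
            self.dataset = []  # stand-in for creating the output dataset
        self.dataset.append(record)

    def end(self) -> None:
        # Called once the input stream is exhausted.
        if self.dataset is None and self.create_if_empty:
            self.dataset = []  # create an empty output dataset


writer = LazyDatasetWriter(create_if_empty=False)
writer.end()
print(writer.dataset)  # None: no records were written, so no dataset exists
```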
Dataset Updater
Import Statement
from orionplatform.cubes import DatasetUpdaterCube
Description
Updates records in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Base Dataset Field Add Or Replace Cube
Import Statement
from orionplatform.cubes import BaseDatasetFieldAddOrReplaceCube
Description
Adds or replaces fields in the originating Orion datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Number of records (per field) stored in a batch.: IntegerParameter
Number of records (per field) accumulated in a batch before they are uploaded to Orion. Note: if the dataset changes or the batch reaches 100 MB, the batch is uploaded regardless of the selected batch_size.
Dataset fields to add/update.: StringParameter
Selects which dataset fields are added or updated. Each field is identified by the promoted name of its corresponding parameter (or by the parameter name if it is not promoted), as listed in the ``choices`` attribute of this parameter.
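The upload rule for batches (upload after batch_size records, when the dataset changes, or once the batch reaches 100 MB) can be sketched as a small accumulator. This is a simplified, hypothetical model: FieldBatcher and its upload callback are invented names, it tracks a single field rather than counting per field, and reading "100 MB" as 100 * 10**6 bytes is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Assumed reading of "100 MB" as 100 * 10**6 bytes.
MAX_BATCH_BYTES = 100 * 10**6


class FieldBatcher:
    """Hypothetical accumulator illustrating when a batch is uploaded."""

    def __init__(self, batch_size: int, upload: Callable[[str, List[bytes]], None]) -> None:
        self.batch_size = batch_size
        self.upload = upload
        self.dataset_id: Optional[str] = None
        self.records: List[bytes] = []
        self.nbytes = 0

    def add(self, dataset_id: str, record: bytes) -> None:
        # A record from a different dataset forces the pending batch out first.
        if self.dataset_id is not None and dataset_id != self.dataset_id:
            self.flush()
        self.dataset_id = dataset_id
        self.records.append(record)
        self.nbytes += len(record)
        # Upload once batch_size records or the size limit has been reached.
        if len(self.records) >= self.batch_size or self.nbytes >= MAX_BATCH_BYTES:
            self.flush()

    def flush(self) -> None:
        # Upload whatever is pending (if anything) and start a new batch.
        if self.records and self.dataset_id is not None:
            self.upload(self.dataset_id, self.records)
        self.records = []
        self.nbytes = 0


uploads: List[Tuple[str, int]] = []
batcher = FieldBatcher(batch_size=2, upload=lambda ds, recs: uploads.append((ds, len(recs))))
for ds, rec in [("A", b"x"), ("A", b"y"), ("A", b"z"), ("B", b"w")]:
    batcher.add(ds, rec)
batcher.flush()  # upload the remainder at the end of the stream
print(uploads)  # [('A', 2), ('A', 1), ('B', 1)]
```

The second "A" batch is uploaded with only one record because the next record belongs to dataset "B", which is the "dataset changes" case from the parameter description.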
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Appender
Import Statement
from orionplatform.cubes import DatasetAppenderCube
Description
Appends records to datasets.
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Datasets to append to: DatasetInputParameter
The dataset(s) to append records to
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batcher
Import Statement
from orionplatform.cubes import DatasetBatcherCube
Description
A cube that splits a dataset into batches of offsets and limits. The offset indicates how many records should be skipped, and the limit (i.e., batch_size) indicates how many records should be read in the batch. The batches are then read, possibly in parallel, by one or more downstream cubes.
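The offset/limit split can be sketched as plain arithmetic over the record count. This is an illustration, not the cube's code: make_batches is a hypothetical name, the JSON keys "id", "offset", and "limit" are assumed from the Dataset Batch Reader description, and this sketch trims the final limit to the records that remain.

```python
import json
from typing import Dict, List


def make_batches(dataset_id: int, record_count: int, batch_size: int) -> List[Dict[str, int]]:
    """Split record_count records into (offset, limit) batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        {"id": dataset_id, "offset": offset, "limit": min(batch_size, record_count - offset)}
        for offset in range(0, record_count, batch_size)
    ]


for batch in make_batches(dataset_id=42, record_count=10, batch_size=4):
    print(json.dumps(batch))
# {"id": 42, "offset": 0, "limit": 4}
# {"id": 42, "offset": 4, "limit": 4}
# {"id": 42, "offset": 8, "limit": 2}
```

Because the batches are disjoint and together cover every record, downstream readers can process them in any order or in parallel.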
Output Ports
success: JsonOutputPort
Ungrouped Parameters
Number of records in a batch: IntegerParameter
Maximum number of records in each batch
Data to read from: DatasetInputParameter
The dataset to read from and split into batches
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Dataset Batch Reader
Import Statement
from orionplatform.cubes import DatasetBatchReaderCube
Description
A cube that reads a batch of records from a dataset, where a batch is defined by an id, an offset, and a limit. The offset indicates how many records should be skipped, and the limit (i.e., batch_size) indicates how many records should be read in the batch.
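The offset/limit semantics map directly onto Python's itertools.islice. In this sketch a plain iterable stands in for the dataset's records; read_batch is a hypothetical helper, not the cube's read path.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def read_batch(records: Iterable[T], offset: int, limit: int) -> Iterator[T]:
    """Skip `offset` records, then yield at most `limit` records."""
    return itertools.islice(records, offset, offset + limit)


print(list(read_batch(range(10), offset=8, limit=4)))  # [8, 9]
```

A final batch whose limit runs past the end of the data simply yields the remaining records.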
Input Ports
intake: JsonInputPort
Output Ports
success: RecordOutputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelDatasetBatchReaderCube
Boolean Switch
Import Statement
from orionplatform.cubes import BooleanSwitch
Description
This cube sends records to the true or false port depending on the value of the Switch parameter.
This is useful when developing Floes in the Orion UI, but it is not recommended where performance is critical.
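Because Switch is a parameter rather than a per-record value, every record in a run goes to the same port. The sketch below models that routing with a dictionary of lists standing in for the two output ports; route is a hypothetical helper, not the BooleanSwitch implementation.

```python
from typing import Dict, Iterable, List


def route(records: Iterable[bytes], switch: bool) -> Dict[str, List[bytes]]:
    """Send every record to the 'true' or 'false' port, according to `switch`."""
    ports: Dict[str, List[bytes]] = {"true": [], "false": []}
    port_name = "true" if switch else "false"
    for record in records:
        ports[port_name].append(record)
    return ports


print(route([b"r1", b"r2"], switch=False))  # {'true': [], 'false': [b'r1', b'r2']}
```

Flipping the parameter in the UI therefore enables one downstream branch and leaves the other idle for the whole run.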
Input Ports
intake: RecordBytesInputPort
Output Ports
false: RecordBytesOutputPort
true: RecordBytesOutputPort
Ungrouped Parameters
Switch: BooleanParameter
This parameter controls whether records are sent to the ‘true’ or ‘false’ port
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelBooleanSwitch