Orion Platform Dataset Cubes

The following cubes provide utilities for interacting with datasets.

Cubes

Dataset Reader

Import Statement

from orionplatform.cubes import DatasetReaderCube

Description

A cube that reads records from one or more datasets.

Warning

Set the fast_read parameter to False when using this cube in a cube group.

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • Input Dataset: DatasetInputParameter

    The dataset(s) to read records from

  • Fast Read: BooleanParameter

    Directly sends bytes from the database to the port without constructing an OERecord. This improves read performance, but won’t work within cube groups.

  • Fields To Read: StringParameter

    Comma-delimited list of field names or IDs to read from the dataset (leave blank to read all fields).

  • limit: IntegerParameter

    Maximum number of records to read with this cube

  • Enable timing log: BooleanParameter

    Write timing information for the reader to the log
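As a rough mental model of how Fields To Read and limit shape the output, the sketch below applies the same two settings to plain Python dictionaries. It is not the cube's implementation: parse_fields and read_records are hypothetical helpers, records are modeled as dicts, and matching on field IDs is left out.

```python
from itertools import islice


def parse_fields(fields_to_read):
    """Split a comma-delimited string into field names; blank means all fields."""
    names = [name.strip() for name in fields_to_read.split(",")]
    return [name for name in names if name]


def read_records(records, fields_to_read="", limit=None):
    """Yield at most `limit` records, keeping only the requested fields."""
    wanted = parse_fields(fields_to_read)
    # islice with a stop of None reads every record.
    for record in islice(records, limit):
        if wanted:
            yield {k: v for k, v in record.items() if k in wanted}
        else:
            yield dict(record)


# Hypothetical records standing in for a dataset.
records = [
    {"Title": "mol-1", "Energy": -1.5, "Notes": "a"},
    {"Title": "mol-2", "Energy": -2.0, "Notes": "b"},
    {"Title": "mol-3", "Energy": -0.7, "Notes": "c"},
]
subset = list(read_records(records, fields_to_read="Title, Energy", limit=2))
```

With these inputs, subset holds the first two records with the Notes field dropped.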

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances
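The sizing advice for Memory (MiB) and Temporary Disk Space (MiB) amounts to a unit conversion plus headroom. A minimal sketch, assuming a hypothetical helper mib_request and reading "a couple hundred MiB" as 200 MiB, which is an illustrative figure rather than a documented value:

```python
import math

BYTES_PER_MIB = 1048576  # 1 MiB = 2**20 bytes


def mib_request(required_bytes, overhead_mib=200):
    """Convert a byte requirement to MiB (rounded up) and add headroom."""
    return math.ceil(required_bytes / BYTES_PER_MIB) + overhead_mib


# A cube expected to need about 3 GiB of memory.
memory_mib = mib_request(3 * 1024 ** 3)
```

The same calculation applies to every cube on this page, since they all share the Hardware parameter group.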

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Dataset Writer

Import Statement

from orionplatform.cubes import DatasetWriterCube

Description

A cube that takes records and writes them out to a dataset.

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

  • Output Dataset: DatasetOutputParameter

    Output dataset to write to

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Options

  • Create If Empty: BooleanParameter

    If true, the output dataset will be created even if no records are sent to this cube

  • Enable timing log: BooleanParameter

    Write timing information for the writer to the log

  • Tag: StringParameter

    Tag to apply to the output dataset
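For orientation, a reader and a writer are commonly wired together into a floe that copies records from one dataset to another. The sketch below assumes the floe and orionplatform packages are installed. The cube names, the promoted names, and the parameter names data_in and data_out are assumptions not stated on this page, so check them against your installed version. The imports are guarded so the snippet still loads where those packages are unavailable.

```python
try:
    from floe.api import WorkFloe
    from orionplatform.cubes import DatasetReaderCube, DatasetWriterCube
except ImportError:  # packages not installed; build_copy_floe() will raise
    WorkFloe = DatasetReaderCube = DatasetWriterCube = None


def build_copy_floe():
    """Return a floe that reads records from a dataset and writes them back out."""
    if WorkFloe is None:
        raise RuntimeError("floe and orionplatform are required to build the floe")

    job = WorkFloe("Copy Dataset")

    reader = DatasetReaderCube("reader")
    # Assumed parameter name behind "Input Dataset".
    reader.promote_parameter("data_in", promoted_name="in", title="Input Dataset")

    writer = DatasetWriterCube("writer")
    # Assumed parameter name behind "Output Dataset".
    writer.promote_parameter("data_out", promoted_name="out", title="Output Dataset")

    job.add_cubes(reader, writer)
    # The reader's success port feeds the writer's intake port.
    reader.success.connect(writer.intake)
    return job
```

Calling build_copy_floe().run() from a script's main guard is the usual way to expose such a floe, again assuming the packages above are available.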

Dataset Updater

Import Statement

from orionplatform.cubes import DatasetUpdaterCube

Description

Updates records in the originating Orion datasets.

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Base Dataset Field Add Or Replace Cube

Import Statement

from orionplatform.cubes import BaseDatasetFieldAddOrReplaceCube

Description

Adds or replaces fields in the originating Orion datasets.

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

  • Number of records (per field) stored in a batch: IntegerParameter

    Number of records (per field) accumulated in a batch before they are uploaded to Orion. Note: if the dataset changes or the batch reaches 100 MB, it is uploaded regardless of the selected batch_size.

  • Dataset fields to add/update: StringParameter

    Toggles which dataset fields will be added or updated. Fields are specified by the promoted names of their respective parameters (or their names otherwise) in the choices attribute of this parameter.
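The upload rule described for the batch size parameter can be pictured as a three-way flush condition. The following is a minimal sketch, not the cube's implementation: FieldBatch, upload, and the byte accounting are hypothetical, and 100 MB is taken as 100 * 10**6 bytes for illustration.

```python
MAX_BATCH_BYTES = 100 * 10 ** 6  # assumed reading of "100 MB"


class FieldBatch:
    """Accumulates field values and flushes on count, size, or dataset change."""

    def __init__(self, batch_size, upload):
        self.batch_size = batch_size
        self.upload = upload  # hypothetical callable(dataset_id, values)
        self.dataset_id = None
        self.values = []
        self.num_bytes = 0

    def add(self, dataset_id, value):
        # A record from a different dataset forces the pending batch out first.
        if self.values and dataset_id != self.dataset_id:
            self.flush()
        self.dataset_id = dataset_id
        self.values.append(value)
        self.num_bytes += len(value)
        if len(self.values) >= self.batch_size or self.num_bytes >= MAX_BATCH_BYTES:
            self.flush()

    def flush(self):
        if self.values:
            self.upload(self.dataset_id, self.values)
        self.values = []
        self.num_bytes = 0


uploads = []
batch = FieldBatch(batch_size=2, upload=lambda ds, vals: uploads.append((ds, list(vals))))
for dataset_id, value in [(1, b"a"), (1, b"b"), (1, b"c"), (2, b"d")]:
    batch.add(dataset_id, value)
batch.flush()  # upload whatever is left at the end of the stream
```

In this run the first upload is triggered by the record count, the second by the change of dataset, and the third by the final flush.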

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Dataset Appender

Import Statement

from orionplatform.cubes import DatasetAppenderCube

Description

Appends records to datasets.

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

  • Datasets to append to: DatasetInputParameter

    The dataset(s) to append records to

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Dataset Batcher

Import Statement

from orionplatform.cubes import DatasetBatcherCube

Description

A cube that splits a dataset into batches of offsets and limits. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch. The batches are read (possibly in parallel) by one or more downstream cubes.
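The offset and limit arithmetic can be sketched in a few lines. This illustrates the batching scheme described above and is not the cube's code; make_batches and the dictionary keys are assumptions.

```python
def make_batches(dataset_id, record_count, batch_size):
    """Split `record_count` records into consecutive offset/limit batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        {"id": dataset_id, "offset": offset, "limit": batch_size}
        for offset in range(0, record_count, batch_size)
    ]


# A hypothetical dataset with id 42 holding 10 records.
batches = make_batches(dataset_id=42, record_count=10, batch_size=4)
```

Here the last batch starts at offset 8 and asks for up to 4 records, of which only 2 exist. A reader that treats the limit as a maximum handles this without special casing.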

Output Ports

  • success: JsonOutputPort

Ungrouped Parameters

  • Number of records in a batch: IntegerParameter

    Maximum number of records in each batch

  • Data to read from: DatasetInputParameter

    The data to read from

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Dataset Batch Reader

Import Statement

from orionplatform.cubes import DatasetBatchReaderCube

Description

A cube that reads a batch of records from a dataset, where a batch is defined by id, offset, and limit. The offset indicates how many records should be skipped, and the limit (i.e. batch_size) indicates how many records should be read in the batch.
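To make the skip-then-read semantics concrete, the sketch below applies a batch to an in-memory list. It assumes a batch arrives as a dictionary with id, offset, and limit keys; the exact JSON layout on the intake port is not documented here, and read_batch is a hypothetical helper.

```python
from itertools import islice


def read_batch(records, batch):
    """Skip `offset` records, then return at most `limit` records."""
    start = batch["offset"]
    stop = start + batch["limit"]
    return list(islice(records, start, stop))


# Ten placeholder records standing in for the dataset named by the batch id.
dataset = [f"record-{i}" for i in range(10)]
first = read_batch(dataset, {"id": 42, "offset": 0, "limit": 4})
last = read_batch(dataset, {"id": 42, "offset": 8, "limit": 4})
```

The final batch returns only the two records that remain, since the limit is an upper bound rather than an exact count.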

Input Ports

  • intake: JsonInputPort

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Parallel Version

ParallelDatasetBatchReaderCube

Boolean Switch

Import Statement

from orionplatform.cubes import BooleanSwitch

Description

This cube sends records to the true or false port depending on the value of the switch parameter.

Useful for developing Floes in the Orion UI, but not recommended where performance is crucial.
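The routing behavior is all-or-nothing: a single switch value decides the destination of every record. A toy sketch with a hypothetical route function, using lists in place of ports:

```python
def route(records, switch):
    """Send every record to the 'true' or the 'false' output, never both."""
    outputs = {"true": [], "false": []}
    outputs["true" if switch else "false"].extend(records)
    return outputs


routed = route(["rec-1", "rec-2"], switch=False)
```

With switch=False, both records land on the false output and the true output stays empty.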

Input Ports

  • intake: RecordBytesInputPort

Output Ports

  • false: RecordBytesOutputPort

  • true: RecordBytesOutputPort

Ungrouped Parameters

  • Switch: BooleanParameter

    This parameter controls whether records are sent to the ‘true’ or ‘false’ port

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Parallel Version

ParallelBooleanSwitch