Orion Platform Collection Cubes
Collection Reader
Import Statement
from orionplatform.cubes import CollectionReaderCube
Emits collections
Output Ports
success: CollectionOutputPort
Ungrouped Parameters
Collection: CollectionInputParameter
Collections to use as input to a floe
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
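The disk and memory descriptions above give sizes in MiB and advise requesting some headroom. As a rough illustration of that arithmetic, the sketch below converts a byte estimate to MiB and adds a margin. The helper name mib_request and the 200 MiB default are assumptions chosen for illustration; they are not part of orionplatform.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def mib_request(required_bytes, headroom_mib=200):
    """Return a MiB figure covering required_bytes plus headroom.

    headroom_mib=200 is an assumed reading of "a couple hundred MiB".
    """
    # Ceiling division so that a partial MiB is rounded up
    required_mib = -(-required_bytes // MIB)
    return required_mib + headroom_mib


# 3 GiB of working data is 3072 MiB; with headroom the request is 3272 MiB
request = mib_request(3 * 1024**3)
print(request)
```

The resulting number is what would be entered for a parameter such as Memory (MiB) or Temporary Disk Space (MiB).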
Shard Reader
Import Statement
from orionplatform.cubes import ShardReaderCube
Reads collections and emits the shards from each collection
Output Ports
success: ShardOutputPort
Ungrouped Parameters
Input Collections: CollectionInputParameter
Collections to emit shards from
limit: IntegerParameter
Maximum number of shards to read with this cube
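To make the effect of the limit parameter concrete, here is a minimal pure-Python model of reading shards from several collections with a cap. It assumes the limit counts shards across all input collections combined, which the description suggests but does not state outright. The function emit_shards and the string shard names are hypothetical and do not use the orionplatform API.

```python
from itertools import chain, islice


def emit_shards(collections, limit=None):
    """Return an iterator over shards from each collection, in order.

    Stops after `limit` shards in total; limit=None means no cap.
    """
    all_shards = chain.from_iterable(collections)
    return islice(all_shards, limit)


# Three hypothetical collections, each modeled as a list of shard names
collections = [["s1", "s2"], ["s3"], ["s4", "s5"]]
first_three = list(emit_shards(collections, limit=3))
everything = list(emit_shards(collections))
```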
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Collection to Records
Import Statement
from orionplatform.cubes import CollectionToRecordsCube
Decodes a collection into OERecords.
Input Ports
intake: ShardInputPort
intake_list: JsonInputPort
Output Ports
failure: ShardOutputPort
success: RecordOutputPort
Ungrouped Parameters
Shard Format: StringParameter
The format of the data that shards contain
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
Create Collection
Import Statement
from orionplatform.cubes import CreateCollectionCube
Creates a collection to which shards can be added.
Output Ports
success: CollectionOutputPort
Ungrouped Parameters
Collection Metadata Copy: CollectionInputParameter
Collection to copy the metadata from
Collection Name: CollectionOutputParameter
Name of the collection to create
Collection Tags: StringParameter
Tags to apply to the output collection, comma-delimited
v2: BooleanParameter
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Records To Collection
Import Statement
from orionplatform.cubes import RecordsToCollectionCube
Converts records sent to the intake port into shards in a collection.
This cube must be initialized with a ShardCollection sent to the initializer port.
Converting records to molecules for collections will only preserve the molecule and none of the other data on the record.
Input Ports
init: CollectionInputPortV2
intake: RecordInputPort
Output Ports
failure: RecordOutputPort
success: ShardOutputPort
Ungrouped Parameters
Output Shard Format: StringParameter
The format of the data that shards will contain
Maximum size of shards: IntegerParameter
The maximum size, in bytes, of each emitted shard
primary_mol: PrimaryMolFieldParameter
Primary Molecule field to retrieve molecule from
records_per_shard: IntegerParameter
The target number of records in a shard. A value of 0 means each shard is filled up to the max_shard_bytes limit.
Shard Upload Attempts: IntegerParameter
Number of attempts to make when uploading a shard
Write Attempts: IntegerParameter
Number of attempts to write each record to a shard’s temporary file
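The interaction between records_per_shard and the maximum shard size can be sketched in plain Python. This is a model under stated assumptions, not the cube's implementation: it assumes the byte limit still applies when records_per_shard is nonzero, and that a single record larger than the limit gets a shard of its own. The function plan_shards and the example sizes are hypothetical.

```python
def plan_shards(record_sizes, records_per_shard=0, max_shard_bytes=1000):
    """Group record sizes (in bytes) into shards; return a list of lists."""
    shards, current, current_bytes = [], [], 0
    for size in record_sizes:
        # records_per_shard == 0 disables the count-based cutoff
        full_by_count = records_per_shard > 0 and len(current) >= records_per_shard
        # Assumed rule: never let a non-empty shard grow past the byte limit
        full_by_bytes = bool(current) and current_bytes + size > max_shard_bytes
        if full_by_count or full_by_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        shards.append(current)  # flush the final, possibly partial, shard
    return shards


by_bytes = plan_shards([400, 400, 400, 400], records_per_shard=0)
by_count = plan_shards([100] * 7, records_per_shard=3)
```

With records_per_shard left at 0, only the byte limit decides where one shard ends and the next begins.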
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
Accumulate A List of Shards
Import Statement
from orionplatform.cubes import AccumulateShardsCube
Accumulates shards into a list until the list reaches a specified size.
A downstream cube can then combine the shards into a larger shard.
Input Ports
intake: ShardInputPort
Output Ports
success: JsonOutputPort
Ungrouped Parameters
size: IntegerParameter
Number of shards to accumulate
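The accumulation behaviour described above amounts to fixed-size batching. The sketch below models it with plain strings standing in for shards and JSON text standing in for what a JSON output port would carry. Emitting a final partial batch is an assumption; the source does not say what happens to leftover shards. The name accumulate is hypothetical.

```python
import json


def accumulate(shards, size):
    """Yield JSON-encoded lists of up to `size` shard identifiers."""
    batch = []
    for shard in shards:
        batch.append(shard)
        if len(batch) == size:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Assumption: leftover shards are emitted as a smaller final list
        yield json.dumps(batch)


batches = [json.loads(b) for b in accumulate(["a", "b", "c", "d", "e"], 2)]
```

A downstream step would then receive each list and combine its shards into one larger shard.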
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Close Collection
Import Statement
from orionplatform.cubes import CloseCollectionCube
Finalizes shards passed in, and then the collection they belong to.
Warning: do not include this cube in a parallel cube group. Shards must be closed after being emitted from a parallel cube group.
Input Ports
collection_intake: CollectionInputPortV2
intake: ShardInputPort
Ungrouped Parameters
close_collections: BooleanParameter
Whether to close collections to additional shards.
close_shards: BooleanParameter
Whether to mark shards as ready.
delete_collections: BooleanParameter
Whether to delete collections.
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Close Shards
Import Statement
from orionplatform.cubes import CloseShardsCube
Finalizes shards passed in.
If shards are further sent to CloseCollectionCube, then CloseCollectionCube’s close_shards parameter should be set to False.
Warning: do not include this cube in a cube group. Shards must be closed after being emitted from a group.
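The rule about close_shards can be checked mechanically before a floe is submitted. The sketch below is one possible check over a plain description of a pipeline; it does not inspect real cube objects. The tuple layout, the function check_close_settings, and the choice to flag an unset close_shards value are all assumptions made for illustration.

```python
def check_close_settings(cubes):
    """Return warnings for a pipeline given as (class_name, params) tuples.

    The tuples are listed in the order data flows through the cubes.
    """
    warnings = []
    seen_close_shards = False
    for class_name, params in cubes:
        if class_name == "CloseShardsCube":
            seen_close_shards = True
        elif class_name == "CloseCollectionCube" and seen_close_shards:
            # Assumption: anything other than an explicit False is flagged
            if params.get("close_shards") is not False:
                warnings.append(
                    "CloseCollectionCube follows CloseShardsCube: "
                    "set close_shards to False"
                )
    return warnings


bad = [("CloseShardsCube", {}), ("CloseCollectionCube", {"close_shards": True})]
good = [("CloseShardsCube", {}), ("CloseCollectionCube", {"close_shards": False})]
bad_warnings = check_close_settings(bad)
good_warnings = check_close_settings(good)
```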
Input Ports
intake: ShardInputPort
Output Ports
failure: ShardOutputPort
success: ShardOutputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
Collection Resize
Import Statement
from orionplatform.cubes import CollectionResizeCube
Takes lists of shards from AccumulateShardsCube and concatenates those shards together.
This cube must be initialized with a ShardCollection sent to the initializer port.
Input Ports
init: CollectionInputPortV2
intake: JsonInputPort
Output Ports
failure: JsonOutputPort
success: ShardOutputPort
Ungrouped Parameters
Shard Format: StringParameter
The format of the data that shards contain. Used in validation.
validation: StringParameter
The kind of validation to perform
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version