Orion Platform Collection Cubes
Cubes
Collection Reader
Import Statement
from orionplatform.cubes import CollectionReaderCube
Description
Emits collections
Output Ports
success: CollectionOutputPort
Ungrouped Parameters
Collection: CollectionInputParameter
Collections to use as input to a floe
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
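The memory and disk parameters above are given in MiB (1048576 B), with the advice to request a couple hundred MiB more than the cube strictly needs. A minimal sketch of that arithmetic follows. The helper name `mib_with_headroom` and the 200 MiB default are assumptions for illustration, not part of orionplatform.

```python
BYTES_PER_MIB = 1048576  # 1 MiB, as stated in the parameter descriptions


def mib_with_headroom(required_bytes: int, headroom_mib: int = 200) -> int:
    """Round a byte requirement up to whole MiB and add overhead headroom.

    Hypothetical helper: the 200 MiB default is one reading of
    "a couple hundred MiB", not a documented value.
    """
    if required_bytes < 0:
        raise ValueError("required_bytes must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    whole_mib = (required_bytes + BYTES_PER_MIB - 1) // BYTES_PER_MIB
    return whole_mib + headroom_mib


# A cube that needs about 1.5 GiB of working memory.
memory_mib = mib_with_headroom(1536 * BYTES_PER_MIB)
```

The resulting number is what one might enter for Memory (MiB) or Temporary Disk Space (MiB).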
Shard Reader
Import Statement
from orionplatform.cubes import ShardReaderCube
Description
Reads collections and emits the shards from each collection
Output Ports
success: ShardOutputPort
Ungrouped Parameters
Input Collections: CollectionInputParameter
Collections to emit shards from
limit: IntegerParameter
Maximum number of shards to read with this cube
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
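The `limit` parameter caps how many shards this cube reads. The sketch below models that cap with plain Python iterables and does not call orionplatform. The function `read_shards` and the dictionary layout are hypothetical. The description above does not say whether the real cube applies the limit across all input collections or per collection; this sketch assumes across all.

```python
from itertools import chain, islice
from typing import Dict, Iterator, List, Optional


def read_shards(
    collections: Dict[str, List[str]], limit: Optional[int] = None
) -> Iterator[str]:
    """Yield shard names from each collection in turn, stopping at `limit`.

    Assumption: `limit=None` means "no cap", and the cap applies to the
    combined stream of shards rather than to each collection separately.
    """
    all_shards = chain.from_iterable(collections.values())
    return islice(all_shards, limit)


collections = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"]}
first_three = list(read_shards(collections, limit=3))
```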
Collection to Records
Import Statement
from orionplatform.cubes import CollectionToRecordsCube
Description
Decodes a collection into OERecords.
Input Ports
intake: ShardInputPort
intake_list: JsonInputPort
Output Ports
failure: ShardOutputPort
success: RecordOutputPort
Ungrouped Parameters
Shard Format: StringParameter
The format of the data that shards contain
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelCollectionToRecordsCube
Create Collection
Import Statement
from orionplatform.cubes import CreateCollectionCube
Description
Creates a collection to which shards can be added.
Output Ports
success: CollectionOutputPort
Ungrouped Parameters
Collection Metadata Copy: CollectionInputParameter
Collection to copy the metadata from
Collection Name: CollectionOutputParameter
Name of the collection to create
Collection Tags: StringParameter
Tags to apply to the output collection, comma delimited
v2: BooleanParameter
Experimental: Create a v2 Collection
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
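Collection Tags and Instance Tags are both single strings holding comma-delimited values. One possible way to build and read such a string is sketched below. The helpers `join_tags` and `split_tags` are hypothetical, and whether the platform itself trims whitespace or removes duplicates is an assumption here, not documented behavior.

```python
from typing import Iterable, List


def join_tags(tags: Iterable[str]) -> str:
    """Build a comma-delimited tag string from individual tags.

    Assumption: surrounding whitespace is trimmed, empty tags are
    skipped, and duplicates are dropped while keeping first-seen order.
    """
    cleaned: List[str] = []
    for tag in tags:
        tag = tag.strip()
        if not tag:
            continue
        if "," in tag:
            # A comma inside a tag would be read back as two tags.
            raise ValueError(f"tag may not contain a comma: {tag!r}")
        if tag not in cleaned:
            cleaned.append(tag)
    return ",".join(cleaned)


def split_tags(value: str) -> List[str]:
    """Split a comma-delimited tag string back into a list of tags."""
    return [part.strip() for part in value.split(",") if part.strip()]


tag_string = join_tags([" docking ", "", "2024", "docking"])
```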
Records To Collection
Import Statement
from orionplatform.cubes import RecordsToCollectionCube
Description
Converts records sent to the intake port into shards in a collection.
This cube must be initialized with a ShardCollection sent to the initializer port (init).
When records are converted to molecules for a collection, only the molecule is preserved; all other data on the record is dropped.
Input Ports
init: CollectionInputPortV2
intake: RecordInputPort
Output Ports
failure: RecordOutputPort
success: ShardOutputPort
Ungrouped Parameters
Output Shard Format: StringParameter
The format of the data that shards will contain
Maximum size of shards: IntegerParameter
Maximum size of each emitted shard, in bytes
primary_mol: PrimaryMolFieldParameter
Primary Molecule field to retrieve molecule from
records_per_shard: IntegerParameter
The target number of records in a shard. 0 means each shard is filled up to the max_shard_bytes limit
Shard Upload Attempts: IntegerParameter
Number of attempts to make when uploading a shard
Write Attempts: IntegerParameter
Number of attempts to write each record to a shard’s temporary file
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelRecordsToCollectionCube
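The interaction between `records_per_shard` and the maximum shard size can be sketched in plain Python. This is a model of the documented rule (a record-count target, with 0 meaning "fill up to the byte limit"), not the cube's actual implementation. The function `plan_shards` is hypothetical, and placing a record larger than the byte limit in a shard of its own is an assumption.

```python
from typing import Iterable, Iterator, List


def plan_shards(
    records: Iterable[bytes],
    records_per_shard: int = 0,
    max_shard_bytes: int = 1024,
) -> Iterator[List[bytes]]:
    """Group serialized records into shards.

    A shard is closed when adding the next record would exceed
    max_shard_bytes, or when it holds records_per_shard records
    (if records_per_shard is non-zero).
    """
    current: List[bytes] = []
    current_bytes = 0
    for record in records:
        # Start a new shard if this record would exceed the byte limit.
        if current and current_bytes + len(record) > max_shard_bytes:
            yield current
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += len(record)
        # A non-zero target closes the shard once enough records are in it.
        if records_per_shard and len(current) >= records_per_shard:
            yield current
            current, current_bytes = [], 0
    if current:
        yield current  # final, possibly smaller, shard


records = [b"x" * 10] * 5
by_count = [len(s) for s in plan_shards(records, records_per_shard=2, max_shard_bytes=100)]
by_bytes = [len(s) for s in plan_shards(records, records_per_shard=0, max_shard_bytes=35)]
```

With five 10-byte records, a target of 2 records per shard gives shards of 2, 2, and 1 records, while a target of 0 with a 35-byte limit gives shards of 3 and 2 records.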
Accumulate A List of Shards
Import Statement
from orionplatform.cubes import AccumulateShardsCube
Description
Accumulates shards into a list until it holds the specified number of shards.
A downstream cube can then combine the shards into a larger shard.
Input Ports
intake: ShardInputPort
Output Ports
success: JsonOutputPort
Ungrouped Parameters
size: IntegerParameter
Number of shards to accumulate
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
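The accumulation step amounts to batching an incoming stream into lists of `size` items. A minimal sketch follows, using integers as stand-ins for shards. The function `accumulate` is hypothetical, and emitting a shorter final list for leftover shards is an assumption about end-of-stream behavior.

```python
import json
from typing import Iterable, Iterator, List


def accumulate(shard_ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield lists of up to `size` shard identifiers, in arrival order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[int] = []
    for shard_id in shard_ids:
        batch.append(shard_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Assumption: leftover shards are emitted as a shorter final list.
        yield batch


batches = list(accumulate(range(7), size=3))
# Each batch is JSON-serializable, in keeping with the JsonOutputPort type.
encoded = [json.dumps(batch) for batch in batches]
```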
Close Collection
Import Statement
from orionplatform.cubes import CloseCollectionCube
Description
Finalizes shards passed in, and then the collection they belong to.
Warning: do not include this cube in a parallel cube group. Shards must be closed after being emitted from a parallel cube group.
Input Ports
collection_intake: CollectionInputPortV2
intake: ShardInputPort
Ungrouped Parameters
close_collections: BooleanParameter
Whether to close collections to additional shards.
close_shards: BooleanParameter
Whether to mark shards as ready.
delete_collections: BooleanParameter
Whether to delete collections.
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Close Shards
Import Statement
from orionplatform.cubes import CloseShardsCube
Description
Finalizes shards passed in.
If the shards are then sent on to CloseCollectionCube, set CloseCollectionCube’s close_shards parameter to False.
Warning: do not include this cube in a cube group. Shards must be closed after being emitted from a group.
Input Ports
intake: ShardInputPort
Output Ports
failure: ShardOutputPort
success: ShardOutputPort
Ungrouped Parameters
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelCloseShardsCube
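The rule that CloseCollectionCube's `close_shards` should be False when shards have already passed through CloseShardsCube can be written as a small settings helper. This sketch only computes parameter values and does not touch orionplatform. The function `close_collection_settings` is hypothetical, and `close_collections=True` is an assumed choice for a floe that wants the collection closed at the end.

```python
from typing import Dict


def close_collection_settings(shards_closed_upstream: bool) -> Dict[str, bool]:
    """Suggest CloseCollectionCube parameter values.

    shards_closed_upstream: True if a CloseShardsCube already finalized
    the shards before they reach CloseCollectionCube.
    """
    return {
        # Avoid finalizing shards a second time.
        "close_shards": not shards_closed_upstream,
        # Assumption: the floe wants the collection closed to new shards.
        "close_collections": True,
    }


with_close_shards_cube = close_collection_settings(shards_closed_upstream=True)
without_close_shards_cube = close_collection_settings(shards_closed_upstream=False)
```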
Collection Resize
Import Statement
from orionplatform.cubes import CollectionResizeCube
Description
Takes lists of shards from AccumulateShardsCube and concatenates the shards in each list.
This cube must be initialized with a ShardCollection sent to the initializer port (init).
Input Ports
init: CollectionInputPortV2
intake: JsonInputPort
Output Ports
failure: JsonOutputPort
success: ShardOutputPort
Ungrouped Parameters
Shard Format: StringParameter
The format of the data that shards contain. Used in validation.
validation: StringParameter
The kind of validation to perform
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Parallel Version
ParallelCollectionResizeCube
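The resize step can be pictured as merging each incoming list of shards into one larger shard, with lists that cannot be merged routed to the failure port. The sketch below models this with in-memory bytes. The function `resize` is hypothetical, and plain byte concatenation is an assumption: a real shard format may need format-aware merging and validation.

```python
from typing import Dict, List, Tuple


def resize(
    shard_lists: List[List[int]], shard_data: Dict[int, bytes]
) -> Tuple[List[bytes], List[List[int]]]:
    """Concatenate the shards named in each list.

    Returns (merged, failed): merged holds one combined payload per
    successful list; failed holds the lists that referenced a missing
    shard, mirroring the success and failure ports.
    """
    merged: List[bytes] = []
    failed: List[List[int]] = []
    for ids in shard_lists:
        try:
            merged.append(b"".join(shard_data[i] for i in ids))
        except KeyError:
            failed.append(ids)
    return merged, failed


shard_data = {0: b"aa", 1: b"bb", 2: b"cc"}
merged, failed = resize([[0, 1], [2], [3]], shard_data)
```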