Orion Platform File Cubes
Files read into a floe often have to be converted to records so that other cubes can interpret them. The following cubes provide utilities for converting files to records and for converting records to various file formats.
Cubes
Binary File Reader
Import Statement
from orionplatform.cubes import BinaryFileReaderCube
Description
A cube that reads one or more files and emits their contents in a single stream.
Generally used with a BinaryInputPort initializer.
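As a rough illustration of what "one or more files in a single stream" means, the sketch below reads several files in binary mode and yields their bytes as one continuous sequence of chunks. It uses only the Python standard library and is not the cube's implementation; the names read_files_as_stream and demo are hypothetical, and the buffer_size argument here is only loosely analogous to the cube's buffer_size parameter.

```python
import os
import tempfile


def read_files_as_stream(paths, buffer_size=64 * 1024):
    """Yield the bytes of each file, in order, as one continuous stream of chunks."""
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(buffer_size)
                if not chunk:
                    break  # end of this file; continue with the next one
                yield chunk


def demo():
    """Write two small files, then read them back as a single byte stream."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, payload in (("a.bin", b"hello "), ("b.bin", b"world")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(payload)
            paths.append(path)
        # A tiny buffer forces several chunks per file.
        return b"".join(read_files_as_stream(paths, buffer_size=4))
```

A downstream consumer sees only chunks, not file boundaries, which is why this kind of reader suits formats that can be concatenated or parsed incrementally.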
Output Ports
success: BinaryOutputPort
Ungrouped Parameters
File to use as input: FileInputParameter
The file to read from in binary mode
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
File to Record Converter
Import Statement
from orionplatform.cubes import FileToRecordConverter
Description
Reads a molecule or CSV file and converts it to records.
Output Ports
success: RecordOutputPort
Ungrouped Parameters
File to use as input: FileInputParameter
Molecular or CSV file to convert to records for use in a floe
File extension to append to the input file name: StringParameter
Override the file format derived from input file name
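The extension override can be pictured as follows: the extension is appended to the input file name, and the format is then derived from the resulting name. The sketch below is a standard-library approximation under that assumption, not the cube's code. The function names are hypothetical, records are modeled as plain dicts, and only CSV is parsed because molecule formats need a chemistry toolkit.

```python
import csv
import io
import os


def effective_extension(file_name, extension_override=""):
    """Return the lowercase extension after appending the optional override."""
    return os.path.splitext(file_name + extension_override)[1].lower()


def file_to_records(file_name, text, extension_override=""):
    """Convert CSV text into a list of dict records, one per data row."""
    extension = effective_extension(file_name, extension_override)
    if extension == ".csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    # Molecule formats are outside the scope of this sketch.
    raise NotImplementedError(f"no parser in this sketch for {extension!r}")
```

For example, a file named export.txt that actually contains comma-separated data would be parsed as CSV when the override is ".csv", because the derived name becomes export.txt.csv.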
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Archive Reader
Import Statement
from orionplatform.cubes import ArchiveConverterCube
Description
Converts a tar or zip file into records (if the output port is connected) or directly into datasets (if the output port isn’t connected).
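To make the archive-to-records idea concrete, here is a minimal sketch that walks a zip or tar archive held in memory and yields one record per regular file. It relies only on the standard library. The record layout (a dict with "name" and "data" keys) is an assumption for illustration, since the fields the cube actually emits are not described here, and the dataset path taken when the output port is not connected is not modeled.

```python
import io
import tarfile
import zipfile


def archive_to_records(data):
    """Yield one dict per regular file found in a zip or tar archive (bytes)."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield {"name": info.filename, "data": archive.read(info)}
        return
    buffer.seek(0)
    # "r:*" lets tarfile detect gzip, bz2, or xz compression transparently.
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                yield {"name": member.name, "data": archive.extractfile(member).read()}


def build_zip(files):
    """Demo helper: pack a {name: bytes} mapping into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def build_tar(files):
    """Demo helper: pack a {name: bytes} mapping into an in-memory tar."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

Calling list(archive_to_records(build_zip({"a.txt": b"alpha"}))) produces a single record for a.txt; directories inside the archive are skipped.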
Output Ports
success: RecordOutputPort
Ungrouped Parameters
Tar or zip file to use as input: FileInputParameter
Archive file to convert to records
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Record to File Converter
Import Statement
from orionplatform.cubes import RecordsToFileConverter
Description
A writer that converts a stream of records to an OE-recognized file.
The format is determined from the file extension given to the “file_name” parameter. The cube will raise an exception if the content of the records cannot be converted into the requested file format.
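The two behaviors described above, choosing the format from the file extension and raising when the record content does not fit that format, can be sketched with a small dispatch table. This is a standard-library illustration, not the cube's implementation: records are modeled as dicts, the ".csv" and ".json" writers are stand-ins for the real OE formats, and every function name is hypothetical.

```python
import csv
import io
import json
import os

SCALARS = (str, int, float, type(None))


def _write_csv(records, handle):
    """Write records as CSV; reject values that cannot be a single cell."""
    records = list(records)
    for record in records:
        for key, value in record.items():
            if not isinstance(value, SCALARS):
                raise ValueError(f"field {key!r} cannot be written as a CSV cell")
    fieldnames = sorted({key for record in records for key in record})
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


def _write_json(records, handle):
    """Write all records as one JSON array."""
    json.dump(list(records), handle, sort_keys=True)


WRITERS = {".csv": _write_csv, ".json": _write_json}


def write_records(records, file_name, handle):
    """Pick a writer from the file extension and write the records."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in WRITERS:
        raise ValueError(f"cannot convert records to {extension!r}")
    WRITERS[extension](records, handle)


def render(records, file_name):
    """Demo helper: return the text that would be written to file_name."""
    handle = io.StringIO()
    write_records(records, file_name, handle)
    return handle.getvalue()
```

In this sketch a record holding a nested structure converts cleanly to ".json" but raises for ".csv", which mirrors the idea that the same stream may be valid for one output format and not for another.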
Input Ports
intake: RecordInputPort
Ungrouped Parameters
Input Dataset: DatasetInputParameter
The dataset(s) to read records from
file_name: FileOutputParameter
Name of the file to create from records. The file extension will determine the format.
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Record File to Record Converter
Import Statement
from orionplatform.cubes import RecordFileToRecordConverter
Description
Reads a record file and converts it to records.
Output Ports
success: RecordOutputPort
Ungrouped Parameters
File to use as input: FileInputParameter
Record file to use as input to a floe
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
Record to Record File
Import Statement
from orionplatform.cubes import RecordsToRecordFileConverter
Description
A writer that writes a stream of records to a record file (i.e., oedb).
Input Ports
intake: RecordBytesInputPort
Ungrouped Parameters
file_name: FileOutputParameter
Name of the file to create from records
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
URL to File
Import Statement
from orionplatform.cubes import URLToFileCube
Description
Reads a file from a URL and uploads it to Orion
Ungrouped Parameters
filename: FileOutputParameter
New file name (defaults to URL path basename)
Logging interval: IntegerParameter
Log progress every N seconds (0 to disable)
None: StringParameter
URL of file to be uploaded to Orion
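The filename default and the logging interval described above can be approximated with the standard library. The sketch below is an assumption-laden illustration rather than the cube's code: default_filename and copy_with_progress are hypothetical names, and the copy works on any file-like objects so that it can be tried without network access.

```python
import io
import posixpath
import time
from urllib.parse import unquote, urlsplit


def default_filename(url, filename=None):
    """Return the explicit filename, or fall back to the URL path basename."""
    if filename:
        return filename
    # urlsplit drops the query string and fragment from the path component.
    return posixpath.basename(unquote(urlsplit(url).path))


def copy_with_progress(src, dst, log_interval=10, chunk_size=64 * 1024, log=print):
    """Copy src to dst in chunks, logging every log_interval seconds (0 disables)."""
    total = 0
    last_log = time.monotonic()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        now = time.monotonic()
        if log_interval > 0 and now - last_log >= log_interval:
            log(f"copied {total} bytes")
            last_log = now
    return total
```

With a real download, src could be the response object returned by urllib.request.urlopen, which also exposes read().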
Parameter Groups
Floe Internals
buffer_size: IntegerParameter
The amount of data buffered before sending downstream
Hardware
CPUs: IntegerParameter
The number of CPUs to run this cube with
Temporary Disk Space (MiB): DecimalParameter
The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
GPUs: IntegerParameter
The number of GPUs to run this cube with
Instance Tags: StringParameter
Only run on machines with matching tags (comma separated)
Instance Type: StringParameter
The type of instance that this cube needs to be run on
Max Backlog Wait: IntegerParameter
The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
Memory (MiB): DecimalParameter
The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Thread limit per CPU: IntegerParameter
The number of threads per CPU
Shared Memory (MiB): DecimalParameter
The amount of shared memory to allow a container to address
Spot policy: StringParameter
Control cube placement on spot market instances
Metrics
Cube Metrics: StringParameter
Set of metrics to be collected
Metric Period: DecimalParameter
How often to sample metrics, in seconds
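The memory and disk parameters in the Hardware group of each cube above are expressed in MiB (1048576 B), with the advice to request a couple hundred MiB more than required. A small helper for that arithmetic might look like the sketch below; the function name is hypothetical and the 200 MiB default is one reading of "a couple hundred", not a documented value.

```python
import math

MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def mib_request(required_bytes, headroom_mib=200):
    """Convert a byte requirement to whole MiB and add headroom for overhead."""
    return math.ceil(required_bytes / MIB) + headroom_mib
```

Under these assumptions, a cube that needs 3 GiB of working memory would request 3072 + 200 = 3272 MiB.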