Orion Platform File Cubes

Files read into a floe usually have to be converted to records before other cubes can interpret them. The following cubes convert files to records and records to various file formats.

Binary File Reader

Import Statement

from orionplatform.cubes import BinaryFileReaderCube

Description

A cube that reads one or more files and emits their contents in a single stream.

Generally used with a BinaryInputPort initializer.
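
The idea of reading several files and emitting their contents as one stream can be illustrated with the standard library alone. The following is a minimal sketch, not the cube's implementation: the function name stream_files and its buffer_size argument are hypothetical and only loosely mirror the cube's parameter of the same name.

```python
from typing import Iterable, Iterator


def stream_files(paths: Iterable[str], buffer_size: int = 65536) -> Iterator[bytes]:
    """Yield the contents of each file, in order, as chunks of at most buffer_size bytes.

    Hypothetical helper for illustration; it does not use or imitate the Orion API.
    """
    for path in paths:
        # Binary mode: bytes are passed through without decoding or newline translation.
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(buffer_size)
                if not chunk:
                    break  # end of this file, continue with the next one
                yield chunk
```

Joining the yielded chunks reproduces the concatenated file contents, which is what a downstream consumer of a single binary stream would see.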

Output Ports

  • success: BinaryOutputPort

Ungrouped Parameters

  • File to use as input: FileInputParameter

    The file to read from in binary mode

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances
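
The disk and memory parameters above are expressed in MiB (1048576 B) and recommend requesting extra room for overhead. A small sketch of that arithmetic follows; the helper name and the default of 200 MiB of headroom are assumptions chosen to illustrate "a couple hundred MiB", not values defined by Orion.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def mib_to_request(required_bytes: int, overhead_mib: float = 200.0) -> float:
    """Convert a requirement in bytes to MiB and add headroom for overhead.

    overhead_mib is a hypothetical default, not a platform constant.
    """
    return required_bytes / MIB + overhead_mib
```

For example, a cube that needs 3 GiB (3072 MiB) of memory would request 3272 MiB under this assumption.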

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

File to Record Converter

Import Statement

from orionplatform.cubes import FileToRecordConverter

Description

Reads a molecular or CSV file and converts it to records.

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • File to use as input: FileInputParameter

    Molecular or CSV file to convert to records for use in a floe

  • File extension to append to the input file name: StringParameter

    Override the file format derived from input file name
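
The two parameters above amount to: pick a format from the file name unless an override is given, then turn each row or molecule into a record. The sketch below shows that logic for the CSV case only, using the standard library. The function names are hypothetical, plain dictionaries stand in for records, and molecular formats are not handled because they would need a chemistry toolkit.

```python
import csv
import io
import os
from typing import Dict, Iterator, Optional, TextIO


def resolve_format(file_name: str, extension_override: Optional[str] = None) -> str:
    """Return the lowercase extension (with leading dot) used to choose a parser."""
    if extension_override:
        # The override wins over whatever the file name suggests.
        return "." + extension_override.lower().lstrip(".")
    return os.path.splitext(file_name)[1].lower()


def csv_to_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per CSV row, keyed by the header row."""
    for row in csv.DictReader(handle):
        yield dict(row)
```

A file uploaded as upload.dat with the override set to csv would resolve to ".csv" here, after which csv_to_records could be applied to its contents.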

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Archive Reader

Import Statement

from orionplatform.cubes import ArchiveConverterCube

Description

Converts a tar or zip file into records (if the output port is connected) or directly into datasets (if the output port isn’t connected).
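
To make the archive handling concrete, here is a minimal standard-library sketch that walks a tar or zip file and yields each regular file it contains. It is only an illustration of the idea: the function name is hypothetical, and how the cube maps archive members to records or datasets is not shown in this document.

```python
import tarfile
import zipfile
from typing import Iterator, Tuple


def iter_archive_members(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for every regular file in a tar or zip archive."""
    if tarfile.is_tarfile(path):
        # Mode "r" detects gzip, bz2, and xz compression transparently.
        with tarfile.open(path) as archive:
            for member in archive:
                if member.isfile():  # skip directories, links, and devices
                    yield member.name, archive.extractfile(member).read()
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield info.filename, archive.read(info)
    else:
        raise ValueError(f"{path!r} is not a tar or zip archive")
```

Each yielded pair could then be passed to a format-specific converter, chosen from the member's file extension.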

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • Tar or zip file to use as input: FileInputParameter

    Archive file to convert to records

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Record to File Converter

Import Statement

from orionplatform.cubes import RecordsToFileConverter

Description

A writer that converts a stream of records to an OE-recognized file.

The format is determined from the file extension given to the “file_name” parameter. The cube will raise an exception if the content of the records cannot be converted into the requested file format.
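
The behavior described above (format chosen by extension, exception on content that cannot be converted) can be sketched with the standard library. This is an illustrative stand-in, not the cube's code: write_records is a hypothetical name, dictionaries stand in for records, and only CSV is supported here.

```python
import csv
import os
from typing import Dict, Iterable, List


def write_records(records: Iterable[Dict[str, object]], file_name: str) -> None:
    """Write dictionary records to file_name, choosing the format from its extension."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension != ".csv":
        # A real writer would dispatch on many formats; this sketch knows only CSV.
        raise ValueError(f"unsupported output format: {extension!r}")
    rows = list(records)
    fieldnames: List[str] = list(rows[0].keys()) if rows else []
    with open(file_name, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        # DictWriter raises ValueError if a row has a key missing from fieldnames,
        # which plays the role of "content cannot be converted" in this sketch.
        writer.writerows(rows)
```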

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

  • Input Dataset: DatasetInputParameter

    The dataset(s) to read records from

  • file_name: FileOutputParameter

    Name of the file to create from records. The file extension will determine the format.

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Record File to Record Converter

Import Statement

from orionplatform.cubes import RecordFileToRecordConverter

Description

Reads a record file and converts it to records.

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • File to use as input: FileInputParameter

    Record file to use as input to a floe

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

Record to Record File

Import Statement

from orionplatform.cubes import RecordsToRecordFileConverter

Description

A writer that writes a stream of records to a record file (i.e. oedb).

Input Ports

  • intake: RecordBytesInputPort

Ungrouped Parameters

  • file_name: FileOutputParameter

    Name of the file to create from records

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds

URL to File

Import Statement

from orionplatform.cubes import URLToFileCube

Description

Reads a file from a URL and uploads it to Orion

Ungrouped Parameters

  • filename: FileOutputParameter

    New file name (defaults to URL path basename)

  • Logging interval: IntegerParameter

    Log progress every N seconds (0 to disable)

  • None: StringParameter

    URL of file to be uploaded to Orion
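
The default described for the filename parameter (the basename of the URL path) can be computed as follows. This is a sketch under assumptions: default_file_name is a hypothetical helper, and whether the cube percent-decodes the path or ignores the query string the same way is not stated in this document.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit


def default_file_name(url: str, filename: Optional[str] = None) -> str:
    """Return filename if given, otherwise the basename of the URL path."""
    if filename:
        return filename
    # urlsplit separates the path from the query string and fragment.
    name = posixpath.basename(unquote(urlsplit(url).path))
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name
```

With this helper, a URL ending in /data/ligands.sdf?token=abc yields ligands.sdf, while an explicit filename always takes precedence.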

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • CPUs: IntegerParameter

    The number of CPUs to run this cube with

  • Temporary Disk Space (MiB): DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • GPUs: IntegerParameter

    The number of GPUs to run this cube with

  • Instance Tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • Instance Type: StringParameter

    The type of instance that this cube needs to be run on

  • Max Backlog Wait: IntegerParameter

    The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated

  • Memory (MiB): DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Thread limit per CPU: IntegerParameter

    The number of threads per CPU

  • Shared Memory (MiB): DecimalParameter

    The amount of shared memory to allow a container to address

  • Spot policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • Cube Metrics: StringParameter

    Set of metrics to be collected

  • Metric Period: DecimalParameter

    How often to sample metrics, in seconds