Orion Platform File Cubes

To read files into Floes, the files often must first be converted to records so that other cubes can interpret them. The following cubes provide utilities for converting files to records and records to various file formats.

Binary File Reader

Import

from orionplatform.cubes import BinaryFileReaderCube

Description

A cube that reads one or more files and emits their contents in a single stream.

Generally used with a BinaryInputPort initializer.
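
The idea of emitting several files as one continuous byte stream can be illustrated with a small standard-library sketch. It is not the cube's implementation; the function name `stream_files` and the `chunk_size` argument are hypothetical and exist only to show the behavior.

```python
from pathlib import Path
from typing import Iterable, Iterator


def stream_files(paths: Iterable[str], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the contents of each file, in order, as one stream of byte chunks."""
    for path in paths:
        # Binary mode: no text decoding and no newline translation.
        with Path(path).open("rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                yield chunk
```

A consumer sees no boundary between files, so `b"".join(stream_files([a, b]))` equals the bytes of `a` followed by the bytes of `b`.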

Output Ports

  • success: BinaryOutputPort

Ungrouped Parameters

  • file: FileInputParameter

    The file to read from in binary mode

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream
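
As a rough picture of what a buffer size controls, the sketch below accumulates items and passes them on in batches once a threshold is reached. It assumes the size is counted in bytes, which this page does not state, and `buffered` is a hypothetical helper rather than part of the Floe API.

```python
from typing import Iterable, Iterator, List


def buffered(items: Iterable[bytes], buffer_size: int) -> Iterator[List[bytes]]:
    """Group items into batches, flushing once a batch holds buffer_size bytes."""
    batch: List[bytes] = []
    held = 0
    for item in items:
        batch.append(item)
        held += len(item)
        if held >= buffer_size:
            yield batch
            batch, held = [], 0
    if batch:
        # Flush whatever is left when the input ends.
        yield batch
```

A larger threshold means fewer, bigger batches; a smaller one sends data downstream sooner.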

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances
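
The sizing advice for `disk_space` and `memory_mb` comes down to simple arithmetic: convert the expected need to MiB, then add headroom. In the sketch below, the 200 MiB default is one reading of "a couple hundred MiB", and `requested_mib` is a hypothetical helper, not an Orion function.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def requested_mib(required_bytes: int, headroom_mib: float = 200.0) -> float:
    """Convert a byte requirement to MiB and add headroom for overhead."""
    return required_bytes / MIB + headroom_mib
```

For a job expected to need 3 GiB, `requested_mib(3 * 1024 * MIB)` returns `3272.0`.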

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds

File to Record Converter

Import

from orionplatform.cubes import FileToRecordConverter

Description

Reads a molecular or CSV file and converts it to records.

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • file: FileInputParameter

    Molecular or CSV file to convert to records for use in a floe

  • file_ext: StringParameter

    Override the file format derived from input file name
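
The relationship between `file` and `file_ext` can be pictured as "use the override if one is given, otherwise fall back to the file name". The sketch below shows that precedence with the standard library only. How the cube actually derives the format is not documented here, and `resolve_format` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_format(file_name: str, file_ext: Optional[str] = None) -> str:
    """Return the format extension, preferring an explicit override."""
    if file_ext:
        # Normalize so "sdf" and ".SDF" are treated the same.
        return "." + file_ext.lower().lstrip(".")
    # No override: fall back to the extension in the file name.
    return Path(file_name).suffix.lower()
```

An override is useful when an uploaded file has a generic or misleading name, for example `resolve_format("upload.dat", "csv")` yields `".csv"`.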

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds

Archive Reader

Import

from orionplatform.cubes import ArchiveConverterCube

Description

Converts a tar or zip file into records (if the output port is connected) or directly into datasets (if the output port isn’t connected).
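
To make the archive-to-records idea concrete, the sketch below walks a tar or zip file with the standard library and yields one item per stored file. Representing a record as a plain dict with `name` and `data` keys is an assumption made for illustration; the cube's real record layout is not described on this page, and `archive_members` is a hypothetical name.

```python
import tarfile
import zipfile
from typing import Dict, Iterator


def archive_members(path: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per regular file stored in a tar or zip archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield {"name": info.filename, "data": archive.read(info)}
    elif tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            for member in archive:
                # Skip directories, links, and other non-file entries.
                if member.isfile():
                    yield {"name": member.name, "data": archive.extractfile(member).read()}
    else:
        raise ValueError(f"{path} is not a tar or zip archive")
```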

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • file: FileInputParameter

    Archive file to convert to records

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds

Record to File Converter

Import

from orionplatform.cubes import RecordsToFileConverter

Description

A writer that converts a stream of records to an OE-recognized file.

The format is determined from the file extension given to the “file_name” parameter. The cube will raise an exception if the content of the records cannot be converted into the requested file format.
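
The two rules above, format chosen by file extension and an exception when records cannot be converted, can be sketched with the standard library. Only CSV is handled here, records are modeled as plain dicts, and `write_records` is a hypothetical function; none of this reflects how the cube is implemented.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_records(records: List[Dict[str, object]], file_name: str) -> None:
    """Write records in the format implied by the file extension (CSV only)."""
    suffix = Path(file_name).suffix.lower()
    if suffix != ".csv":
        raise ValueError(f"unsupported file extension: {suffix!r}")
    if not records:
        raise ValueError("cannot infer CSV columns from an empty record list")
    # Take the column names from the first record.
    columns = list(records[0])
    with open(file_name, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        # DictWriter raises ValueError if a record has fields outside columns.
        writer.writerows(records)
```

Failing loudly on an unknown extension or a mismatched record mirrors the documented behavior of raising rather than writing a partial or malformed file.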

Input Ports

  • intake: RecordInputPort

Ungrouped Parameters

  • file_name: FileOutputParameter

    Name of the file to create from records. The file extension will determine the format.

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds

Record File to Record Converter

Import

from orionplatform.cubes import RecordFileToRecordConverter

Description

Reads a record file and converts it to records.

Output Ports

  • success: RecordOutputPort

Ungrouped Parameters

  • file: FileInputParameter

    Record file to use as input to a floe

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds

Record to Record File

Import

from orionplatform.cubes import RecordsToRecordFileConverter

Description

A writer that writes a stream of records to a record file (i.e., an oedb file).

Input Ports

  • intake: RecordBytesInputPort

Ungrouped Parameters

  • file_name: FileOutputParameter

    Name of the file to create from records

Parameter Groups

Floe Internals

  • buffer_size: IntegerParameter

    The amount of data buffered before sending downstream

Hardware

  • cpu_count: IntegerParameter

    The number of CPUs to run this cube with

  • disk_space: DecimalParameter

    The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • gpu_count: IntegerParameter

    The number of GPUs to run this cube with

  • instance_tags: StringParameter

    Only run on machines with matching tags (comma separated)

  • instance_type: StringParameter

    The type of instance that this cube needs to be run on

  • memory_mb: DecimalParameter

    The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • spot_policy: StringParameter

    Control cube placement on spot market instances

Metrics

  • cube_metrics: StringParameter

    Set of metrics to be collected

  • metric_period: DecimalParameter

    How often to sample metrics, in seconds