Migrating from CubeRecord

Summary of differences

In CubeRecord, the cube subclasses defined a combination of cube functionality, ports, and parameters. Decorators were then used to modify the behavior of these various objects. In Orion Platform, the definitions of cubes, ports, and parameters have been simplified and made explicit.

  • All cubes are subclasses of ComputeCube from the Floe library and they provide basic cube functionality only. They do not define ports or parameters.

    • OEMolPropertyCube and OEProcessMolCube have been eliminated.

  • Mixins (e.g. RecordPortsMixin) are available to define standard ports. Other ports are defined explicitly.

  • Parameters are defined explicitly.

    • Mixins and decorators for parameters have been eliminated.

  • See all API Changes

Migrating Cubes

This section describes how to convert a fairly complex example cube using the OEProcessMolCube CubeRecord API to the Orion Platform API.

Note

Cubes written using OERecordCube or OEMolRecordCube are easier to convert since the behavior of those CubeRecord cubes closely matches the behavior of the base ComputeCube from Floe. See API Changes for more information.

Here is the full starting CubeRecord cube. It uses an Initializer port and acts upon a molecule stream using the OEProcessMolCube. The cube inherits from OEProcessMolCube and InitMolRecordMixin. OEProcessMolCube makes the class a cube and defines its record ports. InitMolRecordMixin provides an init initializer port and an init_mol_field field for retrieving the molecule from the records.

Starting CubeRecord cube with inheritance
import random

from cuberecord import (
    OEProcessMolCube,
    IntFieldParameter,
    InitMolRecordMixin,
    modify_field_param,
)


# Modify the init_mol_field parameter set by InitMolRecordMixin
@modify_field_param(
    "init_mol_field",
    ports="init",
    title="Query Molecule",
    description="Port that expects molecules that will be counted",
)
class ExampleCube(OEProcessMolCube, InitMolRecordMixin):
    """Example Cube helps you in the situation where you want
    to generate a random number between 0 and the number of atoms in a molecule
    *plus* the number of molecules you received on an initializer port.

    This is a contrived example, using Cube Record
    """

    title = "Complex Cube"
    classification = [["Example", "Contrived"]]
    tags = ["example"]
    description = "An example cube showing CubeRecord usage"

    random_num_field = IntFieldParameter(
        "random_num_field",
        default="Random Number",
        title="Random Number Field",
        # Has a ports field
        ports="success",
        description="Field to assign a number to",
    )

    def begin(self):
        # Count the number of molecules received on the initializer port
        self.number_of_init_mols = 0
        for mol in self.get_init_molecules():
            self.number_of_init_mols += 1

    def process_mol(self, mol):
        # Get the number of atoms
        number_of_atoms = mol.NumAtoms()
        # Calculate a random number between 0 and the number of atoms plus
        # the number of initialized mols
        random_num = random.randint(0, number_of_atoms + self.number_of_init_mols)
        # Return the molecule and the random number that goes with it
        return mol, random_num

Using Orion Platform, the class should inherit from ComputeCube to provide basic cube functionality and RecordPortsMixin to provide default intake, success, and failure ports.

After defining the inheritance, an initializer port and a related field parameter are needed. Since these are no longer defined in cube subclasses, they are defined explicitly.

A RecordInputPort port is defined as an initializer port, and a read only PrimaryMolFieldParameter parameter is defined to read the molecule from the initializer port records. Note that the modify_field_param decorator is no longer required since the initializer port is fully defined in the class.

Next, an in_mol_field parameter, another instance of PrimaryMolFieldParameter, is defined to retrieve the primary molecule from the records received on the intake port. In this example the cube only needs an input field as it does not modify the molecule.

Orion Platform cube inheritance
import random

from floe.api import ComputeCube
from orionplatform.ports import RecordInputPort
from orionplatform.mixins import RecordPortsMixin
from orionplatform.parameters import IntegerFieldParameter, PrimaryMolFieldParameter


class ExampleCube(RecordPortsMixin, ComputeCube):
    """Example Cube helps you in the situation where you want
    to generate a random number between 0 and the number of atoms in a molecule
    *plus* the number of molecules you received on an initializer port.

    This is a contrived example, using Orion Platform
    """

    title = "Complex Cube"
    classification = [["Example", "Contrived"]]
    tags = ["example"]
    description = "An example cube showing OrionPlatform usage"

    # Define the initializer port and corresponding field for retrieving values
    init = RecordInputPort("init", initializer=True)
    init_mol_field = PrimaryMolFieldParameter(
        "init_mol_field",
        read_only=True,
        # Set the title and description without the use of a decorator
        title="Query Molecule",
        description="Port that expects molecules that will be counted",
    )

    # Define the field to retrieve the primary molecule
    in_mol_field = PrimaryMolFieldParameter(
        "in_mol_field",
        read_only=True,
        description="Primary Molecule Field to use as input to the cube",
    )

For the majority of FieldParameters defined on a CubeRecord cube, the migration is straightforward. Several FieldParameters have been renamed, and support for the ports argument has been removed.


random_num_field = IntFieldParameter(
    "random_num_field",
    default="Random Number",
    title="Random Number Field",
    # Has a ports field
    ports="success",
    description="Field to assign a number to",
)

In this example, the CubeRecord cube has an IntFieldParameter. To migrate to Orion Platform, change the parameter class to IntegerFieldParameter and remove the ports keyword argument.

random_num_field = IntegerFieldParameter(
    "random_num_field",
    default="Random Number",
    title="Random Number Field",
    description="Field to assign a number to",
)

Note

To see the other parameters that have been renamed or replaced, see Parameter Changes.
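
Because the ports keyword argument is no longer supported, it can help to locate every field parameter that still passes it before migrating a large code base. The following is a minimal sketch using only Python's standard ast module. The helper name find_ports_keywords and the LEGACY_SOURCE string are hypothetical and exist only for illustration; the sketch assumes that field parameter classes can be recognized by a name ending in FieldParameter.

```python
import ast

# Hypothetical sample of legacy CubeRecord code to scan
LEGACY_SOURCE = '''
random_num_field = IntFieldParameter(
    "random_num_field",
    default="Random Number",
    title="Random Number Field",
    ports="success",
    description="Field to assign a number to",
)
'''


def find_ports_keywords(source):
    """Return (line, class name) for each *FieldParameter call passing ports=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Handle both IntFieldParameter(...) and module.IntFieldParameter(...)
        func = node.func
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        if name is None or not name.endswith("FieldParameter"):
            continue
        if any(kw.arg == "ports" for kw in node.keywords):
            hits.append((node.lineno, name))
    return sorted(hits)


print(find_ports_keywords(LEGACY_SOURCE))  # [(2, 'IntFieldParameter')]
```

The sketch only reports locations; removing the argument is left as a manual step so that each parameter can be reviewed.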

In CubeRecord, molecules from the initializer port are accessed using the utility method InitMolRecordMixin.get_init_molecules, provided by the InitMolRecordMixin.

def begin(self):
    # Count the number of molecules received on the initializer port
    self.number_of_init_mols = 0
    for mol in self.get_init_molecules():
        self.number_of_init_mols += 1

With Orion Platform, access the records from the ports directly.

def begin(self):
    # Count the number of molecules received on the initializer port
    self.number_of_init_mols = 0
    for record in self.init:
        # See if the initializer records have a molecule
        if not record.has_value(self.args.init_mol_field):
            continue
        self.number_of_init_mols += 1

The CubeRecord example uses the OEProcessMolCube.process_mol method that is called with the molecules from incoming records.

def process_mol(self, mol):
    # Get the number of atoms
    number_of_atoms = mol.NumAtoms()
    # Calculate a random number between 0 and the number of atoms plus
    # the number of initialized mols
    random_num = random.randint(0, number_of_atoms + self.number_of_init_mols)
    # Return the molecule and the random number that goes with it
    return mol, random_num

Using Orion Platform, the molecules are extracted from the records in the process() method using the in_mol_field parameter defined on the cube.

def process(self, record, port):
    if not record.has_value(self.args.in_mol_field):
        # If the record doesn't have the field you want, fail it
        self.failure.emit(record)
        return
    mol = record.get_value(self.args.in_mol_field)

    # Get the number of atoms
    number_of_atoms = mol.NumAtoms()
    # Calculate a random number between 0 and the number of atoms plus
    # the number of initialized mols
    random_num = random.randint(0, number_of_atoms + self.number_of_init_mols)

    # Set the value on the record, and emit the record
    record.set_value(self.args.random_num_field, random_num)
    self.success.emit(record)

The final Orion Platform cube is shown below. The cube is longer, but there is no implicit coupling between the classes, ports, and parameters, and the overall programming pattern is uniform across all cubes.

Orion Platform cube
import random

from floe.api import ComputeCube
from orionplatform.ports import RecordInputPort
from orionplatform.mixins import RecordPortsMixin
from orionplatform.parameters import IntegerFieldParameter, PrimaryMolFieldParameter


class ExampleCube(RecordPortsMixin, ComputeCube):
    """Example Cube helps you in the situation where you want
    to generate a random number between 0 and the number of atoms in a molecule
    *plus* the number of molecules you received on an initializer port.

    This is a contrived example, using Orion Platform
    """

    title = "Complex Cube"
    classification = [["Example", "Contrived"]]
    tags = ["example"]
    description = "An example cube showing OrionPlatform usage"

    # Define the initializer port and corresponding field for retrieving values
    init = RecordInputPort("init", initializer=True)
    init_mol_field = PrimaryMolFieldParameter(
        "init_mol_field",
        read_only=True,
        # Set the title and description without the use of a decorator
        title="Query Molecule",
        description="Port that expects molecules that will be counted",
    )

    # Define the field to retrieve the primary molecule
    in_mol_field = PrimaryMolFieldParameter(
        "in_mol_field",
        read_only=True,
        description="Primary Molecule Field to use as input to the cube",
    )

    # Renamed to IntegerFieldParameter from IntFieldParameter in Cube Record
    random_num_field = IntegerFieldParameter(
        "random_num_field",
        default="Random Number",
        title="Random Number Field",
        description="Field to assign a number to",
    )

    def begin(self):
        # Count the number of molecules received on the initializer port
        self.number_of_init_mols = 0
        for record in self.init:
            # See if the initializer records have a molecule
            if not record.has_value(self.args.init_mol_field):
                continue
            self.number_of_init_mols += 1

    def process(self, record, port):
        if not record.has_value(self.args.in_mol_field):
            # If the record doesn't have the field you want, fail it
            self.failure.emit(record)
            return
        mol = record.get_value(self.args.in_mol_field)

        # Get the number of atoms
        number_of_atoms = mol.NumAtoms()
        # Calculate a random number between 0 and the number of atoms plus
        # the number of initialized mols
        random_num = random.randint(0, number_of_atoms + self.number_of_init_mols)

        # Set the value on the record, and emit the record
        record.set_value(self.args.random_num_field, random_num)
        self.success.emit(record)

Migrating Cube Tests

In CubeRecord, there were several utilities for testing cubes. Orion Platform removes these in favor of Floe’s built-in testing capabilities.

CubeRecord provides its own OERecordCubeTestRunner and conversion logic in the form of DataRecordStream. OERecordCubeTestRunner functions almost exactly like Floe’s CubeTestRunner, but sets initializer port data using OERecordCubeTestRunner.set_initializer_port_input.

from unittest import TestCase

from example.cubes import ExampleCube
from cuberecord import DataRecordStream, OERecordCubeTestRunner

from datarecord import Types, OEField


class TestExampleCube(TestCase):
    def test_example_cube(self):
        random_field_name = "My Random Number"
        cube = ExampleCube("Testing Example Cube")
        test_runner = OERecordCubeTestRunner(cube)

        # Uses CubeRecord conversion
        records = list(DataRecordStream("drugs.sdf"))
        # Set the initializer data before starting
        # Named differently than the floe method
        test_runner.set_init_port_records("init", records)
        test_runner.set_parameters(random_num_field=random_field_name)
        # Sets up the cube
        test_runner.start()
        for record in records:
            cube.process(record, "intake")
        # Finalizes cube
        test_runner.finalize()

        failure_output = test_runner.outputs["failure"]
        self.assertEqual(failure_output.qsize(), 0)

        output = test_runner.outputs["success"]
        test_field = OEField(random_field_name, Types.Int)
        while not output.empty():
            rec = output.get()
            self.assertTrue(rec.has_value(test_field))

Testing cubes with Orion Platform uses the CubeTestRunner provided by Floe and the conversion logic in DRConvert.

from unittest import TestCase

from example.cubes import ExampleCube

from floe.test import CubeTestRunner
from drconvert import MolFileConverter

from datarecord import Types, OEField


class TestExampleCube(TestCase):
    def test_example_cube(self):
        random_field_name = "My Random Number"
        cube = ExampleCube("Testing Example Cube")
        test_runner = CubeTestRunner(cube)

        # Uses DRConvert's conversion
        records = list(MolFileConverter("drugs.sdf"))
        # Set the initializer data before starting
        test_runner.set_initializer_input("init", records)
        test_runner.set_parameters(random_num_field=random_field_name)
        test_runner.start()
        for record in records:
            cube.process(record, "intake")
        test_runner.finalize()

        failure_output = test_runner.outputs["failure"]
        self.assertEqual(failure_output.qsize(), 0)

        output = test_runner.outputs["success"]
        test_field = OEField(random_field_name, Types.Int)
        while not output.empty():
            rec = output.get()
            self.assertTrue(rec.has_value(test_field))

The differences between the two methods of testing cubes are minimal, so migrating a test requires few changes.
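
Both test examples read results with qsize(), empty(), and get(), so a small helper can collect the emitted records into a list for simpler assertions. The following is a minimal sketch that assumes the test runner outputs behave like Python's queue.Queue, as their usage above suggests. The helper name drain is hypothetical, and a plain queue.Queue stands in for test_runner.outputs["success"].

```python
import queue


def drain(output):
    """Collect every item currently waiting in a queue-like test output."""
    items = []
    while not output.empty():
        items.append(output.get())
    return items


# Stand-in for test_runner.outputs["success"]; integers stand in for records
fake_output = queue.Queue()
for value in (1, 2, 3):
    fake_output.put(value)

records = drain(fake_output)
print(records)               # [1, 2, 3]
print(fake_output.qsize())   # 0
```

In a real test, the returned list could be checked with assertions such as self.assertEqual(len(records), expected_count) before inspecting individual fields.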

Refer to the respective libraries in the Orion Programming Documentation for further information.

API Changes

The following is a summary of CubeRecord API replacements in Orion Platform that provide comparable functionality.

Cube Changes

All base cube classes in CubeRecord must be changed to inherit from the Floe ComputeCube base class or other related base cube classes (e.g. SinkCube). Ports are either defined directly or can be defined using the port mixins such as RecordPortsMixin.

Note

ComputeCubes as defined by Floe only have a process() method, so any cube that defines get_property, process_mol, or a similar method provided by CubeRecord must be rewritten to use process() directly.
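
One way to limit the rewrite is to keep the body of a legacy process_mol method and add a process() method that handles the record explicitly. The following is a library-free sketch of that pattern. StubRecord, StubPort, LegacyAtomCounter, and the count_field name are hypothetical stand-ins so that the example runs without Floe or Orion Platform; a real cube would inherit from ComputeCube and RecordPortsMixin and use field parameters as in the migrated example cube.

```python
from types import SimpleNamespace


class StubRecord:
    """Hypothetical stand-in for a record: stores values keyed by field name."""

    def __init__(self, **values):
        self._values = dict(values)

    def has_value(self, field):
        return field in self._values

    def get_value(self, field):
        return self._values[field]

    def set_value(self, field, value):
        self._values[field] = value


class StubPort:
    """Hypothetical stand-in for an output port: collects emitted records."""

    def __init__(self):
        self.emitted = []

    def emit(self, record):
        self.emitted.append(record)


class LegacyAtomCounter:
    """Keeps a CubeRecord-style process_mol and adds a process() wrapper."""

    def __init__(self):
        # In a real cube these come from the parameters and the port mixin
        self.args = SimpleNamespace(in_mol_field="mol", count_field="count")
        self.success = StubPort()
        self.failure = StubPort()

    def process_mol(self, mol):
        # Unchanged legacy logic; a string stands in for a molecule here
        return mol, len(mol)

    def process(self, record, port):
        # Handle the record explicitly, as in the migrated example cube
        if not record.has_value(self.args.in_mol_field):
            self.failure.emit(record)
            return
        mol = record.get_value(self.args.in_mol_field)
        mol, count = self.process_mol(mol)
        record.set_value(self.args.count_field, count)
        self.success.emit(record)


cube = LegacyAtomCounter()
cube.process(StubRecord(mol="CCO"), "intake")
cube.process(StubRecord(name="no molecule"), "intake")
print(len(cube.success.emitted), len(cube.failure.emitted))  # 1 1
```

Once the wrapper works, the process_mol body can be inlined into process() so that the cube follows the same pattern as other Orion Platform cubes.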

Cube Testing

Testing should be performed using CubeTestRunner from Floe.

Port Changes

RecordInputPort

Functions in the same way as the class of the same name in CubeRecord, but returns OEMolRecord rather than OERecord.

RecordOutputPort

A drop in replacement for the CubeRecord port of the same name.

RecordBytesInputPort

A replacement for the CubeRecord RawRecordInputPort port.

CollectionInputPort

A drop in replacement for the CubeRecord port of the same name.

CollectionOutputPort

A drop in replacement for the CubeRecord port of the same name.

ShardInputPort

A drop in replacement for the CubeRecord port of the same name.

Warning

There are no longer Shard ports that are specific to the format of shard being passed.

ShardOutputPort

A drop in replacement for the CubeRecord port of the same name.

Parameter Changes

DatasetInputParameter

A replacement for DataSourceParameter, renamed for consistency with Floe and other parameter names.

DatasetOutputParameter

A drop-in replacement for the CubeRecord parameter of the same name. Supports staged workfloe connections with DatasetInputParameter.

CollectionInputParameter

A drop-in replacement for the CubeRecord parameter of the same name.

CollectionOutputParameter

New parameter for specifying an output collection. Supports multistage workfloes with CollectionInputParameter.

SecretInputParameter

New parameter for accessing Orion Secrets.

FieldParameter

A drop in replacement for the CubeRecord parameter of the same name. Does not have keyword argument ports.

Warning

FieldParameter does not support a value of None for its field_type. To provide comparable functionality, use a StringParameter in conjunction with the DataRecord API to access a field by name.

InputMolFieldParameter

A replacement for InputMoleculeFieldParameter. Does not have keyword argument ports.

Note

A thin wrapper around PrimaryMolFieldParameter with read_only set to True.

OutputMolFieldParameter

A replacement for OutputMoleculeFieldParameter. Does not have keyword argument ports.

Note

A thin wrapper around PrimaryMolFieldParameter with read_only set to False.

PrimaryMolFieldParameter

An alternative to InputMoleculeFieldParameter and OutputMoleculeFieldParameter. The read_only flag should be set to True to treat it as an input field, otherwise it is an input/output field. Does not have keyword argument ports.

IntegerFieldParameter

Behaves the same as the IntFieldParameter in CubeRecord, but has been renamed for consistency with Floe’s IntegerParameter. Does not have keyword argument ports.

BooleanFieldParameter

Behaves the same as the BoolFieldParameter in CubeRecord, but has been renamed for consistency with Floe’s BooleanParameter. Does not have keyword argument ports.

DecimalFieldParameter

A drop in replacement for the CubeRecord parameter of the same name. Does not have keyword argument ports.

Note

Also aliased as FloatFieldParameter.

StringFieldParameter

A drop in replacement for the CubeRecord parameter of the same name. Does not have keyword argument ports.

MolFieldParameter

A drop in replacement for the CubeRecord parameter of the same name. Does not have keyword argument ports.
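
The parameter renames listed in this section can be applied mechanically as a first migration pass. The following is a minimal sketch using Python's standard re module. The RENAMES table is taken from the entries above; the helper name rename_parameters and the sample legacy string are hypothetical. The sketch renames identifiers only and does not remove ports keyword arguments or adjust import statements.

```python
import re

# Old CubeRecord name -> Orion Platform name, from the entries in this section
RENAMES = {
    "IntFieldParameter": "IntegerFieldParameter",
    "BoolFieldParameter": "BooleanFieldParameter",
    "InputMoleculeFieldParameter": "InputMolFieldParameter",
    "OutputMoleculeFieldParameter": "OutputMolFieldParameter",
    "DataSourceParameter": "DatasetInputParameter",
}

# Match whole identifiers only, so unrelated longer names are left alone
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_parameters(source):
    """Return source with each legacy parameter class name replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


legacy = "flag = BoolFieldParameter('flag')\ncount = IntFieldParameter('count')"
print(rename_parameters(legacy))
```

Running the helper on already migrated code leaves it unchanged, so it is safe to apply more than once.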

Record Conversion Changes

All CubeRecord conversion APIs should be replaced by usage of DRConvert.

Record Handler Changes

All Data Record handlers defined in CubeRecord have been deprecated and have been replaced by link functionality.