Subset Molecule


This cube creates subset molecules from tagged atoms and/or bonds on input molecules.

Input molecules are read from the field specified by the Input Molecule Field parameter. The subset molecule is stored in the field specified by the Output Molecule Field parameter, and the record is sent to the success port.

The atoms and bonds used for subsetting are identified by the Atom Tag and Bond Tag parameters, respectively. At least one of these two parameters must be set. The additional parameters listed in the Tag Interpretation Parameters section control how the tagging information is interpreted.
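
The rule that at least one tag parameter must be set, together with the documented choices and defaults, can be checked before a job is submitted. The following is a minimal sketch in plain Python, not part of the cube itself: the helper `validate_subset_params` and the tag name `ligand_core` are hypothetical, while the parameter names, choices, and defaults are copied from this page.

```python
# Hypothetical pre-submission check; not an Orion or OpenEye API.
# Choices and defaults below are copied from the parameter lists on this page.
CHOICES = {
    "adjust_mol": {"none", "hcounts", "rgroups"},
    "tag_data": {"auto", "bool", "int", "string"},
    "tag_handling": {"tagonly", "tagdata"},
    "tag_perceive": {"none", "atoms2bonds"},
}

DEFAULTS = {
    "adjust_mol": "hcounts",
    "tag_data": "auto",
    "tag_handling": "tagdata",
    "tag_invert": False,
    "tag_perceive": "atoms2bonds",
}


def validate_subset_params(overrides):
    """Merge overrides onto the documented defaults and check them."""
    params = {**DEFAULTS, **overrides}
    # At least one of the two tag parameters has to be set.
    if not params.get("atom_tag") and not params.get("bond_tag"):
        raise ValueError("set at least one of atom_tag or bond_tag")
    # Every choice-type parameter must use one of the documented values.
    for name, allowed in CHOICES.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r} not in {sorted(allowed)}")
    return params


# "ligand_core" is an illustrative tag name, not one defined by the cube.
params = validate_subset_params({"atom_tag": "ligand_core"})
```

The returned dictionary holds the overrides merged with the defaults, so it shows the full set of values the subset step would be configured with.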

Calculation Parameters

  • Subset Fixups (adjust_mol) type: string: This parameter controls valence adjustments on the subset molecule returned.
    Default: hcounts
    Choices: none, hcounts, rgroups
  • Atom Tag (atom_tag) type: string: The tag that is used to identify atoms for the subset.
  • Bond Tag (bond_tag) type: string: The tag that is used to identify bonds for the subset.
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600 , Min: 300
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Atom/Bond Tag Data Type (tag_data) type: string: This parameter identifies the data type used with the ‘Tag Handling’ parameter to identify tagged atoms/bonds (auto: try all POD data oechem.Types)
    Default: auto
    Choices: auto, bool, int, string
  • Tag Handling (tag_handling) type: string: This parameter determines whether the presence of the tag indicates the tagged atoms/bonds, or whether the data should be inspected for a non-null value.
    Default: tagdata
    Choices: tagonly, tagdata
  • Invert Atom/Bond Tagging (tag_invert) type: boolean: This parameter indicates whether to invert the sense of the atom/bond tagging. If True, the subset operation will occur on the untagged atoms/bonds.
    Default: False
  • Atom/Bond Tag Interpretation (tag_perceive) type: string: This parameter determines whether bonds between tagged atoms imply tagged bonds and tagged bonds imply the atom endpoints.
    Default: atoms2bonds
    Choices: none, atoms2bonds

Field parameters

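  • Input Molecule Field (in_mol_field) type: Field Type: Chem.Mol: The field from which input molecules are read.
  • Output Molecule Field (out_mol_field) type: Field Type: Chem.Mol: The field in which the subset molecule is stored.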

Subset Parameters

The parameters that influence the subset method.
  • Subset Fixups (adjust_mol) type: string: This parameter controls valence adjustments on the subset molecule returned.
    Default: hcounts
    Choices: none, hcounts, rgroups

Tag Interpretation Parameters

The parameters that determine how to interpret the tagging.
  • Atom Tag (atom_tag) type: string: The tag that is used to identify atoms for the subset.
  • Bond Tag (bond_tag) type: string: The tag that is used to identify bonds for the subset.
  • Atom/Bond Tag Data Type (tag_data) type: string: This parameter identifies the data type used with the ‘Tag Handling’ parameter to identify tagged atoms/bonds (auto: try all POD data oechem.Types)
    Default: auto
    Choices: auto, bool, int, string
  • Invert Atom/Bond Tagging (tag_invert) type: boolean: This parameter indicates whether to invert the sense of the atom/bond tagging. If True, the subset operation will occur on the untagged atoms/bonds.
    Default: False
  • Tag Handling (tag_handling) type: string: This parameter determines whether the presence of the tag indicates the tagged atoms/bonds, or whether the data should be inspected for a non-null value.
    Default: tagdata
    Choices: tagonly, tagdata
  • Atom/Bond Tag Interpretation (tag_perceive) type: string: This parameter determines whether bonds between tagged atoms imply tagged bonds and tagged bonds imply the atom endpoints.
    Default: atoms2bonds
    Choices: none, atoms2bonds
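
How these parameters combine can be illustrated on a toy molecule. The sketch below is plain Python and does not use the OpenEye toolkits. It assumes that "non-null" means a truthy value, that inversion is applied before perception, and that with atoms2bonds the bond endpoints are added before bonds between selected atoms are collected. The cube's actual rules and ordering may differ. The helper names and the tag name `keep` are hypothetical.

```python
def is_tagged(data, tag, tag_handling):
    """Decide whether one atom or bond counts as tagged.

    data is a dict mapping tag names to the values attached to the item.
    """
    if tag not in data:
        return False
    if tag_handling == "tagonly":
        return True  # presence of the tag is enough
    # "tagdata": assumption that None, False, 0 and "" all count as null.
    return bool(data[tag])


def select_subset(atoms, bonds, atom_tag=None, bond_tag=None,
                  tag_handling="tagdata", tag_invert=False,
                  tag_perceive="atoms2bonds"):
    """Return (atom indices, bond keys) selected for the subset.

    atoms maps an atom index to its data dict.
    bonds maps a (begin, end) index pair to its data dict.
    """
    sel_atoms, sel_bonds = set(), set()
    if atom_tag:
        sel_atoms = {i for i, d in atoms.items()
                     if is_tagged(d, atom_tag, tag_handling) != tag_invert}
    if bond_tag:
        sel_bonds = {k for k, d in bonds.items()
                     if is_tagged(d, bond_tag, tag_handling) != tag_invert}
    if tag_perceive == "atoms2bonds":
        # Selected bonds imply their two endpoint atoms.
        for begin, end in sel_bonds:
            sel_atoms.update((begin, end))
        # Bonds whose endpoints are both selected are selected too.
        sel_bonds |= {(b, e) for (b, e) in bonds
                      if b in sel_atoms and e in sel_atoms}
    return sel_atoms, sel_bonds


# Toy chain 0-1-2-3; "keep" is an illustrative tag name.
atoms = {0: {"keep": True}, 1: {"keep": True}, 2: {"keep": False}, 3: {}}
bonds = {(0, 1): {}, (1, 2): {}, (2, 3): {}}

by_data = select_subset(atoms, bonds, atom_tag="keep")
by_presence = select_subset(atoms, bonds, atom_tag="keep", tag_handling="tagonly")
inverted = select_subset(atoms, bonds, atom_tag="keep", tag_invert=True)
```

Under these assumptions, atom 2 carries the tag with a null value, so it is selected with tagonly but not with tagdata, and it is selected again when the tagging is inverted.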

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600 , Min: 300
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
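
The Min and Max values listed above can be checked before overrides are submitted. The following is a minimal sketch in plain Python: the helper `check_hardware` is hypothetical and not an Orion API, and the bounds are copied from the list above, with None where no bound is stated.

```python
# (min, max) bounds copied from the Hardware Parameters list; None = not stated.
BOUNDS = {
    "memory_mb": (256.0, 8589934592),
    "disk_space": (128.0, 8589934592),
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "max_backlog_wait": (300, None),
}


def check_hardware(overrides):
    """Return one message per value that falls outside its documented range."""
    problems = []
    for name, value in overrides.items():
        low, high = BOUNDS.get(name, (None, None))
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


# memory_mb is too small and gpu_count is too large; cpu_count is in range.
problems = check_hardware({"memory_mb": 128, "cpu_count": 4, "gpu_count": 32})
```

An empty list means every supplied value lies within the documented range. Parameters without listed bounds pass through unchecked.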

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Subset Molecule

The parallel version adds these extra parameters.

  • Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.
    Default: 1 , Min: 1, Max: 65535
  • Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item
    Default: 10 , Min: 1, Max: 100
  • Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube
    Default: 1000 , Min: 1
  • Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube
    Default: 0