Molecule Substructure Count

This cube counts substructure search matches of input molecules against a query molecule.

Input molecules are read from the field specified by the Input Molecule Field parameter. The query substructure is read from the first record on the init (initialization) port, from the field specified by the Query Field parameter. The number of matches is stored in the field specified by the Match Count Field parameter, and the record is then sent to the success port.

If no match is detected, the record is still sent to the success port with a match count of zero.
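The record flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the cube's implementation: records are modeled as dicts, the field names "Molecule" and "Match Count" are hypothetical, and toy_count is a stand-in that counts substring occurrences instead of performing a real substructure search.

```python
from typing import Any, Callable, Dict, Tuple

Record = Dict[str, Any]


def process_record(
    record: Record,
    query: Any,
    count_matches: Callable[[Any, Any], int],
    in_mol_field: str = "Molecule",         # hypothetical field name
    nr_matches_field: str = "Match Count",  # hypothetical field name
) -> Tuple[str, Record]:
    """Store the match count on a copy of the record and route it to 'success'."""
    out = dict(record)
    out[nr_matches_field] = count_matches(record[in_mol_field], query)
    # A count of zero is not a failure: the record still goes to 'success'.
    return "success", out


def toy_count(mol: str, query: str) -> int:
    """Stand-in only: counts non-overlapping substring hits, not substructures."""
    return mol.count(query)


# Only the first record on the init port supplies the query.
init_records = [{"Molecule": "CO"}, {"Molecule": "N"}]
query = init_records[0]["Molecule"]

inputs = [{"Molecule": "OCCO"}, {"Molecule": "CCC"}]
results = [process_record(r, query, toy_count) for r in inputs]
```

Passing the counter in as a callable keeps the routing logic separate from the matching logic, so a real substructure search could be substituted without changing how records are emitted.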

Note

While this cube has to modify the input molecule (performing necessary perceptions and adding/suppressing hydrogens prior to performing the substructure search), these changes will not persist when the record is sent to the success port.
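One way to picture the behavior in this note is a search that works on a scratch copy of the molecule. The sketch below is an assumption about the general pattern, not a description of how the cube is written; the dict-based molecule, the add_explicit_hydrogens helper, and the field name "Molecule" are all hypothetical.

```python
import copy


def count_without_side_effects(record, in_mol_field, prepare, count):
    """Prepare and search a deep copy so the record's molecule is untouched."""
    scratch = copy.deepcopy(record[in_mol_field])
    prepare(scratch)  # mutates only the scratch copy
    return count(scratch)


def add_explicit_hydrogens(mol):
    """Toy preparation step: turn implicit hydrogens into explicit atoms."""
    mol["atoms"].extend(["H"] * mol["implicit_h"])
    mol["implicit_h"] = 0


# Toy methane: one heavy atom with four implicit hydrogens.
record = {"Molecule": {"atoms": ["C"], "implicit_h": 4}}
n_atoms = count_without_side_effects(
    record, "Molecule", add_explicit_hydrogens, lambda m: len(m["atoms"])
)
```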

Calculation Parameters

  • Atom Expression Options (atom_expr) type: string: Atom expression flag that controls how atoms are matched.
    Default: DefaultAtoms
    Choices: AutomorphAtoms, DefaultAtoms, ExactAtoms
  • Bond Expression Options (bond_expr) type: string: Bond expression flag that controls how bonds are matched.
    Default: DefaultBonds
    Choices: AutomorphBonds, DefaultBonds, ExactBonds
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Hydrogen Handling (hydrogen_handling) type: string: This parameter determines whether to suppress or add explicit hydrogens to the target molecules prior to performing the substructure search.
    Choices: AddExplicitHydrogens, SuppressHydrogens
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: "" (empty string)
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Unique Match (unique_match) type: boolean: A match or subgraph is considered unique if it differs from all other subgraphs found previously by at least one atom.
    Default: True

Field parameters

  • Input Molecule Field (in_mol_field) type: Field Type: Chem.Mol: The field from which input molecules are read.
  • Query Field (init_mol_field) type: Field Type: Chem.Mol: The name of the field that stores the query molecule on the initialization record. If left blank, the primary molecule field will be used.
  • Match Count Field (nr_matches_field) type: Field Type: Int: The field that stores the number of substructure matches.
  • None (out_mol_field) type: Field Type: Chem.Mol:

Substructure Search Parameters

The parameters of substructure search.
  • Hydrogen Handling (hydrogen_handling) type: string: This parameter determines whether to suppress or add explicit hydrogens to the target molecules prior to performing the substructure search.
    Choices: AddExplicitHydrogens, SuppressHydrogens
  • Unique Match (unique_match) type: boolean: A match or subgraph is considered unique if it differs from all other subgraphs found previously by at least one atom.
    Default: True
  • Atom Expression Options (atom_expr) type: string: Atom expression flag that controls how atoms are matched.
    Default: DefaultAtoms
    Choices: AutomorphAtoms, DefaultAtoms, ExactAtoms
  • Bond Expression Options (bond_expr) type: string: Bond expression flag that controls how bonds are matched.
    Default: DefaultBonds
    Choices: AutomorphBonds, DefaultBonds, ExactBonds
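The effect of Unique Match on the count can be illustrated with a toy subgraph matcher. The sketch below is an assumption-laden illustration, not the search used by the cube: molecules are plain lists of element symbols plus index pairs for bonds, atoms match only on identical element symbols, bond orders are ignored, and the brute-force enumeration is only practical for very small graphs. With unique_match=True, matches covering the same set of target atoms are counted once, following the definition above.

```python
from itertools import permutations


def find_matches(target_atoms, target_bonds, query_atoms, query_bonds):
    """Return every mapping (query index -> target index) that embeds the query."""
    bonded = {frozenset(b) for b in target_bonds}
    matches = []
    for cand in permutations(range(len(target_atoms)), len(query_atoms)):
        # Atom test: element symbols must agree (toy atom expression).
        if any(query_atoms[q] != target_atoms[t] for q, t in enumerate(cand)):
            continue
        # Bond test: every query bond must map onto a target bond.
        if all(frozenset((cand[a], cand[b])) in bonded for a, b in query_bonds):
            matches.append(cand)
    return matches


def count_matches(target_atoms, target_bonds, query_atoms, query_bonds,
                  unique_match=True):
    """Count matches; unique matches must differ by at least one target atom."""
    matches = find_matches(target_atoms, target_bonds, query_atoms, query_bonds)
    if not unique_match:
        return len(matches)
    return len({frozenset(m) for m in matches})


# Heavy-atom skeleton of propane (C-C-C) searched with a C-C query.
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
cc_query = (["C", "C"], [(0, 1)])

unique_count = count_matches(*propane, *cc_query, unique_match=True)
all_count = count_matches(*propane, *cc_query, unique_match=False)
no_hit = count_matches(*propane, ["C", "O"], [(0, 1)])
```

In this toy case the symmetric C-C query maps onto each of the two bonds in both directions, so the non-unique count is twice the unique count.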

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: "" (empty string)
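The documented bounds for the hardware parameters can be checked before a job is submitted. The following is a minimal sketch of such a pre-flight check in plain Python; validate_hardware and mib_to_bytes are hypothetical helpers, not part of any Orion API, and the limits table simply transcribes the Min and Max values listed above.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions

# (min, max) transcribed from the list above; None means no documented bound.
LIMITS = {
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "memory_mb": (256.0, 8589934592),
    "disk_space": (128.0, 8589934592),
    "max_backlog_wait": (300, None),
}
SPOT_POLICIES = {"Allowed", "Preferred", "NotPreferred", "Prohibited", "Required"}


def mib_to_bytes(mib):
    """Convert a MiB quantity to bytes."""
    return int(mib * MIB)


def validate_hardware(overrides):
    """Return a list of human-readable problems; an empty list means all good."""
    errors = []
    for name, value in overrides.items():
        if name == "spot_policy":
            if value not in SPOT_POLICIES:
                errors.append(f"spot_policy: {value!r} is not a valid choice")
            continue
        low, high = LIMITS.get(name, (None, None))
        if low is not None and value < low:
            errors.append(f"{name}: {value} is below the minimum {low}")
        if high is not None and value > high:
            errors.append(f"{name}: {value} is above the maximum {high}")
    return errors


ok = validate_hardware({"cpu_count": 4, "memory_mb": 4096, "spot_policy": "Allowed"})
bad = validate_hardware({"cpu_count": 0, "gpu_count": 32, "spot_policy": "Sometimes"})
default_disk_bytes = mib_to_bytes(5120.0)
```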

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Molecule Substructure Count

The parallel version adds these extra parameters.

  • Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.
    Default: 1, Min: 1, Max: 65535
  • Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item
    Default: 10, Min: 1, Max: 100
  • Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube
    Default: 1000, Min: 1
  • Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube
    Default: 0
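The meaning of item_count and max_failures can be sketched as follows. This is an illustration of the semantics described in the list above, not of Orion's scheduler: bundle and run_with_retries are hypothetical helpers, and how work is actually distributed and retried is not specified here.

```python
def bundle(messages, item_count=1):
    """Group messages into bundles of at most item_count items each."""
    if not 1 <= item_count <= 65535:
        raise ValueError("item_count must be between 1 and 65535")
    return [messages[i:i + item_count] for i in range(0, len(messages), item_count)]


def run_with_retries(work, item, max_failures=10):
    """Attempt work(item) up to max_failures times; return (result, attempts)."""
    last_error = None
    for attempt in range(1, max_failures + 1):
        try:
            return work(item), attempt
        except Exception as exc:  # toy example: treat every error as retryable
            last_error = exc
    raise RuntimeError(f"gave up after {max_failures} attempts") from last_error


# Toy work function that fails twice before succeeding.
calls = {"n": 0}


def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("transient failure")
    return x * 2


bundles = bundle(["a", "b", "c", "d", "e"], item_count=2)
result, attempts = run_with_retries(flaky, 21, max_failures=10)
```

Larger bundles reduce per-message overhead, at the cost of coarser work units per parallel copy of the cube.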