Exact Conformer Deduplicator

Cube to remove conformers that are exact duplicates. Each molecule is assumed to have a single conformer. If two molecules have the same geometry (within the specified RMSD threshold and energy tolerance), the molecule with the lower energy is retained. Molecules passed to the ‘intake’ port are compared to the molecules passed to the ‘ref_mol_port’. To remove all exact duplicates within a single set of molecules, pass the same records to both the ‘ref_mol_port’ and the ‘intake’ port.
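
The comparison rule described above can be sketched in Python as follows. This is a hypothetical illustration, not the cube's implementation: the `Conformer` class, the function names, and the tie-break on conformer index are assumptions. The RMSD here is a plain coordinate RMSD on conformers assumed to share atom order and a common reference frame; the cube may align or symmetry-match structures, which is not shown. The defaults mirror the documented RMSD Threshold (0.1) and Energy tolerance (5.0 kcal/mol).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coords = Tuple[Tuple[float, float, float], ...]


@dataclass(frozen=True)
class Conformer:
    # Hypothetical stand-in for a record holding one single-conformer molecule.
    index: int       # plays the role of the Conformer_Index field (assumed unique)
    energy: float    # kcal/mol, plays the role of the Energy Field
    coords: Coords   # same atom order and frame for every conformer (assumed)


def rmsd(a: Coords, b: Coords) -> float:
    """Coordinate RMSD with no superposition or symmetry handling."""
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def is_duplicate(mol: Conformer, ref: Conformer,
                 rmsd_threshold: float = 0.1,
                 energy_tolerance: float = 5.0) -> bool:
    # Energies further apart than the tolerance mean "unique", whatever the RMSD.
    if abs(mol.energy - ref.energy) > energy_tolerance:
        return False
    return rmsd(mol.coords, ref.coords) <= rmsd_threshold


def deduplicate(intake: Sequence[Conformer], refs: Sequence[Conformer],
                rmsd_threshold: float = 0.1,
                energy_tolerance: float = 5.0) -> List[Conformer]:
    """Keep each intake conformer unless refs holds a lower-energy duplicate."""
    kept = []
    for mol in intake:
        dropped = any(
            ref.index != mol.index  # skip comparing a conformer with itself
            and is_duplicate(mol, ref, rmsd_threshold, energy_tolerance)
            # Hypothetical tie-break: on equal energy the lower index wins.
            and (ref.energy, ref.index) < (mol.energy, mol.index)
            for ref in refs
        )
        if not dropped:
            kept.append(mol)
    return kept


# Same records on both "ports", as the text recommends for a full deduplication.
a = Conformer(0, 1.0, ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
b = Conformer(1, 0.5, ((0.0, 0.0, 0.0), (1.05, 0.0, 0.0)))
c = Conformer(2, 2.0, ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
print([m.index for m in deduplicate([a, b, c], [a, b, c])])  # [1, 2]
```

Conformers `a` and `b` fall within both thresholds, so only the lower-energy `b` survives; `c` is geometrically distinct and is kept.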


Calculation Parameters

  • Energy tolerance (kcal/mol) (decimal) : Energy tolerance used when removing exact duplicates. If the energy difference between two conformers exceeds this tolerance, they are considered unique regardless of their RMSD.
    Default: 5.0
  • Maximum Conformers (integer) : Limits the number of conformers optimized, to avoid accidentally spending more than expected on a single Floe. If more conformers than this are generated, only one conformer is optimized, so that the cost per conformer of the Floe can be estimated. If set to 0, all generated conformers are optimized.
    Default: 0
  • RMSD Threshold (decimal) : RMSD threshold for duplicate removal; two conformers whose RMSD falls within this threshold are treated as having the same geometry.
    Default: 0.1 Min: 1e-05 Max: 0.1
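
The Maximum Conformers rule can be read as the selection step below. This is a hypothetical sketch of the documented behavior only: the function name is invented, the guard against negative values is an added assumption, and the documentation does not say which conformer is kept when the limit is exceeded, so taking the first one is also an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def conformers_to_optimize(conformers: Sequence[T],
                           max_conformers: int = 0) -> List[T]:
    """Apply the documented Maximum Conformers rule (hypothetical helper)."""
    if max_conformers < 0:
        # Hypothetical guard; no minimum is documented for this parameter.
        raise ValueError("max_conformers must be >= 0")
    if max_conformers == 0:
        # 0 disables the limit: every generated conformer is optimized.
        return list(conformers)
    if len(conformers) > max_conformers:
        # Over the limit: optimize a single conformer (the first one, an
        # assumption) so the cost per conformer can be estimated.
        return list(conformers[:1])
    return list(conformers)
```

With the default of 0 nothing is filtered, which is why a nonzero value is the safeguard against unexpectedly large runs.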

Field parameters

  • Conformer Index Field Name (Field Type: String) : Field holding the conformer index
    Default: Conformer_Index
  • Energy Field (Field Type: Float) : Field holding the conformer energy used to decide which duplicate is retained
    Default: Psi4 Energy (kcal/mol)
  • Extended Log Field (Field Type: StringVec) : Message extended log field
    Default: Extended Log Field
  • Input Molecule Field (Field Type: Chem.Mol) : Primary Molecule Field to use as input to the Cube
  • Log Field (Field Type: String) : Message log field
    Default: Log Field
  • Output Molecule Field (Field Type: Chem.Mol) : Primary Molecule Field to use as output to the Cube
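
Because the field names are parameters, code that reads the records should look fields up by the configured name rather than a hard-coded one. A minimal sketch, assuming records are plain Python dictionaries keyed by field name (real Orion records are not dictionaries, and the helper name is hypothetical), that uses the documented defaults to pick the record to keep from a duplicate pair:

```python
from typing import Any, Dict

Record = Dict[str, Any]  # hypothetical stand-in for an Orion record

DEFAULT_ENERGY_FIELD = "Psi4 Energy (kcal/mol)"
DEFAULT_INDEX_FIELD = "Conformer_Index"


def record_to_keep(a: Record, b: Record,
                   energy_field: str = DEFAULT_ENERGY_FIELD,
                   index_field: str = DEFAULT_INDEX_FIELD) -> Record:
    """Return the lower-energy record of a duplicate pair.

    Ties are broken by the lower conformer index (an assumption).
    """
    key_a = (a[energy_field], a[index_field])
    key_b = (b[energy_field], b[index_field])
    return a if key_a <= key_b else b
```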

Hardware Parameters

Machine hardware requirements

  • Memory (MiB) (decimal) : The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 Min: 256.0 Max: 8589934592
  • Temporary Disk Space (MiB) (decimal) : The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 Min: 128.0 Max: 8589934592
  • GPUs (integer) : The number of GPUs to run this cube with
    Default: 0 Max: 16
  • CPUs (integer) : The number of CPUs to run this cube with
    Default: 1 Min: 1 Max: 128
  • Instance Type (string) : The type of instance that this cube needs to be run on
  • Spot policy (string) : Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (string) : Only run on machines with matching tags (comma separated)
    Default: “”
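
The sizing advice above is simple arithmetic: 1 MiB is 1,048,576 bytes, and the request should carry a few hundred MiB of headroom. A hypothetical helper, assuming a 200 MiB margin (the text only says "a couple hundred") and the documented minimums of 256 MiB for Memory and 128 MiB for Temporary Disk Space:

```python
import math

MIB = 1_048_576  # bytes per MiB, as stated in the parameter descriptions


def request_mib(estimated_bytes: int, minimum_mib: float,
                headroom_mib: int = 200) -> int:
    """Convert an estimated byte count to a MiB request with headroom.

    The 200 MiB default margin is an assumption; the result is never
    below the parameter's documented minimum.
    """
    needed = math.ceil(estimated_bytes / MIB) + headroom_mib
    return max(needed, math.ceil(minimum_mib))


print(request_mib(1_000_000_000, minimum_mib=256.0))  # 1154
```

An estimated 1 GB (10^9 B) of working memory rounds up to 954 MiB, so with the assumed margin the Memory parameter would be set to 1154.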

Metrics Parameters

Cube Metric Parameters

  • Metric Period (decimal) : How often to sample metrics, in seconds
    Default: 60 Min: 1 Max: 300
  • Cube Metrics (string) : Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Exact Conformer Deduplicator

The parallel version adds these extra parameters.

  • Number of messages to distribute at a time (integer) : The maximum number of messages to bundle together for a parallel cube.
    Default: 1 Min: 1 Max: 65535
  • Maximum Failures (integer) : The maximum number of times to attempt processing a work item
    Default: 10 Min: 1 Max: 100
  • Autoscale this Cube (boolean) : If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (integer) : The maximum number of concurrently running copies of this Cube
    Default: 1000 Min: 1
  • Minimum number of Cubes (integer) : The minimum number of concurrently running copies of this Cube
    Default: 0
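
Two of the parallel parameters describe simple mechanics: messages are grouped into bundles of at most the configured size, and a work item is attempted up to Maximum Failures times. Orion performs this scheduling itself; the sketch below only illustrates what the numbers mean, and the function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bundle(messages: Iterable[T], size: int = 1) -> Iterator[List[T]]:
    """Group messages into lists of at most `size` items (hypothetical)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for message in messages:
        batch.append(message)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, bundle


def process_with_retries(item: T, work: Callable[[T], R],
                         max_failures: int = 10) -> R:
    """Attempt `work(item)` at most `max_failures` times (hypothetical)."""
    for attempt in range(1, max_failures + 1):
        try:
            return work(item)
        except Exception:
            if attempt == max_failures:
                raise  # give up after the last allowed attempt
    raise ValueError("max_failures must be >= 1")
```

With the default bundle size of 1, each message is dispatched on its own, which maximizes parallelism at the cost of more per-message overhead.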


filename: psi4-orion/psi4_orion/psi4_cube_conf.py