Fingerprint Set Similarity Calculation

Main Parameters

Parameter Name

Associated Port

Port Type

Fingerprint Field

Fingerprint Set

Histogram Bin Centers

Histogram Counts

Similarity Score Field

UUID


Calculation Parameters

  • CPUs (integer) : The number of CPUs to run this cube with
    Default: 1 Min: 1 Max: 128
  • Cube Metrics (string) : Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (decimal) : The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 Min: 128.0 Max: 8589934592
  • GPUs (integer) : The number of GPUs to run this cube with
    Default: 0 Max: 16
  • Instance Tags (string) : Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (string) : The type of instance that this cube needs to be run on
  • Memory (MiB) (decimal) : The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 Min: 256.0 Max: 8589934592
  • Metric Period (decimal) : How often to sample metrics, in seconds
    Default: 60 Min: 1 Max: 300
  • Maximum Similarity Score Cutoff (decimal) : The maximum (upper) similarity score cutoff for the similarity calculation.
  • Similarity Measure (string) : The similarity measure used for the 2D similarity calculation.
    Default: Tanimoto
    Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
  • Minimum Similarity Score Cutoff (decimal) : The minimum (lower) similarity score cutoff for the similarity calculation.
  • Spot policy (string) : Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Field parameters

  • Fingerprint Field (Field Type: Chem.FingerPrint) : Tag name for the field that stores fingerprints.
  • Fingerprint Set (Field Type: RecordVec) : The field that stores the sets of fingerprint records.
  • Histogram Bin Centers (Field Type: FloatVec) : The field to store the histogram bin centers of the similarity calculation.
    Default: Histogram Bin Centers
  • Histogram Counts (Field Type: FloatVec) : The field to store the histogram counts of the similarity calculation.
    Default: Histogram Counts
  • Similarity Score Field (Field Type: Float) : Name for the field that stores fingerprint similarity scores.
  • UUID (Field Type: String) : The field to store unique identifiers for fingerprints and molecules.
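The two FloatVec histogram fields hold parallel vectors: one count per bin and the center of that bin. The Python sketch below shows how such a pair can be derived from a list of similarity scores. It assumes equal-width bins over a score range [lo, hi] and an illustrative bin count; neither the binning scheme nor the helper name `similarity_histogram` comes from this cube, and dropping scores outside [lo, hi] is only one plausible reading of the minimum and maximum cutoffs.

```python
from typing import List, Sequence, Tuple


def similarity_histogram(
    scores: Sequence[float],
    lo: float = 0.0,
    hi: float = 1.0,
    n_bins: int = 10,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (bin_centers, counts) for scores in [lo, hi]."""
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    # Each center sits halfway between its bin's lower and upper edge.
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Counts are floats so they fit a FloatVec-style field.
    counts = [0.0] * n_bins
    for s in scores:
        if s < lo or s > hi:
            continue  # assumed: scores outside the cutoff range are skipped
        # Clamp so a score equal to hi lands in the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1.0
    return centers, counts


centers, counts = similarity_histogram([0.05, 0.12, 0.55, 0.95, 1.0], n_bins=4)
print(centers)  # [0.125, 0.375, 0.625, 0.875]
print(counts)   # [2.0, 0.0, 1.0, 2.0]
```

Because the bin index comes from floating-point division, a score lying exactly on an interior bin edge can fall into the lower bin when the bin width is not exactly representable (for example 0.1).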

2D Similarity Parameters

The parameters of the 2D fingerprint similarity calculation.

  • Similarity Measure (string) : The similarity measure used for the 2D similarity calculation.
    Default: Tanimoto
    Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
  • Minimum Similarity Score Cutoff (decimal) : The minimum (lower) similarity score cutoff for the similarity calculation.
  • Maximum Similarity Score Cutoff (decimal) : The maximum (upper) similarity score cutoff for the similarity calculation.
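For orientation, the following Python sketch implements the textbook definitions of four of the listed measures on binary fingerprints, each represented as a set of on-bit positions. Here a and b are the on-bit counts of the two fingerprints and c is the number of bits on in both. This is not the GraphSim implementation behind this cube. Euclid and Manhattan are left out because toolkits differ in how they normalize those distances into similarity scores, and the Tversky weights alpha and beta, the 0.0 result for empty fingerprints, and the helper `within_cutoffs` are assumptions made for illustration.

```python
import math
from typing import AbstractSet, Tuple

Fingerprint = AbstractSet[int]  # positions of the bits that are set


def _abc(fp_a: Fingerprint, fp_b: Fingerprint) -> Tuple[int, int, int]:
    # a, b: on-bit counts; c: bits on in both fingerprints
    return len(fp_a), len(fp_b), len(fp_a & fp_b)


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    union = a + b - c
    return c / union if union else 0.0  # assumed convention for empty inputs


def dice(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    return 2 * c / (a + b) if a + b else 0.0


def cosine(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    return c / math.sqrt(a * b) if a and b else 0.0


def tversky(fp_a: Fingerprint, fp_b: Fingerprint,
            alpha: float = 0.5, beta: float = 0.5) -> float:
    # alpha weights bits unique to fp_a, beta weights bits unique to fp_b
    a, b, c = _abc(fp_a, fp_b)
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0


def within_cutoffs(score: float, min_cutoff: float, max_cutoff: float) -> bool:
    # Hypothetical reading of the Minimum/Maximum Similarity Score Cutoff pair
    return min_cutoff <= score <= max_cutoff


A, B = {1, 2, 3, 4}, {3, 4, 5}  # a = 4, b = 3, c = 2
print(tanimoto(A, B))  # 0.4
print(dice(A, B))      # 0.5714285714285714
```

With alpha = beta = 1 the Tversky index reduces to Tanimoto, and with alpha = beta = 0.5 it reduces to Dice, which is why the sketch uses explicit weights rather than a single fixed formula.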

Hardware Parameters

Machine hardware requirements

  • Memory (MiB) (decimal) : The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 Min: 256.0 Max: 8589934592
  • Temporary Disk Space (MiB) (decimal) : The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 Min: 128.0 Max: 8589934592
  • GPUs (integer) : The number of GPUs to run this cube with
    Default: 0 Max: 16
  • CPUs (integer) : The number of CPUs to run this cube with
    Default: 1 Min: 1 Max: 128
  • Instance Type (string) : The type of instance that this cube needs to be run on
  • Spot policy (string) : Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (string) : Only run on machines with matching tags (comma separated)
    Default: “”
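The Memory and Temporary Disk Space values are in MiB, where 1 MiB = 1048576 B. As a worked example of the advice to request a couple hundred MiB more than required, the hypothetical helper below converts a byte estimate into a request, adds an assumed 200 MiB of headroom, and never goes below the parameter's minimum (256 MiB for Memory, 128 MiB for Temporary Disk Space). The helper name and the 200 MiB figure are illustrative, not part of the cube.

```python
import math

MIB = 1024 * 1024  # 1 MiB = 1048576 B, as stated in the parameter descriptions


def request_mib(required_bytes: int, headroom_mib: float = 200.0,
                minimum_mib: float = 256.0) -> float:
    """Hypothetical helper: turn a byte estimate into a MiB request."""
    needed = math.ceil(required_bytes / MIB)  # round up to whole MiB
    # Add headroom for overhead, then respect the parameter's Min value.
    return max(needed + headroom_mib, minimum_mib)


print(request_mib(1536 * MIB))                  # 1736.0  (Memory)
print(request_mib(MIB + 1, minimum_mib=128.0))  # 202.0   (Temporary Disk Space)
```

An estimated 1.5 GiB (1536 MiB) working set thus yields a Memory request of 1736 MiB, slightly below the 1800 MiB default.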

Metrics Parameters

Cube Metric Parameters

  • Metric Period (decimal) : How often to sample metrics, in seconds
    Default: 60 Min: 1 Max: 300
  • Cube Metrics (string) : Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Fingerprint Set Similarity Calculation

The parallel version of this cube adds the following parameters.

  • Number of messages to distribute at a time (integer) : The maximum number of messages to bundle together for a parallel cube.
    Default: 1 Min: 1 Max: 65535
  • Maximum Failures (integer) : The maximum number of times to attempt processing a work item
    Default: 10 Min: 1 Max: 100
  • Autoscale this Cube (boolean) : If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (integer) : The maximum number of concurrently running copies of this Cube
    Default: 1000 Min: 1
  • Minimum number of Cubes (integer) : The minimum number of concurrently running copies of this Cube
    Default: 0
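To make the bundling and retry parameters concrete, here is a generic Python sketch. It is not Orion's scheduler: it assumes that a work item is one bundle of messages, and the helpers `bundle` and `process_with_retries` are hypothetical. Number of messages to distribute at a time plays the role of `size`, and Maximum Failures the role of `max_failures`.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bundle(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of at most `size` elements, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, bundle
        yield batch


def process_with_retries(work: Callable[[List[T]], None], batch: List[T],
                         max_failures: int = 10) -> int:
    """Attempt work(batch) at most max_failures times.

    Returns the number of attempts used; re-raises the last error
    if every attempt fails.
    """
    if max_failures < 1:
        raise ValueError("max_failures must be >= 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            work(batch)
            return attempt
        except Exception:
            if attempt >= max_failures:
                raise


print(list(bundle(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

With the default of 1 message per bundle, every message becomes its own work item; larger values trade scheduling overhead against the amount of work repeated when a bundle has to be retried.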

Tip

filename: cheminfo/graphsim/sim2d_set_calc.py