Molecule Similarity Calculation
This cube calculates fingerprint similarity scores between input molecules and a query (reference) molecule.
The input molecules are read from the intake port, from the field specified by the Input Molecule Field parameter.
The query molecule is read from the first record on the initialization (init) port, from the field specified by the Query Field parameter.
The type of the generated fingerprint is determined by the Fingerprint Type parameter. The similarity measure that is used to calculate the score is determined by the Similarity Measure parameter.
The calculated score is stored in the field specified by Similarity Score Field, and the record is sent to the success port.
Note
This cube generates fingerprints on the fly to calculate the similarity score; only the score is stored on the output record.
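The flow described above can be sketched in plain Python. This is a minimal illustration, not the Orion/Floe API: records are modeled as dicts, the field names `molecule` and `similarity` and the `primary_field` fallback name are hypothetical, `make_fp` stands in for whatever generator the Fingerprint Type selects, and Tanimoto on sets of on-bit positions stands in for the Similarity Measure.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Optional

# Hypothetical record model: a plain dict of field name -> value.
Record = Dict[str, object]
# A fingerprint modeled as the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def score_against_query(
    init_records: List[Record],
    intake: Iterable[Record],
    make_fp: Callable[[object], Fingerprint],
    input_field: str = "molecule",       # Input Molecule Field (hypothetical name)
    query_field: Optional[str] = None,   # Query Field; None or "" means blank
    score_field: str = "similarity",     # Similarity Score Field (hypothetical name)
    primary_field: str = "molecule",     # assumed primary molecule field
) -> Iterator[Record]:
    # Only the first record on the initialization port supplies the query.
    query_record = init_records[0]
    # A blank Query Field falls back to the primary molecule field.
    query_fp = make_fp(query_record[query_field or primary_field])
    for record in intake:
        # The input fingerprint is generated on the fly and then discarded.
        input_fp = make_fp(record[input_field])
        out = dict(record)
        # Only the score is written to the output record.
        out[score_field] = tanimoto(query_fp, input_fp)
        yield out  # stands in for emitting to the success port
```

For example, with a toy `make_fp` that maps each character of a SMILES string to a bit, `lambda s: frozenset(map(ord, s))`, scoring `CCN` against the query `CCO` gives 1/3 (one shared bit, `C`, out of three distinct bits). In this sketch the query fingerprint is computed once, outside the loop, because the query does not change between input records.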
See also
- Fingerprint Generation and Similarity Measures sections in the GraphSim TK manual.
Parameter Details
Calculation Parameters
CPUs (integer) : The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Cube Metrics (string) : Set of metrics to be collected. Choices: cpu, disk, memory, network
Temporary Disk Space (MiB) (decimal) : The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
Fingerprint Type (string) : The fingerprint type generated for the similarity calculation. Default: Tree. Choices: Circular, Lingo, MACCS, Path, Tree
GPUs (integer) : The number of GPUs to run this cube with. Default: 0, Max: 16
Instance Tags (string) : Only run on machines with matching tags (comma-separated). Default: “” (empty)
Instance Type (string) : The type of instance that this cube needs to run on.
Max Rotors (integer) : Cutoff on the number of rotatable bonds. The cube skips molecules with more rotatable bonds than this cutoff. Default: 20, Min: 1, Max: 9999
Memory (MiB) (decimal) : The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
Metric Period (decimal) : How often to sample metrics, in seconds. Default: 60, Min: 1, Max: 300
Similarity Measure (string) : The similarity measure used for the 2D similarity calculation. Default: Tanimoto. Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
Spot policy (string) : Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
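The Max Rotors cutoff is a strict "more than" test, so a molecule with exactly the cutoff number of rotatable bonds is still scored. The sketch below assumes each record already carries a precomputed rotatable-bond count under a hypothetical `rotors` key (counting rotors requires a cheminformatics toolkit and is out of scope here). This page does not say where skipped molecules are routed, so the sketch simply collects them separately.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record model with a precomputed "rotors" count.
Record = Dict[str, int]


def split_by_max_rotors(
    records: Iterable[Record], max_rotors: int = 20
) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, skipped) using the Max Rotors cutoff."""
    if max_rotors < 1:
        raise ValueError("Max Rotors must be at least 1")
    kept: List[Record] = []
    skipped: List[Record] = []
    for record in records:
        # Skip only when the count is strictly greater than the cutoff.
        if record["rotors"] > max_rotors:
            skipped.append(record)
        else:
            kept.append(record)
    return kept, skipped
```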
Field Parameters
None (Field Type: StringVec) : Message extended log field. Default: Extended Log Field
None (Field Type: Chem.Mol) :
Query Field (Field Type: Chem.Mol) : The name of the field on the initialization record that stores the query molecule. If left blank, the primary molecule field will be used.
None (Field Type: String) : Message log field. Default: Log Field
Similarity Score Field (Field Type: Float) : Name of the field that stores the fingerprint similarity score.
2D Similarity Parameters
The parameters of the 2D fingerprint similarity calculation.
Fingerprint Type (string) : The fingerprint type generated for the similarity calculation. Default: Tree. Choices: Circular, Lingo, MACCS, Path, Tree
Similarity Measure (string) : The similarity measure used for the 2D similarity calculation. Default: Tanimoto. Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
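To make the Similarity Measure choices concrete, the sketch below implements common textbook definitions of these measures on binary fingerprints, in terms of a = on-bits in A, b = on-bits in B, c = on-bits shared by both, and n = fingerprint length in bits. These are standard formulas assumed for illustration, not GraphSim TK's implementation: this page does not state GraphSim TK's exact definitions, whether Manhattan is reported as a distance (as here, where lower means more similar) or as a similarity, or which Tversky weights the cube uses, so the `alpha`/`beta` defaults below are illustrative only.

```python
import math
from typing import FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # set of on-bit positions


def _abc(fp_a: Fingerprint, fp_b: Fingerprint) -> Tuple[int, int, int]:
    # a = on-bits in A, b = on-bits in B, c = on-bits in both
    return len(fp_a), len(fp_b), len(fp_a & fp_b)


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    denom = a + b - c
    return c / denom if denom else 0.0  # two empty fingerprints -> 0.0 here


def dice(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    return 2 * c / (a + b) if a + b else 0.0


def cosine(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    a, b, c = _abc(fp_a, fp_b)
    return c / math.sqrt(a * b) if a and b else 0.0


def euclid(fp_a: Fingerprint, fp_b: Fingerprint, n_bits: int) -> float:
    # Similarity form sqrt((c + d) / n), where d counts bits off in both.
    a, b, c = _abc(fp_a, fp_b)
    d = n_bits - (a + b - c)
    return math.sqrt((c + d) / n_bits)


def manhattan_distance(fp_a: Fingerprint, fp_b: Fingerprint, n_bits: int) -> float:
    # Fraction of bits set in exactly one fingerprint (0.0 = identical).
    a, b, c = _abc(fp_a, fp_b)
    return (a + b - 2 * c) / n_bits


def tversky(fp_a: Fingerprint, fp_b: Fingerprint,
            alpha: float = 0.95, beta: float = 0.05) -> float:
    # Asymmetric: alpha weights bits unique to A, beta weights bits unique to B.
    a, b, c = _abc(fp_a, fp_b)
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0
```

With `alpha = beta = 1`, Tversky reduces to Tanimoto, and with `alpha = beta = 0.5` it reduces to Dice. Unequal weights make the score depend on which molecule plays the role of the query.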
Hardware Parameters
Machine hardware requirements
Memory (MiB) (decimal) : The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
Temporary Disk Space (MiB) (decimal) : The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
GPUs (integer) : The number of GPUs to run this cube with. Default: 0, Max: 16
CPUs (integer) : The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Instance Type (string) : The type of instance that this cube needs to run on.
Spot policy (string) : Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Instance Tags (string) : Only run on machines with matching tags (comma-separated). Default: “” (empty)
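The Memory and Temporary Disk Space values are in MiB, where 1 MiB = 1048576 bytes. As a rough sizing aid, the sketch below converts a byte estimate to MiB, rounds up, and adds headroom for overhead. The 200 MiB headroom is an assumption standing in for "a couple hundred MiB", and `request_mib` is a hypothetical helper, not part of Orion.

```python
import math

MIB = 1_048_576  # bytes per MiB, as defined for these parameters


def request_mib(required_bytes: int,
                headroom_mib: float = 200.0,  # assumed "couple hundred MiB"
                minimum_mib: float = 256.0) -> float:
    """Suggest a Memory or Temporary Disk Space request in MiB."""
    needed_mib = math.ceil(required_bytes / MIB)  # round up to whole MiB
    # Never go below the parameter's minimum (256.0 for Memory).
    return max(needed_mib + headroom_mib, minimum_mib)
```

For example, a job estimated to need 1600 MiB (`1600 * MIB` bytes) yields a request of 1800.0 MiB. For disk, pass `minimum_mib=128.0` to match the Temporary Disk Space minimum.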
Metrics Parameters
Cube Metric Parameters
Metric Period (decimal) : How often to sample metrics, in seconds. Default: 60, Min: 1, Max: 300
Cube Metrics (string) : Set of metrics to be collected. Choices: cpu, disk, memory, network
Parallel Molecule Similarity Calculation
The parallel version of this cube adds the following parameters.
Number of messages to distribute at a time (integer) : The maximum number of messages to bundle together for a parallel cube. Default: 1, Min: 1, Max: 65535
Maximum Failures (integer) : The maximum number of times to attempt processing a work item. Default: 10, Min: 1, Max: 100
Autoscale this Cube (boolean) : If True, let Orion manage the parallelism of this cube. Default: True
Maximum number of Cubes (integer) : The maximum number of concurrently running copies of this cube. Default: 1000, Min: 1
Minimum number of Cubes (integer) : The minimum number of concurrently running copies of this cube. Default: 0
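Two of the parallel parameters describe simple mechanisms that can be sketched conceptually: Number of messages to distribute at a time caps how many messages are bundled into one unit of work, and Maximum Failures caps the total number of attempts on a work item. The sketch below is a hypothetical illustration of those two ideas only. It is not how Orion schedules or retries work, and the names `bundle` and `process_with_retries` are made up for this example.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bundle(messages: Iterable[T], max_size: int = 1) -> Iterator[List[T]]:
    """Group messages into bundles of at most max_size (1 to 65535)."""
    if not 1 <= max_size <= 65535:
        raise ValueError("bundle size must be between 1 and 65535")
    batch: List[T] = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, bundle
        yield batch


def process_with_retries(item: T, work: Callable[[T], R],
                         max_failures: int = 10) -> R:
    """Attempt work(item) up to max_failures times before giving up."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_failures):
        try:
            return work(item)
        except Exception as exc:  # any failure consumes one attempt
            last_error = exc
    raise RuntimeError(f"gave up after {max_failures} attempts") from last_error
```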
filename: snowball/graphsim/sim2d_mol.py