Fingerprint Similarity Calculation
This cube calculates similarity scores between a query molecule and pre-generated fingerprints.
The fingerprints are read from the intake port, from the field specified by the Fingerprint Field parameter. The query molecule is read from the initialization port (init), from the field specified by the Query Field parameter.
The Fingerprint Type parameter determines the type of fingerprint generated for the query molecule. This type must match the type of the fingerprints stored in the field specified by Fingerprint Field.
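The type-matching requirement can be illustrated with a minimal sketch. It assumes each fingerprint carries a type label alongside its set of on-bit positions; the `Fingerprint` class and `check_same_type` function are hypothetical illustrations, not part of the GraphSim TK or Orion API.

```python
from dataclasses import dataclass

# Fingerprint types listed for the Fingerprint Type parameter.
FINGERPRINT_TYPES = {"Circular", "Lingo", "MACCS", "Path", "Tree"}


@dataclass(frozen=True)
class Fingerprint:
    # Hypothetical container: a type label plus the set of "on" bit positions.
    fp_type: str
    bits: frozenset


def check_same_type(query: Fingerprint, stored: Fingerprint) -> None:
    """Raise ValueError if the two fingerprints cannot be meaningfully compared."""
    for fp in (query, stored):
        if fp.fp_type not in FINGERPRINT_TYPES:
            raise ValueError(f"unknown fingerprint type: {fp.fp_type!r}")
    if query.fp_type != stored.fp_type:
        # Bits from different fingerprint types encode different features,
        # so a score computed between them would be meaningless.
        raise ValueError(
            f"type mismatch: query is {query.fp_type}, stored is {stored.fp_type}"
        )
```

A Tree query compared against Tree fingerprints passes silently; a Tree query compared against Path fingerprints raises.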
The similarity measure that is used to calculate the score is determined by the Similarity Measure parameter. The calculated score is stored in the field specified by Similarity Score Field and the record is sent to the success port.
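The following is a minimal pure-Python sketch of four of the listed measures, using their common textbook definitions on fingerprints represented as sets of on-bit positions (an assumption; GraphSim TK fingerprints are bit vectors with their own API). Here `a` and `b` are the numbers of bits set in each fingerprint and `c` is the number of bits set in both. The Tversky weights `alpha` and `beta` are illustrative defaults, not values taken from this cube. Euclid and Manhattan are omitted because their exact normalization should be checked in the GraphSim TK manual, and returning 0.0 when a denominator is zero is a convention of this sketch.

```python
import math


def _counts(fa: set, fb: set) -> tuple:
    # a, b: bits set in each fingerprint; c: bits set in both.
    return len(fa), len(fb), len(fa & fb)


def tanimoto(fa: set, fb: set) -> float:
    a, b, c = _counts(fa, fb)
    denom = a + b - c
    return c / denom if denom else 0.0


def dice(fa: set, fb: set) -> float:
    a, b, c = _counts(fa, fb)
    return 2 * c / (a + b) if (a + b) else 0.0


def cosine(fa: set, fb: set) -> float:
    a, b, c = _counts(fa, fb)
    return c / math.sqrt(a * b) if a and b else 0.0


def tversky(fa: set, fb: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    # alpha weights bits unique to fa; beta weights bits unique to fb.
    a, b, c = _counts(fa, fb)
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0
```

For example, with query bits {1, 2, 3, 4} and stored bits {3, 4, 5}, a = 4, b = 3 and c = 2, so Tanimoto gives 2 / 5 = 0.4 and Dice gives 4 / 7 ≈ 0.571. With alpha = beta = 0.5 Tversky reduces to Dice, and with alpha = beta = 1 it reduces to Tanimoto.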
Upstream Cubes
See also
Fingerprint Generation and Similarity Measures sections in the GraphSim TK manual.
Calculation Parameters
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1; Min: 1; Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1 MiB = 1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0; Min: 128.0; Max: 8589934592
- Fingerprint Type (fingerprint_type) type: string: The fingerprint type generated for the similarity calculation. Default: Tree; Choices: Circular, Lingo, MACCS, Path, Tree
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0; Max: 16
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: ""
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Max Backlog Wait (max_backlog_wait) type: integer: The maximum time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600; Min: 300
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1 MiB = 1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800; Min: 256.0; Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60; Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300; Min: 1; Max: 300
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Similarity Measure (similarity_type) type: string: The similarity measure used for the 2D similarity calculation. Default: Tanimoto; Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited; Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
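As a local pre-flight check, the defaults, choices and bounds listed above can be transcribed into a small validator. This is a hypothetical sketch: `resolve_parameters` is not an Orion function, only parameters with a documented default are included, metrics parameters are left out, and bounds the documentation does not state are represented as `None`.

```python
# Defaults, choices and bounds transcribed from the parameter list above.
DEFAULTS = {
    "cpu_count": 1,
    "gpu_count": 0,
    "memory_mb": 1800,
    "disk_space": 5120.0,
    "shared_memory_mb": 64,
    "pids_per_cpu_limit": 32,
    "max_backlog_wait": 600,
    "fingerprint_type": "Tree",
    "similarity_type": "Tanimoto",
    "spot_policy": "Prohibited",
    "instance_tags": "",
}
CHOICES = {
    "fingerprint_type": {"Circular", "Lingo", "MACCS", "Path", "Tree"},
    "similarity_type": {"Cosine", "Dice", "Euclid", "Manhattan", "Tanimoto", "Tversky"},
    "spot_policy": {"Allowed", "Preferred", "NotPreferred", "Prohibited", "Required"},
}
# (min, max); None means the documentation states no bound.
BOUNDS = {
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "memory_mb": (256.0, 8_589_934_592),
    "disk_space": (128.0, 8_589_934_592),
    "max_backlog_wait": (300, None),
}


def resolve_parameters(overrides: dict) -> dict:
    """Merge user overrides onto the defaults, then check choices and bounds."""
    params = {**DEFAULTS, **overrides}
    for name, allowed in CHOICES.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r} not in {sorted(allowed)}")
    for name, (lo, hi) in BOUNDS.items():
        value = params[name]
        if lo is not None and value < lo:
            raise ValueError(f"{name}={value} is below the minimum {lo}")
        if hi is not None and value > hi:
            raise ValueError(f"{name}={value} is above the maximum {hi}")
    return params
```

For instance, `resolve_parameters({"similarity_type": "Dice", "cpu_count": 4})` keeps the Tree fingerprint default, while `{"cpu_count": 0}` or `{"fingerprint_type": "ECFP"}` is rejected.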
Field Parameters
- Fingerprint Field (fingerprint_field) type: Field Type: Chem.FingerPrint: Name of the field that stores the fingerprints.
- Query Field (init_mol_field) type: Field Type: Chem.Mol: Name of the field on the initialization record that stores the query molecule. If left blank, the primary molecule field is used.
- Similarity Score Field (score_field) type: Field Type: Float: Name of the field that stores the fingerprint similarity scores.
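The field lookup rules can be sketched with plain dictionaries standing in for records. This is a simplification: Orion records are not Python dicts, and `PRIMARY_MOL_FIELD`, `get_query_molecule` and `attach_score` are hypothetical names used only for illustration.

```python
from typing import Any, Mapping

# Hypothetical name standing in for the record's primary molecule field.
PRIMARY_MOL_FIELD = "Molecule"


def get_query_molecule(init_record: Mapping[str, Any], init_mol_field: str = "") -> Any:
    """Return the query molecule, falling back to the primary molecule field."""
    field = init_mol_field or PRIMARY_MOL_FIELD  # blank Query Field -> primary field
    if field not in init_record:
        raise KeyError(f"initialization record has no field {field!r}")
    return init_record[field]


def attach_score(record: dict, score_field: str, score: float) -> dict:
    """Return a copy of the record with the score stored under score_field."""
    out = dict(record)
    out[score_field] = float(score)
    return out
```

Leaving the Query Field blank therefore reads the primary molecule, while naming a field reads that field instead.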
2D Similarity Parameters
- The parameters of the 2D fingerprint similarity calculation.
- Fingerprint Type (fingerprint_type) type: string: The fingerprint type generated for the similarity calculation. Default: Tree; Choices: Circular, Lingo, MACCS, Path, Tree
- Similarity Measure (similarity_type) type: string: The similarity measure used for the 2D similarity calculation. Default: Tanimoto; Choices: Cosine, Dice, Euclid, Manhattan, Tanimoto, Tversky
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1 MiB = 1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800; Min: 256.0; Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Max Backlog Wait (max_backlog_wait) type: integer: The maximum time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600; Min: 300
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1 MiB = 1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0; Min: 128.0; Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0; Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1; Min: 1; Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited; Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: ""
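Memory and Temporary Disk Space are requested in MiB (1 MiB = 1048576 B), and the descriptions advise asking for a couple hundred MiB more than required. The arithmetic can be sketched as follows; `mib_request` is a hypothetical helper, and the 200 MiB headroom is an assumed reading of "a couple hundred".

```python
import math

BYTES_PER_MIB = 1_048_576  # 1 MiB, as stated in the parameter descriptions


def mib_request(required_bytes: int, headroom_mib: float = 200.0,
                minimum_mib: float = 256.0, maximum_mib: float = 8_589_934_592) -> float:
    """Convert a byte estimate to a MiB request with headroom, clamped to the bounds."""
    needed = math.ceil(required_bytes / BYTES_PER_MIB)  # round up to whole MiB
    request = needed + headroom_mib
    return float(min(max(request, minimum_mib), maximum_mib))
```

An estimated peak of 1610612736 B (exactly 1536 MiB) yields a memory_mb request of 1736.0. Very small estimates are raised to the 256 MiB memory minimum; pass `minimum_mib=128.0` when sizing disk_space instead.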
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60; Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300; Min: 1; Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
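To gauge how much metric data a run produces, the allowed settings can be checked and a rough sample count estimated. This sketch assumes one sample per selected metric every metric_period seconds; the actual sampler's timing is not documented here, and `estimate_samples` is a hypothetical helper.

```python
import math

# Allowed values transcribed from the Metrics Parameters list above.
ALLOWED_PERIODS = (1, 5, 10, 30, 60, 120, 180, 240, 300)
ALLOWED_METRICS = frozenset({"cpu", "disk", "memory", "network"})


def estimate_samples(runtime_s: float, metric_period: float, cube_metrics: set) -> int:
    """Roughly estimate the number of metric samples collected over a run."""
    if metric_period not in ALLOWED_PERIODS:
        raise ValueError(f"metric_period must be one of {ALLOWED_PERIODS}")
    unknown = set(cube_metrics) - ALLOWED_METRICS
    if unknown:
        raise ValueError(f"unsupported metrics: {sorted(unknown)}")
    # Assumed model: one sample per metric per full period elapsed.
    return math.floor(runtime_s / metric_period) * len(cube_metrics)
```

Under this assumption, a one-hour run with the default 60 s period and the cpu and memory metrics selected yields about 120 samples.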
Parallel Fingerprint Similarity Calculation
The parallel version of this cube adds the following parameters:
- Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube. Default: 1; Min: 1; Max: 65535
- Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item. Default: 10; Min: 1; Max: 100
- Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this cube. Default: True
- Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this cube. Default: 1000; Min: 1
- Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this cube. Default: 0
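The effect of item_count and max_failures can be simulated locally. In Orion the distribution and retrying are handled by the platform; `bundle` and `process_with_retries` are hypothetical helpers that only model the documented limits (bundles of at most item_count messages, at most max_failures attempts per work item).

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bundle(messages: Iterable[T], item_count: int = 1) -> Iterator[List[T]]:
    """Yield lists of at most item_count messages, preserving order."""
    if not 1 <= item_count <= 65535:
        raise ValueError("item_count must be between 1 and 65535")
    batch: List[T] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == item_count:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, bundle
        yield batch


def process_with_retries(work: Callable[[], T], max_failures: int = 10) -> T:
    """Attempt a work item up to max_failures times, re-raising the last error."""
    if not 1 <= max_failures <= 100:
        raise ValueError("max_failures must be between 1 and 100")
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_failures):
        try:
            return work()
        except Exception as exc:  # the real scheduler's error handling may differ
            last_error = exc
    raise last_error
```

With item_count = 2, five messages are dispatched as three bundles of sizes 2, 2 and 1; larger values reduce per-message overhead at the cost of coarser load balancing.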