Fingerprint Search - Small Scale 2D Similarity

Fingerprint Search - Small Scale 2D Similarity is a tool for measuring the similarity between each molecule in an input dataset and a query or template molecule, based on molecular fingerprints.

The minimal inputs to 2D Similarity are a query molecule and a search database of molecules, both in 1D (SMILES), 2D (SD, mol2), or 3D format.

The output from the 2D Similarity floe is a hitlist ranked so that the most similar molecules appear at the top.
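As a rough illustration of how a fingerprint-based hitlist can be built, the sketch below scores a few toy fingerprints against a query and sorts them best-first. It assumes the Tanimoto coefficient as the similarity measure and represents each fingerprint as a plain Python set of on-bit indices; neither the measure nor the fingerprint type used by the floe is stated in this section, and the names tanimoto and rank_by_similarity are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Two empty fingerprints share nothing; avoid dividing by zero.
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, database):
    """Score every (title, fingerprint) pair against the query and sort best-first."""
    scored = [(title, tanimoto(query_fp, fp)) for title, fp in database]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (assumed data): each set holds the indices of bits that are on.
query_fp = {1, 4, 7, 9}
database = [
    ("mol_b", {1, 4, 8}),
    ("mol_c", {2, 3}),
    ("mol_a", {1, 4, 7, 9}),
]

hitlist = rank_by_similarity(query_fp, database)
for title, score in hitlist:
    print(f"{title}\t{score:.2f}")
```

In this toy example the molecule whose fingerprint matches the query exactly lands at the top of the hitlist, and the molecule with no shared bits lands at the bottom.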

Extra Required Parameters

Output Dataset (dataset_out) : Output dataset of successful calculations

Default: Output for Fingerprint Search - Small Scale 2D Similarity

Size cutoff (integer) : Used for performance optimization. Below this size, query molecules are passed to the similarity cube initialization port.

Default: 10000

Database Molecules (data_source) : Dataset containing one or more molecules to compare against the query

Added Boolean Field (Field Type: Bool) : The added boolean field.

Default: Added Boolean

Failed Dataset (dataset_out) : Output dataset of failed calculations

Default: Failed Output for Fingerprint Search - Small Scale 2D Similarity

Num Best Hits (integer) : Number of best-scoring molecules to keep

Default: 500 Min: 1 Max: 20000

Float Sort Field (Field Type: Float) : Record field containing the key value to sort by

Query Molecule Title Field (Field Type: String) : The title of the query molecule used to obtain the score.

Default: Query Molecule Title Field

Similarity Score Field (Field Type: Float) : Name for the field that stores fingerprint similarity scores.

Default: Similarity Score

Deduplicate Results (boolean) : If set to True, when multiple input molecules are the same, only the similarity result for the query molecule with the highest score is retained

Default: True

Query Molecule (data_source) : Dataset containing a single molecule to use as the query
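The combined effect of the Num Best Hits and Deduplicate Results parameters can be pictured with a small sketch. This is not the floe's implementation: the function best_hits, the tuple layout, and the use of a molecule identifier to decide that two input molecules are "the same" are all assumptions made for illustration. Only the default of 500 and the 1 to 20000 range come from the parameter list above.

```python
import heapq


def best_hits(hits, num_best_hits=500, deduplicate=True):
    """Return up to num_best_hits (molecule_id, query_title, score) tuples, best first."""
    # Range taken from the documented Min/Max of the Num Best Hits parameter.
    if not 1 <= num_best_hits <= 20000:
        raise ValueError("num_best_hits must be between 1 and 20000")

    if deduplicate:
        best = {}
        for mol_id, query_title, score in hits:
            # Assumption: identical molecules share mol_id; keep the top score only.
            if mol_id not in best or score > best[mol_id][2]:
                best[mol_id] = (mol_id, query_title, score)
        hits = best.values()

    # nlargest returns the results sorted from highest to lowest score.
    return heapq.nlargest(num_best_hits, hits, key=lambda hit: hit[2])


# Toy results (assumed data): mol_a was scored twice with different scores.
hits = [
    ("mol_a", "query_1", 0.62),
    ("mol_a", "query_2", 0.91),
    ("mol_b", "query_1", 0.45),
    ("mol_c", "query_1", 0.78),
]

top = best_hits(hits, num_best_hits=2)
for mol_id, query_title, score in top:
    print(f"{mol_id}\t{query_title}\t{score:.2f}")
```

With deduplication on, mol_a appears once with its higher score; passing deduplicate=False would keep both mol_a entries as separate candidates for the hitlist.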