Generate 2D Similarity Matrix

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Description

This floe calculates NxN 2D similarity scores in parallel and outputs a basic distribution of those scores. It can also optionally write the similarity matrix as a 2D array to a numpy .npy binary file, together with a corresponding numpy .npy file containing a 1D array of SMILES that labels the molecule for each row of the 2D array.

If the Use Pregenerated Fingerprints option is set to True, the floe will use the specified Fingerprint field for similarity. Otherwise, the floe will generate fingerprints of the type specified and use those for the similarity calculation.

Please note that for large input sizes, writing the matrix can require a large amount of memory. In that case, increase the Advanced: Matrix File Writer Memory parameter.

Promoted Parameters

Title in user interface (promoted name)

Fingerprint Generation

Use Pregenerated fingerprints (switch): If set to True, the floe will not generate fingerprints and will instead read pregenerated fingerprints for each molecule from the specified Fingerprint Field.

Required

Type: boolean

Default: False

Choices: [True, False]

Fingerprint Field (fingerprint_field): If fingerprints are generated within the Floe, this is the name of the fingerprint field that will contain the generated fingerprints. If fingerprints are pregenerated, this should be the field name containing the pregenerated fingerprints.

Required

Type: field_parameter

Default: Fingerprint

Fingerprint Type (fingerprint_type): The type of fingerprint to generate and use in the similarity calculation. Only applies if Use Pregenerated Fingerprints is set to False.

Type: string

Default: Circular

Choices: [‘Circular’, ‘Lingo’, ‘MACCS’, ‘Path’, ‘Tree’]

2D Similarity Calculation

2D Similarity Score Function (sim_type): The similarity measure used for the 2D similarity calculation.

Type: string

Default: OETanimoto

Choices: [‘OECosine’, ‘OEDice’, ‘OEEuclid’, ‘OEManhattan’, ‘OETanimoto’]
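
To make the default measure concrete, here is a minimal sketch of a Tanimoto similarity matrix over bit fingerprints, written with plain numpy. It is not the floe's implementation and does not use the OpenEye toolkits; the names `tanimoto`, `similarity_matrix`, and `fps` are illustrative, and the fingerprints are toy 4-bit vectors.

```python
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto coefficient of two boolean bit vectors: |A and B| / |A or B|."""
    both = np.count_nonzero(a & b)
    either = np.count_nonzero(a | b)
    # Two empty fingerprints share no bits; return 0.0 instead of dividing by zero.
    return both / either if either else 0.0

def similarity_matrix(fps: np.ndarray) -> np.ndarray:
    """Fill a symmetric NxN matrix by scoring each unordered pair once."""
    n = len(fps)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            sim[i, j] = sim[j, i] = tanimoto(fps[i], fps[j])
    return sim

# Toy fingerprints (hypothetical data): rows 0 and 2 are identical.
fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0]], dtype=bool)
sim = similarity_matrix(fps)
print(sim)
```

Because the measure is symmetric, only the upper triangle needs to be computed; the floe distributes this work in parallel, which the sketch does not attempt to reproduce.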

Similarity Score Cutoff (sim_cutoff): Similarity scores below this value will be recorded as 0.

Type: decimal

Default: 0.05
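
The effect of the cutoff can be sketched with numpy as follows. This is an illustration of the described behavior, not the floe's code; the score values are made up, and keeping a score that exactly equals the cutoff is an assumption based on the word "below".

```python
import numpy as np

sim_cutoff = 0.05  # the documented default

# Hypothetical similarity scores for three molecules.
scores = np.array([[1.00, 0.03, 0.40],
                   [0.03, 1.00, 0.05],
                   [0.40, 0.05, 1.00]])

# Scores strictly below the cutoff become 0; all others are kept unchanged.
thresholded = np.where(scores < sim_cutoff, 0.0, scores)
print(thresholded)
```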

Outputs

Write Matrix To File (write_switch): Set to True to write the similarity matrix to a file. WARNING: setting this to True will cause the parallel calculation to run significantly more slowly, and the memory on the Matrix File Writer cube may need to be increased for matrix sizes over 10,000 x 10,000.

Type: boolean

Default: False

Choices: [True, False]

Floe Report Name (floe_report_name): Name of report containing summary statistics.

Type: string

Default: 2D Similarity Score Report

Matrix File Name (similarity_matrix_filename): .npy file extension is required. This will be the numpy binary file containing the full similarity matrix as a 2D ndarray.

Type: string

Default: 2D_similarity_matrix.npy

Matrix SMILES row labels (row_label_filename): .npy file extension is required. This will be the numpy binary file containing SMILES labels for each row of the similarity matrix, as a 1D numpy ndarray.

Type: string

Default: 2D_similarity_matrix_SMILES_row_labels.npy
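
Once downloaded, the two .npy files can be read back with `numpy.load`. The sketch below first writes small stand-in files under the default file names so that it runs on its own; in practice those files would come from the floe. It assumes that row i and column i of the matrix refer to the same molecule, and that the label array is stored with a string dtype (if it were stored as Python objects, `np.load` would need `allow_pickle=True`).

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()
matrix_path = os.path.join(workdir, "2D_similarity_matrix.npy")
labels_path = os.path.join(workdir, "2D_similarity_matrix_SMILES_row_labels.npy")

# Stand-in data (hypothetical); replace with the files written by the floe.
np.save(matrix_path, np.array([[1.0, 0.25], [0.25, 1.0]]))
np.save(labels_path, np.array(["CCO", "c1ccccc1"]))

matrix = np.load(matrix_path)   # 2D ndarray of scores
labels = np.load(labels_path)   # 1D ndarray of SMILES, one per matrix row

# Sanity check: one label per row of a square matrix.
if matrix.shape != (len(labels), len(labels)):
    raise ValueError("matrix and row labels do not match")

# Find the most similar *other* molecule for row 0 by masking the diagonal entry.
row = matrix[0].copy()
row[0] = -np.inf
nearest = labels[int(np.argmax(row))]
print(labels[0], "is most similar to", nearest)
```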

Output Text File (write_text): If this switch and the Write Matrix To File switch above are both set to True, the floe will output text files for the row labels and the matrix, in addition to the binary .npy files.

Type: boolean

Default: False

Choices: [True, False]

Advanced: Matrix File Writer Memory (memory_mb): For large datasets, increase the memory available to the matrix writer cube.

Type: decimal

Default: 22000
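
A rough lower bound on the memory needed can be estimated from the size of the dense matrix alone. The sketch below assumes 8-byte (float64) entries and decimal megabytes; the element type the floe actually uses and the exact unit of this parameter are not stated here, and the writer cube may need more than the raw array size. The helper name `matrix_size_mb` is illustrative.

```python
import numpy as np

def matrix_size_mb(n_molecules: int, dtype=np.float64) -> float:
    """Size in MB (10**6 bytes) of a dense n x n array with the given dtype."""
    bytes_total = n_molecules * n_molecules * np.dtype(dtype).itemsize
    return bytes_total / 1e6

for n in (10_000, 50_000):
    print(f"{n} x {n}: {matrix_size_mb(n):,.0f} MB")
```

Because the size grows with the square of the number of molecules, a fivefold larger input needs about 25 times the memory.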

Use Distance Matrix (use_distance): If True, distance, as (1.0 - similarity), will be output instead of similarity.

Type: boolean

Default: False

Choices: [True, False]
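
The conversion itself is a single elementwise operation. A minimal numpy sketch with made-up scores follows; how the floe combines this option with the Similarity Score Cutoff is not described here, so the sketch ignores the cutoff.

```python
import numpy as np

# Hypothetical similarity matrix for two molecules.
similarity = np.array([[1.0, 0.25],
                       [0.25, 1.0]])

# Distance as defined for this option: 1.0 - similarity, applied to every entry.
distance = 1.0 - similarity
print(distance)
```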