Generate 2D Similarity Matrix

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Description

This floe calculates NxN 2D similarity scores in parallel and outputs a basic distribution of those scores. It can also optionally write the similarity matrix as a 2D array to a numpy .npy binary file, together with a corresponding numpy .npy file holding a 1D array of SMILES that labels the molecule for each row of the 2D array.

If the Use Pregenerated Fingerprints option is set to True, the floe will use the specified Fingerprint field for similarity. Otherwise, the floe will generate fingerprints of the type specified and use those for the similarity calculation.

Please note that for large input sizes, writing the matrix can require a large amount of memory. Please adjust the Advanced: Matrix File Writer Memory parameter for large input sizes.
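The two .npy outputs described above can be read back with numpy. The following is a minimal sketch, not part of the floe: the helper name load_similarity_outputs is hypothetical, the file names are the parameter defaults, and the scores and SMILES are made-up stand-ins written to a temporary directory so the example runs on its own. It assumes the row labels are stored with a string dtype.

```python
import os
import tempfile

import numpy as np


def load_similarity_outputs(matrix_path, labels_path):
    """Load the matrix and row labels and check that their sizes agree."""
    matrix = np.load(matrix_path)
    # If the labels were saved as Python objects rather than a string dtype,
    # np.load would need allow_pickle=True here (an assumption to verify).
    labels = np.load(labels_path)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a square 2D matrix, got shape {matrix.shape}")
    if labels.shape != (matrix.shape[0],):
        raise ValueError("row labels do not match the matrix dimensions")
    return matrix, labels


# Stand-in files with made-up scores, written only so the sketch runs end to end.
with tempfile.TemporaryDirectory() as tmp_dir:
    matrix_path = os.path.join(tmp_dir, "2D_similarity_matrix.npy")
    labels_path = os.path.join(tmp_dir, "2D_similarity_matrix_SMILES_row_labels.npy")
    np.save(matrix_path, np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]))
    np.save(labels_path, np.array(["CCO", "CCN", "c1ccccc1"]))
    matrix, labels = load_similarity_outputs(matrix_path, labels_path)

# Pair each row label with its most similar other molecule.
masked = matrix.copy()
np.fill_diagonal(masked, -np.inf)  # ignore self-similarity on the diagonal
nearest = {str(labels[i]): str(labels[j]) for i, j in enumerate(masked.argmax(axis=1))}
```

Because row i of the matrix corresponds to entry i of the label array, any row index found in the matrix can be mapped straight back to a SMILES string.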

Promoted Parameters

Title in user interface (promoted name)

Fingerprint Generation

Use Pregenerated Fingerprints (switch): If set to True, the floe will not generate fingerprints and will instead read pregenerated fingerprints for each molecule from the field specified by Fingerprint Field.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Fingerprint Field (fingerprint_field): If fingerprints are generated within the floe, this is the name of the field that will contain the generated fingerprints. If fingerprints are pregenerated, this should be the name of the field containing the pregenerated fingerprints.

  • Required

  • Type: field_parameter

  • Default: Fingerprint

Fingerprint Type (fingerprint_type): If Use Pregenerated Fingerprints is set to False, this is the type of fingerprint that will be generated and used in the similarity calculation.

  • Type: string

  • Default: Circular

  • Choices: [‘Circular’, ‘Lingo’, ‘MACCS’, ‘Path’, ‘Tree’]

2D Similarity Calculation

2D Similarity Score Function (sim_type): The similarity measure used for the 2D similarity calculation.

  • Type: string

  • Default: OETanimoto

  • Choices: [‘OECosine’, ‘OEDice’, ‘OEEuclid’, ‘OEManhattan’, ‘OETanimoto’]
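To give a feel for how two of these measures differ, here is a minimal sketch of the textbook Tanimoto and Dice coefficients on sets of "on" bit positions. It does not call the OpenEye toolkit and is not the floe's implementation; the function names and bit sets are hypothetical, and the toolkit's own definitions may differ in detail (for example, in how empty fingerprints are handled).

```python
def tanimoto(bits_a, bits_b):
    """Textbook Tanimoto: shared on-bits divided by on-bits in either fingerprint."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0  # assumption: empty vs. empty scores 0


def dice(bits_a, bits_b):
    """Textbook Dice: twice the shared on-bits divided by the total on-bit count."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / total if total else 0.0


# Hypothetical fingerprints, written as the positions of their set bits.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5}

tanimoto_score = tanimoto(fp_a, fp_b)  # 2 shared bits / 5 distinct bits
dice_score = dice(fp_a, fp_b)          # 2 * 2 shared bits / 7 total bits
```

For the same pair of fingerprints, Dice is never lower than Tanimoto, so a cutoff tuned for one measure does not transfer directly to the other.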

Similarity Score Cutoff (sim_cutoff): Similarity scores below this value will be recorded as 0.

  • Type: decimal

  • Default: 0.05
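The effect of the cutoff can be sketched with numpy as follows. This is an illustration under assumptions rather than the floe's code: the function name apply_cutoff and the scores are made up, and "below" is read as strictly less than the cutoff.

```python
import numpy as np


def apply_cutoff(scores, cutoff=0.05):
    """Return a copy of scores with every value strictly below cutoff set to 0."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores < cutoff, 0.0, scores)


# Made-up 3x3 similarity matrix for illustration.
raw = np.array([
    [1.00, 0.03, 0.40],
    [0.03, 1.00, 0.05],
    [0.40, 0.05, 1.00],
])
thresholded = apply_cutoff(raw, cutoff=0.05)
```

In this reading, the two 0.03 entries become 0 while the 0.05 entries are kept, since they are not below the cutoff.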

Outputs

Write Matrix To File (write_switch): Set to True to write the similarity matrix to a file. WARNING: setting this to True will cause the parallel calculation to run significantly more slowly, and the memory on the Matrix File Writer cube may need to be increased for matrix sizes over 10,000 x 10,000.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Floe Report Name (floe_report_name): Name of report containing summary statistics.

  • Type: string

  • Default: 2D Similarity Score Report

Matrix File Name (similarity_matrix_filename): .npy file extension is required. This will be the numpy binary file containing the full similarity matrix as a 2D ndarray.

  • Type: string

  • Default: 2D_similarity_matrix.npy

Matrix SMILES row labels (row_label_filename): .npy file extension is required. This will be the numpy binary file containing SMILES labels for each row of the similarity matrix, as a 1D numpy ndarray.

  • Type: string

  • Default: 2D_similarity_matrix_SMILES_row_labels.npy

Output Text File (write_text): If set to True together with the Write Matrix To File switch above, the floe will output text files for the row labels and the matrix in addition to the binary .npy files.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Advanced: Matrix File Writer Memory (memory_mb): For large datasets, increase the memory available to the matrix writer cube.

  • Type: decimal

  • Default: 22000
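A rough lower bound on the memory needed to hold the matrix can be estimated from its dimensions. The sketch below assumes 8-byte (float64) entries, which is numpy's default float type but is not confirmed for this floe; the function name is hypothetical, and the writer cube will need headroom beyond the raw array size.

```python
def dense_matrix_size_mb(n_molecules, bytes_per_value=8):
    """Size in MB (10**6 bytes) of a dense n x n matrix, excluding any overhead."""
    return n_molecules * n_molecules * bytes_per_value / 1e6


# Raw array sizes for a few input sizes, under the float64 assumption.
size_10k = dense_matrix_size_mb(10_000)
size_50k = dense_matrix_size_mb(50_000)
```

Because the matrix grows with the square of the input size, a fivefold increase in molecules needs about twenty-five times the memory.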

Use Distance Matrix (use_distance): If True, distance, as (1.0 - similarity), will be output instead of similarity.

  • Type: boolean

  • Default: False

  • Choices: [True, False]
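The distance conversion described for Use Distance Matrix can also be applied after the fact to a similarity matrix that was written without this option. A minimal numpy sketch, with made-up scores and a hypothetical function name:

```python
import numpy as np


def similarity_to_distance(similarity):
    """Convert similarity scores to distances as 1.0 - similarity."""
    return 1.0 - np.asarray(similarity, dtype=float)


# Made-up similarity matrix for illustration.
similarity = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])
distance = similarity_to_distance(similarity)
```

Identical molecules (similarity 1.0) end up at distance 0, which is the form that distance-based tools such as clustering routines typically expect.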