Generate 2D Similarity Matrix
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Description
This floe calculates NxN 2D similarity scores in parallel and outputs a basic distribution of those scores. It can optionally write the similarity matrix as a 2D array to a numpy .npy binary file, together with a corresponding numpy .npy file holding a 1D array of SMILES that label the molecule for each row of the 2D array.
If the Use Pregenerated Fingerprints option is set to True, the floe will use the specified Fingerprint Field for similarity. Otherwise, the floe will generate fingerprints of the specified type and use those for the similarity calculation.
Please note that for large input sizes, writing the matrix can require a large amount of memory. Please adjust the Advanced: Matrix File Writer Memory parameter for large input sizes.
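As an illustration of how the two output files fit together, the sketch below loads a matrix file and its row-label file with numpy and looks up the nearest neighbour of one row. The stand-in data, the helper name most_similar, and the array dtypes are assumptions made for this example; a real run would supply the files written by the floe. If the label array turns out to be stored as Python objects, np.load would need allow_pickle=True.

```python
import os
import tempfile

import numpy as np

# Stand-in data so the example runs on its own; a real run would use the
# files written by the floe. The dtypes used here are assumptions.
labels = np.array(["CCO", "CCN", "c1ccccc1"])
matrix = np.array([
    [1.00, 0.60, 0.10],
    [0.60, 1.00, 0.00],
    [0.10, 0.00, 1.00],
])

workdir = tempfile.mkdtemp()
matrix_path = os.path.join(workdir, "2D_similarity_matrix.npy")
labels_path = os.path.join(workdir, "2D_similarity_matrix_SMILES_row_labels.npy")
np.save(matrix_path, matrix)
np.save(labels_path, labels)

# Load both arrays; row i of the matrix is labelled by loaded_labels[i].
loaded_matrix = np.load(matrix_path)
loaded_labels = np.load(labels_path)


def most_similar(index, sim, names):
    """Return (label, score) of the highest-scoring other row for `index`."""
    row = sim[index].copy()
    row[index] = -np.inf  # exclude the molecule's similarity to itself
    best = int(np.argmax(row))
    return str(names[best]), float(sim[index, best])


neighbour = most_similar(0, loaded_matrix, loaded_labels)
```

Here neighbour pairs the SMILES label of the closest other molecule with its score, which is the kind of lookup the row-label file is meant to support.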
Promoted Parameters
Title in user interface (promoted name)
Fingerprint Generation
Use Pregenerated Fingerprints (switch): If set to True, the floe will not generate fingerprints and will instead use the specified fingerprint field to provide pregenerated fingerprints for each molecule.
Required
Type: boolean
Default: False
Choices: [True, False]
Fingerprint Field (fingerprint_field): If fingerprints are generated within the Floe, this is the name of the fingerprint field that will contain the generated fingerprints. If fingerprints are pregenerated, this should be the field name containing the pregenerated fingerprints.
Required
Type: field_parameter
Default: Fingerprint
Fingerprint Type (fingerprint_type): If Use Pregenerated Fingerprints is set to False, the type of fingerprint to be generated and used in the similarity calculation.
Type: string
Default: Circular
Choices: [‘Circular’, ‘Lingo’, ‘MACCS’, ‘Path’, ‘Tree’]
2D Similarity Calculation
2D Similarity Score Function (sim_type): The similarity measure used for the 2D similarity calculation.
Type: string
Default: OETanimoto
Choices: [‘OECosine’, ‘OEDice’, ‘OEEuclid’, ‘OEManhattan’, ‘OETanimoto’]
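For readers unfamiliar with these measures, the following is a minimal sketch of the standard Tanimoto and Dice coefficients computed on sets of "on" bit positions. It is not the floe's implementation and does not use the OpenEye toolkit; the function names and the choice to return 0.0 for two empty fingerprints are assumptions made for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Assumption: two empty fingerprints score 0.0 rather than raising.
    return len(a & b) / union if union else 0.0


def dice(bits_a, bits_b):
    """Dice coefficient: twice the shared on-bits over the total on-bit count."""
    a, b = set(bits_a), set(bits_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Two hypothetical fingerprints sharing bits 2 and 3.
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}
t_score = tanimoto(fp_a, fp_b)
d_score = dice(fp_a, fp_b)
```

For the same pair of fingerprints, Dice is never lower than Tanimoto, so a cutoff that suits one measure may need adjusting for the other.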
Similarity Score Cutoff (sim_cutoff): Similarity scores below this value will be set to 0.
Type: decimal
Default: 0.05
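The effect of the cutoff can be pictured with a short numpy sketch. It assumes a strict "below" comparison, as the description states, and the helper name apply_cutoff is hypothetical; how the floe applies the cutoff internally is not specified here.

```python
import numpy as np


def apply_cutoff(sim, cutoff=0.05):
    """Return a copy of `sim` with every score strictly below `cutoff` set to 0."""
    sim = np.asarray(sim, dtype=float)
    return np.where(sim < cutoff, 0.0, sim)


scores = [0.04, 0.05, 0.5]
filtered = apply_cutoff(scores)
```

Under this reading, a score exactly equal to the cutoff is kept.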
Outputs
Write Matrix To File (write_switch): Set to True to write the similarity matrix to a file. WARNING: setting this to True will cause the parallel calculation to run significantly more slowly, and memory on the Matrix File Writer cube may need to be increased for matrix sizes over 10,000 x 10,000.
Type: boolean
Default: False
Choices: [True, False]
Floe Report Name (floe_report_name): Name of report containing summary statistics.
Type: string
Default: 2D Similarity Score Report
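The exact contents of the report are not listed here, but a comparable summary can be derived from a written matrix. The sketch below is an assumption-laden illustration: it treats the matrix as symmetric, uses only the pairs above the diagonal, expects at least two molecules, and bins scores over the range 0 to 1. The function name pairwise_score_summary is hypothetical.

```python
import numpy as np


def pairwise_score_summary(sim, bins=10):
    """Summarise the unique off-diagonal scores of a square similarity matrix."""
    sim = np.asarray(sim, dtype=float)
    # Indices of the strict upper triangle: each molecule pair counted once.
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    scores = sim[rows, cols]
    counts, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return {
        "n_pairs": int(scores.size),
        "mean": float(scores.mean()),
        "max": float(scores.max()),
        "counts": counts.tolist(),
        "edges": edges.tolist(),
    }


example = np.array([
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.0],
    [0.1, 0.0, 1.0],
])
summary = pairwise_score_summary(example)
```

For N molecules this covers N(N-1)/2 pairs, so the diagonal self-similarities do not skew the distribution.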
Matrix File Name (similarity_matrix_filename): .npy file extension is required. This will be the numpy binary file containing the full similarity matrix as a 2D ndarray.
Type: string
Default: 2D_similarity_matrix.npy
Matrix SMILES row labels (row_label_filename): .npy file extension is required. This will be the numpy binary file containing SMILES labels for each row of the similarity matrix, as a 1D numpy ndarray.
Type: string
Default: 2D_similarity_matrix_SMILES_row_labels.npy
Output Text File (write_text): If set to True together with the Write Matrix To File switch above, the floe will output text files for the row labels and the matrix, in addition to the binary .npy files that are generated.
Type: boolean
Default: False
Choices: [True, False]
Advanced: Matrix File Writer Memory (memory_mb): For large datasets, increase the memory available to the matrix writer cube.
Type: decimal
Default: 22000
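A rough lower bound on the memory needed can be estimated from the matrix dimensions alone. This sketch assumes 8 bytes per score (a float64 array) and counts a megabyte as 1,000,000 bytes; the stored dtype and the writer cube's actual working overhead are not stated here, so treat the result as a floor rather than a recommended setting. The helper name matrix_size_mb is hypothetical.

```python
def matrix_size_mb(n_molecules, bytes_per_value=8):
    """Raw size in MB of a dense n x n matrix (assumes float64 by default)."""
    return n_molecules * n_molecules * bytes_per_value / 1_000_000


# Raw array size for the 10,000 x 10,000 case mentioned in the warning above.
size_10k = matrix_size_mb(10_000)
# Doubling the number of molecules quadruples the raw size.
size_20k = matrix_size_mb(20_000)
```

Because the size grows with the square of the input count, modest increases in the number of molecules can call for much larger memory settings.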
Use Distance Matrix (use_distance): If True, distance, calculated as (1.0 - similarity), will be output instead of similarity.
Type: boolean
Default: False
Choices: [True, False]
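The same conversion can be applied afterwards to a similarity matrix that has already been written, as in this minimal numpy sketch. The helper name to_distance is hypothetical, and how the floe combines this option with the similarity cutoff is not specified here.

```python
import numpy as np


def to_distance(sim):
    """Convert similarity scores to distances as 1.0 - similarity."""
    return 1.0 - np.asarray(sim, dtype=float)


similarity = [[1.0, 0.25], [0.25, 1.0]]
distance = to_distance(similarity)
```

Identical molecules (similarity 1.0) end up at distance 0.0, which is the form that distance-based clustering tools generally expect.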