How-to: Use LIME Explainer to Understand and Analyze Machine Learning Predictions

OpenEye Machine Learning builds machine learning models that predict physical properties of small molecules. The predictions are explained by a second, local model that annotates the atoms, ligands, or pharmacophores responsible for the property exhibited by the molecule.

Explaining Predictions Using LIME

Despite providing the best solutions in several domains, from natural language processing to autonomous vehicles, the application of neural networks to cheminformatics and small molecules remains largely exploratory. One reason is that chemists are often skeptical of the predictions of a black-box system. As a result, demand for explainable machine learning systems has grown over the years, and OpenEye intends to address that need in this toolkit.

We use a methodology called LIME (Local Interpretable Model-agnostic Explanations), which explains a machine learning prediction by fitting a local interpretable model around it. In our system, LIME identifies the important bits in a fingerprint that are responsible for a given prediction. These bits are then mapped back to individual parts of the molecule by finding the “core” atom using the bond degrees of the query molecule.
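To make the idea concrete, below is a minimal, self-contained sketch of a LIME-style procedure on fingerprint bits, written with NumPy and scikit-learn rather than the OpenEye toolkit. The predict_property function is a stand-in for any trained property model, and the perturbation scheme, kernel width, and Ridge surrogate are illustrative assumptions, not the exact choices used in the Floe.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def predict_property(fps):
    """Stand-in black-box model: returns a property value per fingerprint."""
    weights = rng.standard_normal(fps.shape[1])
    return fps @ weights

def explain_fingerprint(fp, predict_fn, n_samples=2000, kernel_width=0.25):
    n_bits = fp.shape[0]
    # Randomly switch bits off to create a local neighborhood of
    # perturbed fingerprints around the query fingerprint.
    mask = rng.integers(0, 2, size=(n_samples, n_bits))
    samples = fp * mask
    # Proximity weights: samples that keep more of the original
    # on-bits count more when fitting the surrogate.
    distance = 1.0 - (samples @ fp) / max(fp.sum(), 1)
    sample_weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Fit the local interpretable model; its coefficients are the
    # per-bit "votes" for the prediction.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(samples, predict_fn(samples), sample_weight=sample_weights)
    return surrogate.coef_

fp = rng.integers(0, 2, size=256)          # toy 256-bit query fingerprint
votes = explain_fingerprint(fp, predict_property)
top_bits = np.argsort(np.abs(votes))[::-1][:5]
print("most influential bits:", top_bits)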

When mapping a bit, we look for the smallest matching pattern and move to the larger “superset” patterns only if there is a tie.
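The sketch below illustrates one way such a tie-break could work; the bit_patterns table (bit to candidate atom-index sets) and the helper names are hypothetical, standing in for the fingerprint generator's actual bit tracing.

def resolve_bit_to_atoms(candidates):
    """Pick one atom set for a bit from its candidate patterns."""
    smallest = min(len(p) for p in candidates)
    tied = [p for p in candidates if len(p) == smallest]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to a larger "superset" pattern covering all
    # tied patterns, if one exists among the candidates.
    union = set().union(*tied)
    supersets = [p for p in candidates if union <= p]
    return min(supersets, key=len) if supersets else tied[0]

def atom_votes(bit_votes, bit_patterns, n_atoms):
    """Spread each bit's LIME vote over the atoms of its resolved pattern."""
    votes = [0.0] * n_atoms
    for bit, vote in bit_votes.items():
        pattern = resolve_bit_to_atoms(bit_patterns[bit])
        for atom in pattern:
            votes[atom] += vote / len(pattern)
    return votes

# Toy example: two important bits, each with candidate atom sets.
bit_patterns = {7: [{0, 1}, {3, 4, 5}], 42: [{2}, {6}]}
bit_votes = {7: 0.8, 42: -0.3}
print(atom_votes(bit_votes, bit_patterns, n_atoms=8))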

Shown below is an annotated image of a molecule with the bits that our algorithm considers important for the predicted property (solubility). We translated bit importance to ligand or atom importance and annotated them on a color scale. The explainer image can be viewed (a) ligand-annotated, (b) atom-annotated, or (c) ligand+atom-annotated, as shown in the picture below.

[Figure explainer_n0.png: (a) ligand-annotated, (b) atom-annotated, and (c) ligand+atom-annotated explainer images]

Fragments such as amide bonds and hydroxyl groups are considered more soluble than the hydrophobic (greasy) benzene or nitrile groups. Blue represents hydrophobic areas, red represents hydrophilic areas, and intermediate colors indicate intermediate character. Models should be trained on different sets of fingerprints to see which explainer makes sense for each model.

The color scheme can be tweaked using the QQAR option under “Parallel Find Property and Explanation Cube”. QQAR specifies the quantiles of the LIME vote distribution at which the default color stops are placed on the color bar. By default, the stops are derived from the quantiles of the vote distribution to give an even color scale, but they can be changed to a more visually appealing scale if desired.
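As a rough illustration of quantile-derived color stops (the exact QQAR parameter format is not shown here), the snippet below places stops at quantiles of a set of LIME votes, so the colors spread evenly over the observed vote distribution rather than linearly over its min/max range.

import numpy as np

votes = np.random.default_rng(1).normal(size=500)   # per-atom LIME votes
quantiles = [0.0, 0.25, 0.5, 0.75, 1.0]             # assumed default stops
stops = np.quantile(votes, quantiles)
print("color stops at vote values:", np.round(stops, 3))

def vote_to_unit(v):
    """Map a vote to [0, 1] along the quantile-spaced color bar."""
    return float(np.interp(v, stops, np.linspace(0.0, 1.0, len(stops))))

print(vote_to_unit(0.0))   # a neutral vote lands near the middle of the bar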

Library Details of the Floe