How-to: Use LIME Explainer to Understand and Analyze Machine Learning Predictions

OpenEye Machine Learning builds machine learning models that predict physical properties of small molecules. The predictions are explained by a second, local model that annotates the atoms, ligands, or pharmacophores responsible for the property exhibited by the molecule.

Explaining Predictions Using LIME

Despite providing the best solutions in several domains, from natural language processing to autonomous vehicles, the application of neural networks to cheminformatics and small molecules remains largely exploratory. One reason is that chemists are often skeptical of the predictions of a black-box system. As a result, demand for explainable machine learning systems has grown over the years, and OpenEye intends to address that need in this toolkit.

We use a methodology called LIME (Local Interpretable Model-agnostic Explanations), which explains a machine learning prediction by fitting a local interpretable model around it. In our system, LIME identifies the important bits in a fingerprint that are responsible for a given prediction. These bits are then mapped back to individual parts of the molecule by finding the “core” atom using the bond degrees of the query molecule.
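To make the idea concrete, below is a minimal, self-contained sketch of a LIME-style procedure on fingerprint bits, written with NumPy and scikit-learn rather than the OpenEye toolkit. The predict_property function is a stand-in for any trained property model, and the perturbation scheme, kernel width, and Ridge surrogate are illustrative assumptions, not the exact choices used in the Floe.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def predict_property(fps):
    """Stand-in black-box model: returns a property value per fingerprint."""
    weights = rng.standard_normal(fps.shape[1])
    return fps @ weights

def explain_fingerprint(fp, predict_fn, n_samples=2000, kernel_width=0.25):
    n_bits = fp.shape[0]
    # Randomly switch bits off to create a local neighborhood of
    # perturbed fingerprints around the query fingerprint.
    mask = rng.integers(0, 2, size=(n_samples, n_bits))
    samples = fp * mask
    # Proximity weights: samples that keep more of the original
    # on-bits count more when fitting the surrogate.
    distance = 1.0 - (samples @ fp) / max(fp.sum(), 1)
    sample_weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Fit the local interpretable model; its coefficients are the
    # per-bit "votes" for the prediction.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(samples, predict_fn(samples), sample_weight=sample_weights)
    return surrogate.coef_

fp = rng.integers(0, 2, size=256)          # toy 256-bit query fingerprint
votes = explain_fingerprint(fp, predict_property)
top_bits = np.argsort(np.abs(votes))[::-1][:5]
print("most influential bits:", top_bits)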

When mapping a bit, we look for the smallest matching pattern and move to the larger “superset” patterns only if there is a tie.
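The sketch below illustrates one way such a tie-break could work; the bit_patterns table (bit to candidate atom-index sets) and the helper names are hypothetical, standing in for the fingerprint generator's actual bit tracing.

def resolve_bit_to_atoms(candidates):
    """Pick one atom set for a bit from its candidate patterns."""
    smallest = min(len(p) for p in candidates)
    tied = [p for p in candidates if len(p) == smallest]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to a larger "superset" pattern covering all
    # tied patterns, if one exists among the candidates.
    union = set().union(*tied)
    supersets = [p for p in candidates if union <= p]
    return min(supersets, key=len) if supersets else tied[0]

def atom_votes(bit_votes, bit_patterns, n_atoms):
    """Spread each bit's LIME vote over the atoms of its resolved pattern."""
    votes = [0.0] * n_atoms
    for bit, vote in bit_votes.items():
        pattern = resolve_bit_to_atoms(bit_patterns[bit])
        for atom in pattern:
            votes[atom] += vote / len(pattern)
    return votes

# Toy example: two important bits, each with candidate atom sets.
bit_patterns = {7: [{0, 1}, {3, 4, 5}], 42: [{2}, {6}]}
bit_votes = {7: 0.8, 42: -0.3}
print(atom_votes(bit_votes, bit_patterns, n_atoms=8))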

Shown below is an annotated image of a molecule with the bits that our algorithm considers important for the predicted property (solubility). We translated bit importance to ligand or atom importance and annotated them on a color scale. The explainer image can be viewed (a) ligand-annotated, (b) atom-annotated, or (c) ligand+atom-annotated, as shown in the picture below.

[Figure explainer_n0.png: (a) ligand-annotated, (b) atom-annotated, and (c) ligand+atom-annotated explainer images]

Fragments such as amide bonds and hydroxyl groups are considered more soluble than the hydrophobic (greasy) benzene or nitrile groups. Blue represents hydrophobic areas, red represents hydrophilic areas, and intermediate colors indicate intermediate character. Models should be trained on different sets of fingerprints to see which explainer makes sense for each model.

The color scheme can be tweaked using the QQAR option under “Parallel Find Property and Explanation Cube”. QQAR specifies the quantiles of the LIME vote distribution at which the default color stops are placed on the color bar. By default, the stops are derived from the quantiles of the vote distribution to give an even color scale, but they can be changed to a more visually appealing scale if desired.
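As a rough illustration of quantile-derived color stops (the exact QQAR parameter format is not shown here), the snippet below places stops at quantiles of a set of LIME votes, so the colors spread evenly over the observed vote distribution rather than linearly over its min/max range.

import numpy as np

votes = np.random.default_rng(1).normal(size=500)   # per-atom LIME votes
quantiles = [0.0, 0.25, 0.5, 0.75, 1.0]             # assumed default stops
stops = np.quantile(votes, quantiles)
print("color stops at vote values:", np.round(stops, 3))

def vote_to_unit(v):
    """Map a vote to [0, 1] along the quantile-spaced color bar."""
    return float(np.interp(v, stops, np.linspace(0.0, 1.0, len(stops))))

print(vote_to_unit(0.0))   # a neutral vote lands near the middle of the bar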

Library Details of the Floe