Introduction

The OpenEye, Cadence Molecular Sciences Machine Learning Model Building Floes Package builds machine learning (ML) models that predict the properties of small molecules.

Arthur Samuel, an early pioneer in artificial intelligence, defined machine learning as “the ability of computers to learn without being explicitly programmed.” The field develops algorithms and statistical models that allow computers to learn from input data and improve their performance through experience with that data. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. OpenEye algorithms fall under the category of supervised learning.

Applications of neural networks to cheminformatics and small molecules remain largely exploratory, in part because chemists have been skeptical of predictions that come from an opaque black box system. With the rapid growth of artificial intelligence, demand has grown for straightforward, explainable machine learning systems for predicting the properties of small molecules, and OpenEye addresses that need in this package.

OpenEye, Cadence Molecular Sciences is a software company grounded firmly in science-based computations for small molecule drug discovery. We want to use machine learning as a tool to further this mission. But we are skeptics who are steadfast about science, not hype. To quote George Box: “All models are wrong, but some are useful.” Therefore, we try to ensure that the ML models built using our package include the following features:

No Black Box Model
- We use explainers to decode what a model predicts.
- We help chemists and data scientists debug the trained models.
Confidence and Domain of Applicability Awareness
- We use techniques such as the convex box approach and Monte Carlo dropout to give reliable confidence intervals.
Scalability for General Use Cases
- We use neural networks for our models. A model can range in size from a simple perceptron to a deep neural network, which makes the approach flexible enough to handle datasets of widely varying size.
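
To make the Monte Carlo dropout idea mentioned above concrete, here is a minimal sketch of the general technique. It is not the package's implementation. The network weights, the `forward` and `mc_dropout_predict` helpers, the dropout rate, and the input vector are all hypothetical and chosen only for illustration. The idea is to keep dropout active at prediction time, run many stochastic forward passes, and read the spread of the outputs as an uncertainty estimate.

```python
import random
import statistics

# Hypothetical toy weights for a 3-input, 4-hidden-unit, 1-output network.
# They stand in for a trained model and are not taken from the package.
W1 = [
    [0.5, -0.2, 0.1],
    [-0.3, 0.8, 0.4],
    [0.7, 0.1, -0.6],
    [0.2, 0.3, 0.1],
]
B1 = [0.0, 0.2, -0.1, 0.05]
W2 = [0.6, -0.4, 0.3, 0.5]
B2 = 0.2


def forward(x, drop_rate, rng):
    """One forward pass; hidden units are dropped at random if drop_rate > 0."""
    keep = 1.0 - drop_rate
    output = B2
    for weights, bias, w_out in zip(W1, B1, W2):
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)  # ReLU
        if drop_rate > 0.0:
            # Inverted dropout: keep the unit (rescaled by 1/keep) or zero it.
            h = h / keep if rng.random() < keep else 0.0
        output += w_out * h
    return output


def mc_dropout_predict(x, n_passes=200, drop_rate=0.2, seed=0):
    """Return the mean and standard deviation over stochastic forward passes."""
    rng = random.Random(seed)
    samples = [forward(x, drop_rate, rng) for _ in range(n_passes)]
    return statistics.mean(samples), statistics.stdev(samples)


features = [1.0, 0.5, -0.5]  # hypothetical descriptor vector for one molecule
mean, std = mc_dropout_predict(features)
print(f"prediction = {mean:.3f} +/- {std:.3f}")
```

In this sketch a larger standard deviation signals that the prediction is sensitive to which hidden units are active, which is the kind of signal that can be turned into a confidence interval. Setting `drop_rate=0.0` recovers the ordinary deterministic prediction.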