Machine Learning Model Building Documentation
Introduction and Tutorials
- Tutorial: Building Machine Learning Regression Models on Fingerprints for Physical Property Prediction of Small Molecules
- Tutorial: Use Pretrained Regression Fingerprint Model to Predict Property of Molecules
- Tutorial: Predict Solubility of Druglike Molecules
- Tutorial: Cheaper and Faster Version of Building Machine Learning Regression Fingerprint Models for Physical Property Prediction of Small Molecules
- Tutorial: Building Machine Learning Classifier Fingerprint Models for Physical Property Prediction of Small Molecules
- Tutorial: Use Pretrained Classification Fingerprint Model to Predict Generic Property of Molecules
- Tutorial: Building Machine Learning Regression Models on Feature Vector Input
- Tutorial: Use Custom Feature Input to Predict Regression Properties
- Tutorial: Use Small Molecule Data Processing Floe to Preprocess ML Data
How-To Guides
Floe Reference Documentation
- ML Regression Model Building using Fingerprints for Small Molecules
- ML Regression using Fingerprints for Small Molecules
- Solubility Prediction for Small Molecule using Machine Learning and Cheminfo Fingerprints
- ML Classifier Model Building using Fingerprints for Small Molecules
- ML Classification using Fingerprints for Small Molecules
- ML Regression Model Building using Feature Input
- ML Regression using Feature Input
- Small Molecule Data Processing for ML Model Building
FAQs
- Frequently Asked Questions
- For model building floes, how do we compare multiple models to decide which one is best?
- How good is your solubility model?
- Does the model use or train on 3D features?
- The inputs are still based on expert parameters such as fingerprints, which are biased by the rules the expert user defines. Any insights into how to overcome this flaw?
- Are the predictions for crystalline solubility? Is the data for crystals?
- What made you prefer NNs instead of, say, XGBoost?
- Neural networks don’t always show good performance in low data regimes. What measures do you take to improve performance?
- How are the confidences computed?
- What methods do you find most effective to eliminate overfitting of your models?
- What percentage of solubility predictions have high or medium confidence?
- In the model builder floe, am I restricted to fingerprints, or can I add other descriptors, such as molecular properties or other calculated or experimental parameters?