3D QSAR Model: Builder
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Models
Description
The 3D QSAR Model: Builder Floe is a tool for building models with 3D descriptors in a structure-based setting. The floe incorporates: (1) optional 3D conformer generation and charge assignment; (2) hyperparameter optimization for ROCS- and EON-based kernel-PLS model building; (3) cross-validation; (4) model building; and (5) optional external validation.
A set of individual 3D models is available: ROCS-kPLS, EON-kPLS, ROCS-GPR, EON-GPR, ROCS-GPR-NO-2D, and EON-GPR-NO-2D. By default, the floe builds the first four, along with a 2D-GPR model as the baseline. The prediction of the consensus (COMBO) model is the weighted average of the predictions of all built models, each weighted by its prediction confidence. The 2D-GPR model is included in the COMBO model by default; to exclude it, turn Include 2D in COMBO off.
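The COMBO averaging described above can be sketched as a confidence-weighted mean. This is a minimal illustration, not the floe's code: the function name `combo_prediction`, the dictionary inputs, and the "2D-GPR" key are hypothetical, and how the floe computes each model's prediction confidence is not covered here.

```python
from typing import Mapping


def combo_prediction(
    predictions: Mapping[str, float],
    confidences: Mapping[str, float],
    include_2d: bool = True,
) -> float:
    """Confidence-weighted average of per-model predictions (hypothetical sketch).

    `predictions` and `confidences` map a model name (e.g. "ROCS-GPR") to that
    model's predicted potency and its prediction confidence. The "2D-GPR" entry
    is skipped when include_2d is False, mirroring Include 2D in COMBO.
    """
    models = [m for m in predictions if include_2d or m != "2D-GPR"]
    total_weight = sum(confidences[m] for m in models)
    if total_weight <= 0:
        raise ValueError("confidences must sum to a positive value")
    return sum(confidences[m] * predictions[m] for m in models) / total_weight


if __name__ == "__main__":
    preds = {"ROCS-GPR": 6.0, "EON-GPR": 7.0, "2D-GPR": 8.0}
    conf = {"ROCS-GPR": 1.0, "EON-GPR": 1.0, "2D-GPR": 2.0}
    print(combo_prediction(preds, conf))                    # (6 + 7 + 16) / 4 = 7.25
    print(combo_prediction(preds, conf, include_2d=False))  # (6 + 7) / 2 = 6.5
```

A model with higher confidence pulls the consensus toward its own prediction, so dropping a confident 2D-GPR model can shift the COMBO value noticeably.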
Building models with this floe requires a dataset of molecules with their potencies, optionally including a tagged external validation set. To use a molecule's pre-aligned input 3D conformer as is, turn Use Input 3D on. If a molecule is 2D, the floe attempts to generate a 3D conformer for it using (1) POSIT, if reference Design Units are provided, or (2) FlexiROCS, if reference molecules are provided. A recommended way to obtain pre-aligned reference molecules is to extract the bound ligands from Design Units prepared by Spruce.
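The choice of conformer source can be summarized as a small decision function. This is a hedged sketch: `LigandRecord` and `conformer_source` are hypothetical names, not Orion types. Three behaviors are assumptions rather than documented facts: POSIT takes priority when both Design Units and reference molecules are supplied, a 3D molecule with Use Input 3D off is also re-posed, and a molecule with no usable route fails.

```python
from dataclasses import dataclass


@dataclass
class LigandRecord:
    """Hypothetical stand-in for an input record; not an Orion datarecord type."""
    name: str
    has_3d: bool


def conformer_source(
    record: LigandRecord,
    use_input_3d: bool,
    have_design_units: bool,
    have_reference_mols: bool,
) -> str:
    """Return which route would supply the training conformer for `record`."""
    # Use Input 3D is ignored for molecules that have no 3D input structure.
    if use_input_3d and record.has_3d:
        return "input 3D (used as is)"
    # Assumed priority: POSIT against Design Units before FlexiROCS.
    if have_design_units:
        return "POSIT"
    if have_reference_mols:
        return "FlexiROCS"
    # Assumption: without any reference, no conformer can be generated.
    return "no conformer (record fails)"
```

For example, `conformer_source(LigandRecord("lig1", False), True, True, False)` returns `"POSIT"`, because a 2D molecule cannot use its input geometry.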
Outputs from this floe include: (1) a model dataset; (2) optionally, the training set conformer dataset used for model building; and (3) optionally, an external validation dataset. The output model dataset stores the provided reference Design Units or molecules, which are read by the 3D QSAR Model: Validation and 3D QSAR Model: Predictor Floes.
The floe also produces hyperparameter optimization and cross-validation reports and, optionally, an external validation report.
Note: If the training set is relatively large (e.g., more than 300 molecules), consider increasing the cube memory in the “Cube Memory Parameters” section to avoid cube memory errors. An eightfold increase is usually sufficient for training sets of up to 1000 molecules.
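As a worked example of the note, an eightfold increase over the default Cube Memory of 1800 MiB gives 14400 MiB. The helper below encodes the rule of thumb; `suggested_cube_memory_mib` is a hypothetical name, not an Orion API, and the 300-molecule cutoff is taken from the note's "e.g." example rather than a hard limit.

```python
from typing import Optional

DEFAULT_CUBE_MEMORY_MIB = 1800  # default value of the Cube Memory parameter
MIB_IN_BYTES = 1_048_576        # 1 MiB, as stated for the parameter


def suggested_cube_memory_mib(n_train: int) -> Optional[int]:
    """Rule of thumb from the note (hypothetical helper, not an Orion API).

    Returns None above 1000 molecules, where the note gives no guidance.
    """
    if n_train <= 300:
        return DEFAULT_CUBE_MEMORY_MIB
    if n_train <= 1000:
        return 8 * DEFAULT_CUBE_MEMORY_MIB
    return None


mib = suggested_cube_memory_mib(800)
print(mib, "MiB =", mib * MIB_IN_BYTES, "bytes")  # 14400 MiB = 15099494400 bytes
```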
Promoted Parameters
Title in user interface (promoted name)
Cross Validation Parameters
Split Method (split_method): Method for splitting the dataset into training and validation sets
Type: string
Default: random
Choices: [‘random’, ‘leave one out’]
Percentage (Random Split) (percentage): The percentage of records used for training in each random split
Type: decimal
Default: 90.0
Number of Split Sets (Random Split) (num_random_set): Number of random splits to perform
Type: integer
Default: 50
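The two split methods can be illustrated with a short generator. This is a sketch of standard repeated random subsampling and leave-one-out cross-validation, not the floe's implementation: `cv_splits` is a hypothetical name, and both the rounding of the training-set size and the fixed random seed are assumptions.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def cv_splits(
    records: Sequence,
    split_method: str = "random",
    percentage: float = 90.0,
    num_random_set: int = 50,
    seed: int = 0,
) -> Iterator[Tuple[List, List]]:
    """Yield (training, validation) pairs for the two split methods."""
    n = len(records)
    if split_method == "leave one out":
        # Each record is held out exactly once; the rest form the training set.
        for i in range(n):
            yield list(records[:i]) + list(records[i + 1:]), [records[i]]
    elif split_method == "random":
        rng = random.Random(seed)
        n_train = round(n * percentage / 100.0)  # assumed rounding rule
        for _ in range(num_random_set):
            shuffled = list(records)
            rng.shuffle(shuffled)
            yield shuffled[:n_train], shuffled[n_train:]
    else:
        raise ValueError(f"unknown split method: {split_method!r}")
```

With the defaults and 10 records, this yields 50 splits of 9 training and 1 validation record each; leave-one-out on the same data yields 10 splits. Leave-one-out is deterministic, while random splitting trades that for a tunable number of repetitions.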
External Validation Parameters
Do External Validation (do_ext_valid): Whether to perform external validation. If True, the floe identifies the external validation set as the records whose External Validation Tag Field contains the External Validation Set Tag Value.
Type: boolean
Default: False
Choices: [True, False]
External Validation Tag Field (in_test_tag_field): Field containing the tag for the external validation set
Type: field_parameter::int
Default: External validation tag
External Validation Set Tag Value (test_tag_value): Value of the tag field that identifies the external validation set
Type: integer
Default: 1
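The tag-based selection of the external validation set can be sketched as a simple partition. Here records are modeled as plain dictionaries and `split_external` is a hypothetical helper. Treating records whose tag is missing or different as training data is an assumption about the floe's behavior.

```python
from typing import Dict, List, Tuple


def split_external(
    records: List[Dict],
    do_ext_valid: bool = False,
    tag_field: str = "External validation tag",
    tag_value: int = 1,
) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (training, external validation)."""
    if not do_ext_valid:
        return list(records), []
    train, ext = [], []
    for rec in records:
        # Assumption: a missing or non-matching tag means the record is for training.
        (ext if rec.get(tag_field) == tag_value else train).append(rec)
    return train, ext
```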
Inputs
Ligand Database (in): Dataset containing the ligand molecules to process.
Required
Type: data_source
Receptors/Reference Molecules (receptors): Dataset containing pre-aligned receptors/reference molecules.
Type: data_source
Outputs
Output Model Dataset (out): Output dataset containing built models and receptors/reference molecules.
Required
Type: dataset_out
Default: Output for 3D QSAR Model: Builder
Failed Dataset (failed): Output dataset of failed calculations.
Required
Type: dataset_out
Default: Failed Output for 3D QSAR Model: Builder
Training Conformer Output Dataset (train_pose_out): Output dataset containing the training set conformers; populated only if Output Training Conformers is on.
Required
Type: dataset_out
Default: Training Conformer Output
External Validation Output Dataset (ext_valid_out): Output dataset containing the external validation results; populated only if Do External Validation is on.
Required
Type: dataset_out
Default: External Validation Output
Model Parameters
Selected 3D Models (model_3D): Selected 3D models to build.
Type: string
Default: [‘ROCS-GPR’, ‘EON-GPR’, ‘ROCS-KPLS’, ‘EON-KPLS’]
Choices: [‘ROCS-GPR’, ‘EON-GPR’, ‘ROCS-KPLS’, ‘EON-KPLS’, ‘ROCS-GPR-NO-2D’, ‘EON-GPR-NO-2D’]
Include 2D in COMBO (include_2D_in_COMBO): Whether to include the 2D-GPR model in the final (COMBO) model prediction. The selected value carries over to the Predictor and Validation floes.
Type: boolean
Default: True
Choices: [True, False]
3D Conformer Parameters
Use Input 3D (use_input_3d): Whether to use the 3D input structures. The flag is ignored for molecules without 3D input structures.
Type: boolean
Default: True
Choices: [True, False]
Minimum Posit Probability (min_prob): The minimum POSIT probability for a valid training conformer. Ignored if the record has no Posit Probability field.
Type: decimal
Default: 0.5
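A minimal sketch of the probability filter, assuming records are dictionaries and the field is literally named "Posit Probability". Both the field name and the inclusive `>=` comparison are assumptions, and `passes_min_prob` is a hypothetical helper.

```python
from typing import Dict, Optional


def passes_min_prob(record: Dict, min_prob: float = 0.5,
                    prob_field: str = "Posit Probability") -> bool:
    """Return True if the record's conformer is a valid training conformer."""
    prob: Optional[float] = record.get(prob_field)
    if prob is None:
        return True  # field absent: the threshold is ignored
    return prob >= min_prob  # assumed inclusive threshold
```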
Output Training Conformers (output_train_pose): Whether to output training set conformers used for model building.
Required
Type: boolean
Default: False
Choices: [True, False]
Charge Method Parameters
Use Input Charges (use_input_charges): Whether to use the charges on the input molecules instead of assigning charges with the selected Charge type.
Type: boolean
Default: False
Choices: [True, False]
Charge type (method_type): Charge assignment method.
Type: string
Default: am1bcc
Choices: [‘am1bcc’, ‘mmff’]
Potency Parameters
Input Potency field (in_potency_field): Field containing input potency data
Required
Type: field_parameter::float
Default: potency
Unit for Potency (potency_unit): Unit for input potency field (e.g. nanomolar or micromolar for IC50, log for pIC50, kcal/mol for binding free energy)
Type: string
Default: log
Choices: [‘micromolar’, ‘nanomolar’, ‘log’, ‘kcal/mol’, ‘kJ/mol’]
Minimum Potency (potency_min): Molecules with potency (log unit) at or below this value are considered untrustworthy and are discarded.
Type: decimal
Default: 0.0
Maximum Potency (potency_max): Molecules with potency (log unit) at or above this value are considered untrustworthy and are discarded.
Type: decimal
Default: 15.0
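Because the minimum and maximum potency are applied in log units, other units must be converted first. The sketch below uses the standard relations pIC50 = 6 − log10(µM) = 9 − log10(nM) and log K = −ΔG / (RT ln 10). The temperature of 298.15 K, the sign convention (more negative ΔG means tighter binding), and the names `to_log_potency` and `keep` are assumptions, not the floe's documented behavior.

```python
import math

# Assumptions (not taken from the floe): T = 298.15 K, and binding free energies
# use the usual sign convention where more negative means tighter binding.
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
R_KJ = 8.314462e-3    # gas constant, kJ/(mol*K)
T = 298.15


def to_log_potency(value: float, unit: str) -> float:
    """Convert a potency value to a log (pIC50-like) scale."""
    if unit == "log":
        return value
    if unit == "micromolar":
        return 6.0 - math.log10(value)
    if unit == "nanomolar":
        return 9.0 - math.log10(value)
    if unit == "kcal/mol":
        return -value / (R_KCAL * T * math.log(10))
    if unit == "kJ/mol":
        return -value / (R_KJ * T * math.log(10))
    raise ValueError(f"unsupported unit: {unit!r}")


def keep(log_potency: float, potency_min: float = 0.0,
         potency_max: float = 15.0) -> bool:
    """Discard values at or below potency_min or at or above potency_max."""
    return potency_min < log_potency < potency_max
```

For example, an IC50 of 1000 nM converts to 6.0 log units and is kept under the default bounds, while a value of exactly 15.0 is discarded.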
Cube Memory Parameters
Cube Memory (memory): Minimum amount of memory, in MiB (1 MiB = 1,048,576 bytes).
Type: decimal
Default: 1800