3D QSAR Model: Builder

Category Paths

To find the floe, follow one of these paths in the Orion user interface.

  • Models

Description

The 3D QSAR Model: Builder Floe is a tool for building models with 3D descriptors in a structure-based setting. The floe incorporates: (1) optional 3D conformer generation and charge assignment; (2) hyperparameter optimization for ROCS- and EON-based kernel-PLS model building; (3) cross-validation; (4) model building; and (5) optional external validation.

Six individual 3D models are available: ROCS-kPLS, EON-kPLS, ROCS-GPR, EON-GPR, ROCS-GPR-NO-2D, and EON-GPR-NO-2D. By default, the floe builds the first four, along with a 2D-GPR model as the baseline. The consensus (COMBO) prediction is the weighted average of the predictions from all built models, with each model's prediction confidence as its weight. By default, the 2D-GPR model is included in COMBO; to exclude it, turn Include 2D in COMBO OFF.
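The confidence-weighted average described above can be sketched as follows. This is only an illustration: the function name, the per-model predictions, and the confidence values are hypothetical, and how the floe actually derives each model's confidence is not described here.

```python
def combo_prediction(predictions, confidences):
    """Return the confidence-weighted average of per-model predictions."""
    if len(predictions) != len(confidences):
        raise ValueError("predictions and confidences must have equal length")
    total_weight = sum(confidences)
    if total_weight <= 0:
        raise ValueError("at least one confidence must be positive")
    weighted_sum = sum(p * w for p, w in zip(predictions, confidences))
    return weighted_sum / total_weight


# Hypothetical (prediction in log units, confidence) pairs for one molecule.
per_model = {
    "ROCS-kPLS": (6.0, 0.5),
    "EON-kPLS": (7.0, 0.5),
    "2D-GPR": (8.0, 1.0),
}

# COMBO with the 2D baseline included (the default behaviour).
with_2d = combo_prediction(
    [p for p, _ in per_model.values()],
    [c for _, c in per_model.values()],
)

# COMBO with the 2D baseline excluded (Include 2D in COMBO turned OFF).
only_3d = {name: pc for name, pc in per_model.items() if name != "2D-GPR"}
without_2d = combo_prediction(
    [p for p, _ in only_3d.values()],
    [c for _, c in only_3d.values()],
)

print(with_2d, without_2d)  # 7.25 6.5
```

In this toy case the high-confidence 2D-GPR prediction pulls the consensus from 6.5 to 7.25, which shows why including or excluding the baseline can matter.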

Building models with this floe requires a dataset of molecules with their potencies and, optionally, a tagged external validation set. To use a molecule's pre-aligned input 3D conformer as is, turn Use Input 3D ON. If a molecule is 2D, the floe attempts to generate a 3D conformer for it using: (1) POSIT, if one or more reference Design Units are provided; or (2) FlexiROCS, if one or more reference molecules are provided. A recommended way to obtain pre-aligned reference molecules is to extract the bound ligands from Design Units prepared by Spruce.
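The choice of conformer source described above can be summarized as a small decision function. This is a sketch under stated assumptions, not the floe's code: the function and argument names are hypothetical, the precedence of Design Units over reference molecules simply follows the order of the list above, and the handling of a 3D input when Use Input 3D is OFF (regenerate the conformer) and of a molecule with no reference at all (fail) are guesses.

```python
def choose_conformer_source(has_3d, use_input_3d=True,
                            has_design_units=False,
                            has_reference_molecules=False):
    """Return which method would supply the 3D conformer for one molecule."""
    # A pre-aligned 3D input is used as is when Use Input 3D is ON.
    if has_3d and use_input_3d:
        return "input"
    # Otherwise a conformer has to be generated against a reference.
    if has_design_units:
        return "POSIT"
    if has_reference_molecules:
        return "FlexiROCS"
    # Assumption: with no reference available, the molecule cannot be posed.
    return "failed"


print(choose_conformer_source(has_3d=True))
print(choose_conformer_source(has_3d=False, has_design_units=True))
print(choose_conformer_source(has_3d=False, has_reference_molecules=True))
```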

This floe outputs: (1) a model dataset; (2) a training set conformer dataset used for model building (optional); and (3) an external validation dataset (optional). The output model dataset stores the reference Design Unit(s) or molecules provided, which are read by the 3D QSAR Model: Validation and 3D QSAR Model: Predictor Floes.

The floe also produces hyperparameter optimization and cross-validation reports and, optionally, an external validation report.

Note: If the training set is relatively large (e.g., more than 300 molecules), consider increasing the cube memory in the “Cube Memory Parameters” section to avoid cube out-of-memory errors. An eightfold increase is usually sufficient for a training set of up to 1000 molecules.
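As a quick worked example of the note above, an eightfold increase over the default Cube Memory value of 1800 MiB comes to 14400 MiB. The helper name below is hypothetical; only the default value and the factor come from this page.

```python
DEFAULT_CUBE_MEMORY_MIB = 1800  # default of the Cube Memory (memory) parameter


def scaled_cube_memory(factor, base_mib=DEFAULT_CUBE_MEMORY_MIB):
    """Return the cube memory request in MiB after scaling the default."""
    return base_mib * factor


eightfold_mib = scaled_cube_memory(8)
eightfold_gib = eightfold_mib / 1024  # 1 GiB = 1024 MiB

print(eightfold_mib, eightfold_gib)  # 14400 14.0625
```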

Promoted Parameters

Title in user interface (promoted name)

Cross Validation Parameters

Split Method (split_method): Method used to split the dataset into training and validation sets

  • Type: string

  • Default: random

  • Choices: [‘random’, ‘leave one out’]

Percentage (Random Split) (percentage): The percentage of records used for training in each random split

  • Type: decimal

  • Default: 90.0

Number of Split Sets (Random Split) (num_random_set): Number of times to perform the random split

  • Type: integer

  • Default: 50
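The two split methods can be illustrated with a minimal, standard-library sketch. The function names, the rounding of the training-set size, and the seed argument are assumptions made for the example; the floe's own splitting code is not shown on this page.

```python
import random


def random_splits(n_records, percentage=90.0, num_random_set=50, seed=0):
    """Return num_random_set (train, validation) index pairs."""
    rng = random.Random(seed)
    n_train = int(round(n_records * percentage / 100.0))
    indices = list(range(n_records))
    splits = []
    for _ in range(num_random_set):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_train shuffled records train; the rest validate.
        splits.append((sorted(shuffled[:n_train]), sorted(shuffled[n_train:])))
    return splits


def leave_one_out_splits(n_records):
    """Return one (train, validation) pair per record, holding that record out."""
    return [
        ([j for j in range(n_records) if j != i], [i])
        for i in range(n_records)
    ]


splits = random_splits(100)      # defaults: 90% training, 50 split sets
loo = leave_one_out_splits(5)

print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 50 90 10
print(len(loo))  # 5
```

With the defaults, a 100-molecule dataset yields 50 splits of 90 training and 10 validation records each, whereas leave one out yields one split per molecule.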

External Validation Parameters

Do External Validation (do_ext_valid): Whether to do external validation. If true, the floe uses the specified tag field and tag value to identify the external validation set.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

External Validation Tag Field (in_test_tag_field): Field containing tag for external validation set

  • Type: field_parameter::int

  • Default: External validation tag

External Validation Set Tag Value (test_tag_value): Value of tag field for external validation set

  • Type: integer

  • Default: 1
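To illustrate how the tag field and tag value select the external validation set, the sketch below partitions records represented as plain dictionaries. The dictionary representation, the function name, and the sample records are hypothetical stand-ins for Orion records; only the default field name and tag value are taken from the parameters above.

```python
def split_by_tag(records, tag_field="External validation tag", tag_value=1):
    """Partition records into (training, external validation) lists."""
    training, external = [], []
    for record in records:
        # A record joins the external set only if its tag equals tag_value;
        # records without the tag field stay in the training set.
        if record.get(tag_field) == tag_value:
            external.append(record)
        else:
            training.append(record)
    return training, external


# Hypothetical input records.
records = [
    {"name": "mol-1", "potency": 6.2, "External validation tag": 0},
    {"name": "mol-2", "potency": 7.9, "External validation tag": 1},
    {"name": "mol-3", "potency": 5.4},
]

training, external = split_by_tag(records)
print([r["name"] for r in training], [r["name"] for r in external])
```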

Inputs

Ligand Database (in): Dataset containing the ligand molecules to process.

  • Required

  • Type: data_source

Receptors/Reference Molecules (receptors): Dataset containing pre-aligned receptors/reference molecules.

  • Type: data_source

Outputs

Output Model Dataset (out): Output dataset containing built models and receptors/reference molecules.

  • Required

  • Type: dataset_out

  • Default: Output for 3D QSAR Model: Builder

Failed Dataset (failed): Output dataset of failed calculations.

  • Required

  • Type: dataset_out

  • Default: Failed Output for 3D QSAR Model: Builder

Training Conformer Output Dataset (train_pose_out): Optional output dataset containing the training set conformers. Written only if Output Training Conformers is ON.

  • Required

  • Type: dataset_out

  • Default: Training Conformer Output

External Validation Output Dataset (ext_valid_out): Optional output dataset containing the external validation results. Written only if Do External Validation is ON.

  • Required

  • Type: dataset_out

  • Default: External Validation Output

Model Parameters

Selected 3D Models (model_3D): Selected 3D models to build.

  • Type: string

  • Default: [‘ROCS-GPR’, ‘EON-GPR’, ‘ROCS-KPLS’, ‘EON-KPLS’]

  • Choices: [‘ROCS-GPR’, ‘EON-GPR’, ‘ROCS-KPLS’, ‘EON-KPLS’, ‘ROCS-GPR-NO-2D’, ‘EON-GPR-NO-2D’]

Include 2D in COMBO (include_2D_in_COMBO): Whether to include the 2D model in the final (COMBO) model prediction. The selected value carries over to the Predictor and Validation floes.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

3D Conformer Parameters

Use Input 3D (use_input_3d): Whether to use the 3D input structures as is. This flag is ignored for molecules without 3D input structures.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Minimum Posit Probability (min_prob): The minimum POSIT probability for a valid training conformer. Ignored if the Posit Probability field does not exist in the record.

  • Type: decimal

  • Default: 0.5
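A minimal sketch of this filter follows, again with records modeled as plain dictionaries. The function name and the inclusive comparison (a probability equal to the minimum passes) are assumptions; the page states only that the threshold is a minimum and that it is ignored when the field is absent.

```python
def is_valid_training_conformer(record, min_prob=0.5,
                                prob_field="Posit Probability"):
    """Return True if the record passes the POSIT probability threshold."""
    probability = record.get(prob_field)
    # Without the field (e.g. a conformer not posed by POSIT), no filter applies.
    if probability is None:
        return True
    return probability >= min_prob


print(is_valid_training_conformer({"Posit Probability": 0.72}))
print(is_valid_training_conformer({"Posit Probability": 0.31}))
print(is_valid_training_conformer({"name": "pre-aligned input"}))
```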

Output Training Conformers (output_train_pose): Whether to output training set conformers used for model building.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Charge Method Parameters

Use Input Charges (use_input_charges): Whether to use the charges already present on the input molecules.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Charge type (method_type): Charge assignment method.

  • Type: string

  • Default: am1bcc

  • Choices: [‘am1bcc’, ‘mmff’]

Potency Parameters

Input Potency field (in_potency_field): Field containing input potency data

  • Required

  • Type: field_parameter::float

  • Default: potency

Unit for Potency (potency_unit): Unit of the input potency field (e.g., nanomolar or micromolar for IC50, log for pIC50, kcal/mol or kJ/mol for binding free energy)

  • Type: string

  • Default: log

  • Choices: [‘micromolar’, ‘nanomolar’, ‘log’, ‘kcal/mol’, ‘kJ/mol’]

Minimum Potency (potency_min): Molecules with potency (log unit) at or below this value are considered untrustworthy and are discarded.

  • Type: decimal

  • Default: 0.0

Maximum Potency (potency_max): Molecules with potency (log unit) at or above this value are considered untrustworthy and are discarded.

  • Type: decimal

  • Default: 15.0
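Because the minimum and maximum are expressed in log units, potencies given in other units have to be converted before the filter applies. The sketch below shows one conventional conversion; it is an assumption, not the floe's documented method. Concentrations are converted as pIC50 = -log10(IC50 in mol/L), and binding free energies as -ΔG / (RT ln 10) at an assumed temperature of 298.15 K. All function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)
KJ_PER_KCAL = 4.184


def to_log_potency(value, unit, temperature=298.15):
    """Convert a potency value to log units (e.g. pIC50)."""
    if unit == "log":
        return value
    if unit == "nanomolar":
        return 9.0 - math.log10(value)   # -log10(value * 1e-9)
    if unit == "micromolar":
        return 6.0 - math.log10(value)   # -log10(value * 1e-6)
    if unit in ("kcal/mol", "kJ/mol"):
        dg_kcal = value if unit == "kcal/mol" else value / KJ_PER_KCAL
        # Assumed relation: dG = -RT ln(10) * pK, so pK = -dG / (RT ln 10).
        return -dg_kcal / (GAS_CONSTANT_KCAL * temperature * math.log(10))
    raise ValueError(f"unsupported potency unit: {unit!r}")


def is_trusted(log_potency, potency_min=0.0, potency_max=15.0):
    """Keep only values strictly between the minimum and the maximum."""
    return potency_min < log_potency < potency_max


print(to_log_potency(1.0, "nanomolar"))    # 9.0
print(to_log_potency(10.0, "micromolar"))  # 5.0
print(is_trusted(to_log_potency(-9.5, "kcal/mol")))
```

The strict inequalities mirror the parameter descriptions: values at or below the minimum, or at or above the maximum, are discarded.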

Cube Memory Parameters

Cube Memory (memory): Minimum amount of memory in MiB (1 MiB = 1,048,576 bytes).

  • Type: decimal

  • Default: 1800