Gigadock Warp

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/Gigadock

Role-based/Computational Chemist

Solution-based/Virtual-screening/DB Search/Gigadock

Task-based/Virtual Screening - Structure-Based

Description

Approximates a full Gigadock run with a mixture of FastROCS and docking.

Docks a random subset of molecules
Runs FastROCS on all input molecules using top scoring poses from the previous step as queries
Creates a feature vector for each molecule with the FastROCS Shape and Color tanimotos from the prior step, the bits of a 4K Tree fingerprint and several basic 2D properties.
Create a regression model of the score based on the molecules docked in the first step and the feature vector. The model will be a neural net model if the number of molecules being docked is greater than 100M and a linear model if the number of molecules being docked is less than 100M and greater than 1M (the floe cannot dock fewer than 1M molecules; use the Gigadock floe in these cases).
Predict the score of the un-docked molecules with the regression model.
Dock the molecules the regression model predicts to have the best scores.
Output Hit List of top scoring docked molecules.
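
The model choice described in the steps above can be sketched as follows. This is only an illustration of the stated thresholds, not the floe's code: the function name and constants are hypothetical, and the behavior at exactly 1M and exactly 100M molecules is not stated in the text, so the sketch assumes the linear model in both cases.

```python
# Hypothetical constants taken from the thresholds quoted in the description.
MIN_MOLECULES = 1_000_000            # below this, the text says to use the Gigadock floe
NEURAL_NET_THRESHOLD = 100_000_000   # above this, a neural net model is used


def choose_model(n_molecules: int) -> str:
    """Return the model type implied by the description for a given input size."""
    if n_molecules < MIN_MOLECULES:
        raise ValueError("Too few molecules for Gigadock Warp; use the Gigadock floe")
    # Assumption: exactly 100M molecules falls on the linear side.
    if n_molecules > NEURAL_NET_THRESHOLD:
        return "neural net"
    return "linear"


print(choose_model(5_000_000))
print(choose_model(2_000_000_000))
```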

Promoted Parameters

Title in user interface (promoted name)

Inputs

Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed up to a limit of 10 for the Hybrid dock method (see ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’ each molecule will be docked to each design unit and the results from the best scoring design unit will be output, thus docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’ each molecule will be docked only into the design unit with the crystallographic bound ligand most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.

Type: data_source
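
As a rough illustration of how docking cost scales with the number of design units, the following sketch encodes the limits and scaling quoted above. The function and table names are hypothetical, and the 5% increase for ‘Hybrid’ is assumed to be additive per extra design unit (the text only says "roughly 5%").

```python
# Limits quoted in the parameter description (10 for Hybrid, 2 otherwise).
MAX_DESIGN_UNITS = {"Fred": 2, "FastFred": 2, "Hybrid": 10}


def relative_docking_cost(method: str, n_design_units: int) -> float:
    """Approximate docking cost relative to a single design unit."""
    if method not in MAX_DESIGN_UNITS:
        raise ValueError(f"Unknown docking method: {method}")
    if not 1 <= n_design_units <= MAX_DESIGN_UNITS[method]:
        raise ValueError(f"{method} allows 1 to {MAX_DESIGN_UNITS[method]} design units")
    if method == "Hybrid":
        # Assumption: roughly 5% extra per additional design unit, added linearly.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred and FastFred dock every molecule to every design unit.
    return float(n_design_units)


print(relative_docking_cost("Fred", 2))
print(relative_docking_cost("Hybrid", 3))
```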

Input Conformer Collection (input_conformer_collection): Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated 3rd party vendor docking collections can be made available to your organization upon request at no charge by e-mailing support@eyesopen.com (if your organization has already requested them you will already have access to these pre-generated collections). The collection will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.

Required

Type: collection_source

Outputs

Hit List Dataset (hit_list_dataset): Output dataset with the top scoring docked molecules.

Required

Type: dataset_out

Default: Gigadock Warp Hit List

FastROCS Query Poses Dataset (fastrocs_query_poses_dataset): Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top scoring poses from the initial docking of a random subset of molecules from the input collection.

Required

Type: dataset_out

Default: Gigadock Warp FastROCS Queries

Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.

Required

Type: dataset_out

Default: Gigadock Warp Design Unit

Gigadock Warp Temporary Collection (gigadock_warp_temporary_collection): Name of the collection to create.

Required

Type: collection_sink

Default: Temp Collection

Model(s) Dataset (models_dataset): Output dataset to which to write.

Required

Type: dataset_out

Default: Models

Options

Hit List Size (hit_list_size): Size of the final hit list with the top scoring docked molecules.

Required

Type: integer

Default: 10000

Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.

Type: string

Default: Fred

Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]

Options: Model Featurization

Graphsim Fingerprint Features (graphsim_fingerprint_features): If specified the bits for the specified fingerprint type will be added to the feature vector. All fingerprints have 4,096 features, except for MACCS166 which has 166.

Type: string

Default: Tree

Choices: [‘Circular’, ‘Path’, ‘Tree’, ‘MACCS166’]

2D Property Features (two_d_property_features): 2D properties to add to the feature vector. The properties, the squares of the properties and all cross terms for the selected properties will be added to the feature vector.

Type: string

Default: [‘Molecular Weight’, ‘2D Polar Surface Area’, ‘XLogP’, ‘Number of Acceptors’, ‘Number of Donors’, ‘Number of Hydrogen Atoms’, ‘Number of Heavy Atoms’, ‘Number of Carbon Atoms’, ‘Number of Nitrogen Atoms’, ‘Number of Oxygen Atoms’, ‘Number of Fluorine Atoms’, ‘Number of Phosphorous Atoms’, ‘Number of Sulphur Atoms’, ‘Number of Chlorine Atoms’, ‘Number of Bromine Atoms’, ‘Number of Iodine Atoms’, ‘Number of Rotatable Bonds’, ‘Number of Bonds’, ‘Number of Heavy Bonds’, ‘Number of Single Bonds’, ‘Number of Heavy Single Bonds’, ‘Number of Double Bonds’, ‘Number of Triple Bonds’, ‘Number of Aromatic Bonds’, ‘Number of Rings’, ‘Number of Aromatic Rings’]

Choices: [‘Molecular Weight’, ‘2D Polar Surface Area’, ‘XLogP’, ‘Number of Acceptors’, ‘Number of Donors’, ‘Number of Hydrogen Atoms’, ‘Number of Heavy Atoms’, ‘Number of Carbon Atoms’, ‘Number of Nitrogen Atoms’, ‘Number of Oxygen Atoms’, ‘Number of Fluorine Atoms’, ‘Number of Phosphorous Atoms’, ‘Number of Sulphur Atoms’, ‘Number of Chlorine Atoms’, ‘Number of Bromine Atoms’, ‘Number of Iodine Atoms’, ‘Number of Rotatable Bonds’, ‘Number of Bonds’, ‘Number of Heavy Bonds’, ‘Number of Single Bonds’, ‘Number of Heavy Single Bonds’, ‘Number of Double Bonds’, ‘Number of Triple Bonds’, ‘Number of Aromatic Bonds’, ‘Number of Rings’, ‘Number of Aromatic Rings’]
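
The expansion of the selected 2D properties into squares and cross terms can be sketched as below. This is a minimal illustration, assuming "cross terms" means the product of every distinct pair of properties; the function names and the feature naming scheme are hypothetical.

```python
from itertools import combinations


def expand_properties(values: dict) -> dict:
    """Add squares and pairwise products to a dict of 2D property values."""
    features = dict(values)
    for name, value in values.items():
        features[f"{name}^2"] = value * value
    # Assumption: one cross term per unordered pair of distinct properties.
    for (name_a, a), (name_b, b) in combinations(values.items(), 2):
        features[f"{name_a}*{name_b}"] = a * b
    return features


def expanded_count(n_properties: int) -> int:
    """Number of features for n properties: n values + n squares + n(n-1)/2 cross terms."""
    return 2 * n_properties + n_properties * (n_properties - 1) // 2


example = expand_properties({"Molecular Weight": 300.0, "XLogP": 2.0, "Number of Rings": 3.0})
print(len(example))
print(expanded_count(26))
```

Under this assumption the 26 default properties would contribute 26 + 26 + 325 = 377 values to the feature vector.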

Number of FastROCS Feature Query Poses (number_of_fastrocs_feature_query_poses): Number of top scoring poses from the training set to use to create FastROCS features. Each molecule will be overlaid onto each top scoring pose and the shape and color tanimoto added to the feature vector. Thus the number of FastROCS feature values will be twice the number of poses selected here since each pose has two values (shape and color tanimoto).

Required

Type: integer

Default: 100

FastROCS Feature Mode (fastrocs_feature_mode): Scoring mode to use.

Type: string

Default: shape tanimoto and color tanimoto

Choices: [‘tanimoto combo’, ‘highest tanimoto combo’, ‘shape tanimoto’, ‘shape tanimoto and color tanimoto’]

Verbose FastROCS (verbose_fastrocs): If ‘On’ timing information for the FastROCS feature calculation will be written to the log.

Type: boolean

Default: False

Choices: [True, False]

Number of Graphsim Tanimoto Feature Query Poses (number_of_graphsim_tanimoto_feature_query_poses): Number of top scoring poses from the training set to use to create graphsim tanimoto features. These features will be the tanimoto of the molecule being docked to each of the top scoring poses from the training set. The type of fingerprint used to create the tanimoto is determined by the ‘Graphsim Tanimoto Feature Fingerprint Type’ parameter.

Required

Type: integer

Default: 0

Graphsim Tanimoto Feature Fingerprint Type (graphsim_tanimoto_feature_fingerprint_type): The type of tanimoto to calculate for the Graphsim Tanimoto Features. This parameter is ignored if ‘Number of Graphsim Tanimoto Feature Query Poses’ is set to 0.

Type: string

Default: Tree

Choices: [‘Circular’, ‘Path’, ‘Tree’]
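
To get a feel for the size of the feature vector, the featurization settings in this group can be combined as in the sketch below. It assumes the vector is the plain concatenation of the FastROCS values, the fingerprint bits, the expanded 2D properties and the graphsim tanimoto values, and that the default FastROCS feature mode (two values per pose) is in use. The function name and argument names are hypothetical.

```python
# Bit counts quoted in the 'Graphsim Fingerprint Features' description.
FINGERPRINT_BITS = {"Circular": 4096, "Path": 4096, "Tree": 4096, "MACCS166": 166}


def feature_vector_length(n_fastrocs_poses=100, fingerprint="Tree",
                          n_2d_properties=26, n_graphsim_tanimoto_poses=0):
    """Estimate the feature vector length under the stated assumptions."""
    fastrocs = 2 * n_fastrocs_poses  # shape and color tanimoto per query pose
    bits = FINGERPRINT_BITS[fingerprint] if fingerprint else 0
    # Properties, their squares, and one cross term per pair of properties.
    props = 2 * n_2d_properties + n_2d_properties * (n_2d_properties - 1) // 2
    return fastrocs + bits + props + n_graphsim_tanimoto_poses


print(feature_vector_length())
print(feature_vector_length(fingerprint="MACCS166"))
```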

Options: Model Training

Number of Training Stages (number_of_training_stages): Number of training stages to run. The first stage creates a training set by docking a random subset of molecules and then creates a model. The second stage runs the model on the molecules from the first stage that were not selected for training, selects the top molecules by the predicted score, docks those selected molecules, adds those new docked molecules to the training data from the previous stage and creates a new updated model. Subsequent training stages work analogously to the second stage except.

Required

Type: integer

Default: 1

Fraction Train (fraction_train): The fraction of the input molecules that will be docked to create the training data for the score regression model. Increasing this value will increase the cost of the floe and decrease the minimum number of input molecules the floe requires to run (at the default value of 0.01 the minimum is 934600). Legal values for this parameter are between 0.001 and 0.1.

Required

Type: decimal

Default: 0.01
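
The relationship between ‘Fraction Train’ and the minimum input size can be sketched as follows. The constant 9346 is inferred from the single data point in the text (934600 molecules at a fraction of 0.01), and the inverse-proportional relationship is an assumption rather than documented behavior. Function names are hypothetical.

```python
import math
from fractions import Fraction

# Inferred from the text: 934600 input molecules * 0.01 = 9346 training molecules.
MIN_TRAINING_MOLECULES = 9346


def check_fraction_train(fraction_train: float) -> None:
    """Raise if the value is outside the legal range quoted in the text."""
    if not 0.001 <= fraction_train <= 0.1:
        raise ValueError("Fraction Train must be between 0.001 and 0.1")


def minimum_input_molecules(fraction_train: float) -> int:
    """Smallest input size that still yields the assumed minimum training set."""
    check_fraction_train(fraction_train)
    # Fraction(str(...)) avoids floating point round-off in the division.
    return math.ceil(MIN_TRAINING_MOLECULES / Fraction(str(fraction_train)))


print(minimum_input_molecules(0.01))
print(minimum_input_molecules(0.1))
```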

Final Dock Fraction (final_dock_fraction): The number of top scoring molecules from FastROCS that are passed to the final docking step is equal to this fraction of the size of the input collection(s). Increasing this value will increase the cost of the floe. When docking fewer than ~100M molecules it is recommended that this value be set to 0.08. The legal values for this parameter are between 0.01 and 0.1.

Required

Type: decimal

Default: 0.04
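
A quick way to estimate how many molecules are actually docked is to add the training fraction and the final dock fraction, as in this sketch. It assumes a single training stage and that the two docked sets do not overlap; the function name is hypothetical.

```python
def molecules_docked(n_input: int, fraction_train: float = 0.01,
                     final_dock_fraction: float = 0.04):
    """Return (training docks, final docks, total docks) for an input collection."""
    if not 0.01 <= final_dock_fraction <= 0.1:
        raise ValueError("Final Dock Fraction must be between 0.01 and 0.1")
    n_train = round(n_input * fraction_train)
    n_final = round(n_input * final_dock_fraction)
    # Assumption: one training stage, no overlap between the two sets.
    return n_train, n_final, n_train + n_final


# With the defaults, about 5% of a one billion molecule collection is docked.
print(molecules_docked(1_000_000_000))
```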

Batch Size (batch_size): Size of the training data, in number of molecules, that will be loaded onto the GPU at one time. If this value is set to zero the batch size will be dynamically assigned based on the amount of GPU memory.

Type: integer

Default: 0

Target Fraction GPU Memory (target_fraction_gpu_memory): The target fraction of GPU memory to allocate to a single batch of training data. Should be less than the available memory as the optimizer also needs to fit into memory. This parameter is only used when ‘Batch Size’ is 0 (i.e., dynamic setting of batch_size).

Type: decimal

Default: 0.75

Verbose Training (verbose_training): If true the training cube will write details of the training to the floe log.

Type: boolean

Default: True

Choices: [True, False]

Cycle Time for the training cube to avoid 12h limit (cycle_time_for_the_training_cube_to_avoid_12h_limit): This cube maintains an estimate of the total time the cube will consume after the next epoch calculation. If that estimated time exceeds this value in hours the cube will save the state of the current calculation and emit it to the ‘partial’ output port. The partial output can then be cycled back into the ‘intake’ port of this cube to continue the calculation. The purpose of this mechanism is to avoid the 12h limit. For parallel cubes this value should be set to a little less than 12h. For serial cubes it can be set arbitrarily high as they are not subject to the 12h limit.

Required

Type: decimal

Default: 3650.0
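
The cycle-time check described above can be pictured with the following sketch. It is not the cube's implementation: the function names are hypothetical, and for simplicity the estimate for the next epoch is taken to be that epoch's actual duration.

```python
def should_emit_partial(elapsed_hours: float, estimated_next_epoch_hours: float,
                        cycle_time_hours: float = 3650.0) -> bool:
    """True if running one more epoch is estimated to exceed the cycle time."""
    return elapsed_hours + estimated_next_epoch_hours > cycle_time_hours


def epochs_before_cycle(epoch_hours, cycle_time_hours: float) -> int:
    """Count the epochs that complete before state would be sent to the 'partial' port."""
    elapsed = 0.0
    completed = 0
    for hours in epoch_hours:
        if should_emit_partial(elapsed, hours, cycle_time_hours):
            break  # state would be saved and cycled back into the 'intake' port here
        elapsed += hours
        completed += 1
    return completed


# Five 3-hour epochs with a cycle time a little under 12h: three finish per cycle.
print(epochs_before_cycle([3.0] * 5, cycle_time_hours=11.5))
```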

Collapse Small Training Batches (collapse_small_training_batches): If True when the entire set of training data can fit in the GPU memory then all steps will be run in a single batch, rather than in separate batches.

Type: boolean

Default: True

Choices: [True, False]

Options: Linear Model Configuration

Enable Linear Model (enable_linear_model): If On, everything sent to the intake port will be sent to the true port; otherwise everything will be sent to the false port.

Required

Type: boolean

Default: True

Choices: [True, False]

Linear Epochs (linear_epochs): Minimum number of epochs the linear optimization will run for. This number of epochs may be exceeded in order to satisfy the number of batches required.

Type: integer

Default: 2

Linear Batches (linear_batches): Minimum number of batches the linear optimization will run for. This number of batches may be exceeded in order to satisfy the number of epochs required.

Type: integer

Default: 25
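
The interplay between the minimum epoch count and the minimum batch count can be sketched as below. This assumes the optimization always runs whole epochs and that the training data is split into a fixed number of batches per epoch (determined by ‘Batch Size’); both are assumptions, and the function name is hypothetical.

```python
import math


def planned_epochs_and_batches(batches_per_epoch: int, min_epochs: int = 2,
                               min_batches: int = 25):
    """Return (epochs, total batches) satisfying both minimums with whole epochs."""
    if batches_per_epoch < 1:
        raise ValueError("batches_per_epoch must be at least 1")
    # Enough epochs to reach the batch minimum, but never fewer than min_epochs.
    epochs = max(min_epochs, math.ceil(min_batches / batches_per_epoch))
    return epochs, epochs * batches_per_epoch


print(planned_epochs_and_batches(4))   # batch minimum forces extra epochs
print(planned_epochs_and_batches(20))  # epoch minimum forces extra batches
```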

Linear Steps (linear_steps): Number of steps the linear optimization will run for. The floe will first figure out how many batches need to be run and then divide the total number of steps among them. How the steps are divided among the batches depends on the setting of the ‘Linear Step Scaling Power’ parameter.

Type: integer

Default: 20000

Linear Step Scaling Power (linear_step_scaling_power): This parameter controls how the number of steps specified by ‘Linear Steps’ are divided among the optimization batches. If set to 0 the steps will be evenly divided among the batches. If set to a higher value, more steps will be run in the earlier batches. The higher the value, the more the steps will be distributed towards the earlier batches.

Type: decimal

Default: 2
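
The exact formula used to divide the steps is not given here. The sketch below shows one weighting that matches the described behavior (even split at 0, more steps in earlier batches as the power grows); the weight (n_batches - i) ** power is a hypothetical choice for illustration only.

```python
def divide_steps(total_steps: int, n_batches: int, power: float = 2.0):
    """Split total_steps across batches, favoring earlier batches as power grows."""
    # Hypothetical weighting: batch i (0-based) gets weight (n_batches - i) ** power.
    weights = [(n_batches - i) ** power for i in range(n_batches)]
    total_weight = sum(weights)
    steps = [int(total_steps * w / total_weight) for w in weights]
    # Give any rounding remainder to the first batch so the total is preserved.
    steps[0] += total_steps - sum(steps)
    return steps


print(divide_steps(20000, 4, power=0))
print(divide_steps(100, 4, power=1))
```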

Linear Optimizer (linear_optimizer): Method used to optimize the linear model

Required

Type: string

Default: Adam

Choices: [‘Adam’, ‘SGD’, ‘LBFGS’]

Linear Loss Function (linear_loss_function): Loss function to use in the optimization

Required

Type: string

Default: MSELoss

Choices: [‘MSELoss’]

Linear Neural Net Nodes (linear_neural_net_nodes): The number of nodes in the intermediate linear layer. This parameter can be specified multiple times to increase the number of layers. If this parameter is not specified linear regression will be configured (there will be no activation function in this circumstance).

Type: integer

Linear Neural Net Activation Function (linear_neural_net_activation_function): Type of activation function to use between all linear layers. This parameter is ignored if no values are supplied to the ‘Linear Neural Net Nodes’ parameter.

Type: string

Default: ReLU

Choices: [‘ReLU’, ‘ReLU6’, ‘Sigmoid’, ‘ELU’, ‘LeakyReLU (neg slope 0.01)’, ‘LeakyReLU (neg slope 0.1)’]

Linear Adam Optimizer Loss Fxn (linear_adam_optimizer_loss_fxn): Learning rate for the Adam optimizer. This parameter is ignored if the optimizer is not set to Adam.

Required

Type: decimal

Default: 0.001

Options: Neural Net Configuration

Enable Neural Net Model (enable_neural_net_model): If ‘On’ a neural net score regression model(s) will be created. If ‘Off’ no neural net model will be created and the setting of the other parameters in this parameter group will be effectively ignored during the floe run.

Required

Type: boolean

Default: True

Choices: [True, False]

Neural Net Nodes (neural_net_nodes): Number of nodes in each layer of the neural net. Enter multiple values to create a multi layered neural net.

Type: integer

Default: [100, 100]
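
To see how the list of node counts translates into a network, the sketch below derives the layer shapes and the number of trainable parameters. It assumes fully connected layers with bias terms and a single scalar output (the predicted score); these are assumptions, and the function names and the example feature count of 1000 are hypothetical.

```python
def layer_shapes(n_features: int, hidden_nodes):
    """Return (inputs, outputs) for each fully connected layer, ending in one output."""
    sizes = [n_features, *hidden_nodes, 1]
    return list(zip(sizes, sizes[1:]))


def parameter_count(n_features: int, hidden_nodes) -> int:
    """Weights plus biases summed over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(n_features, hidden_nodes))


print(layer_shapes(1000, [100, 100]))
print(parameter_count(1000, [100, 100]))
# With no hidden layers the model reduces to plain linear regression.
print(parameter_count(1000, []))
```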

Neural Net Activation Function (neural_net_activation_function): Activation function used between each layer of the neural net.

Type: string

Default: ReLU

Choices: [‘ReLU’, ‘ReLU6’, ‘Sigmoid’, ‘ELU’, ‘LeakyReLU (neg slope 0.01)’, ‘LeakyReLU (neg slope 0.1)’]

Number of Neural Net Models (number_of_neural_net_models): The number of neural net models to create. Each model will be optimized from a random start. Training cost increases linearly with the number of models.

Type: integer

Default: 1

Neural Net Epochs (neural_net_epochs): Minimum number of epochs the neural net optimization will run for. This number of epochs may be exceeded in order to satisfy the number of batches required (see ‘Neural Net Batches’ parameter).

Type: integer

Default: 2

Neural Net Batches (neural_net_batches): Minimum number of batches the neural net optimization will run for. This number of batches may be exceeded in order to satisfy the number of epochs required (see ‘Neural Net Epochs’ parameter).

Type: integer

Default: 25

Neural Net Steps (neural_net_steps): Number of steps the neural net optimization will run for. The floe will first figure out how many batches need to be run and then divide the total number of steps among them. How the steps are divided among the batches depends on the setting of the ‘Neural Net Step Scaling Power’ parameter.

Type: integer

Default: 20000

Neural Net Step Scaling Power (neural_net_step_scaling_power): This parameter controls how the number of steps specified by ‘Neural Net Steps’ are divided among the optimization batches. If set to 0 the steps will be evenly divided among the batches. If set to a higher value, more steps will be run in the earlier batches. The higher the value, the more the steps will be distributed towards the earlier batches.

Type: decimal

Default: 2

Neural Net Optimizer (neural_net_optimizer): Method used to optimize the neural net model

Required

Type: string

Default: Adam

Choices: [‘Adam’, ‘SGD’, ‘LBFGS’]

Neural Net Loss Function (neural_net_loss_function): Loss function to use in the optimization

Required

Type: string

Default: MSELoss

Choices: [‘MSELoss’]

Neural Net Adam Optimizer Loss Fxn (neural_net_adam_optimizer_loss_fxn): Learning rate for the Adam optimizer. This parameter is ignored if the optimizer is not set to Adam.

Required

Type: decimal

Default: 0.001

Options: Hardware

Training Instance Disk Space (training_instance_disk_space): Required disk space on the machine(s) that will do model training. If this value is set too low for the number of input molecules the floe will fail quickly with an error indicating the required setting. Higher values may result in longer run times because there will be fewer AWS GPU instances with the required amount of disk space and these may be in short supply on AWS. The total required disk space can be reduced by reducing the fraction of molecules that will be used as training data (see the ‘Options: Model Training -> Fraction Train’ parameter).

Type: decimal

Default: 3355443.2

Training Instance Types (training_instance_types): Instance type for the training model. Note that this must be running a local SSD drive.

Type: string

Default: !cdns

GPU Count (gpu_count): Minimum required number of GPUs on the AWS instance used to train the regression model(s)

Type: decimal

Default: 4

Choices: [1, 2, 4, 8]

Training Cube Shared Memory (MiB) (training_cube_shared_memory_mib): Amount of shared memory (MiB) for cube doing the model training.

Type: decimal

Default: 2048

Training Cube RAM (MiB) (training_cube_ram_mib): Amount of memory (MiB) for the cube doing the model training.

Type: decimal

Default: 28672

Training Cube Spot Policy (training_cube_spot_policy): Spot policy for the model training cube

Type: string

Default: Prohibited

Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]

Training Cube Except on Failure (training_cube_except_on_failure): If true the training cube will throw an exception if it fails to train or has no training data

Type: boolean

Default: True

Choices: [True, False]

FastROCS Instance Types (fastrocs_instance_types): Instance type used by FastROCS. If unspecified an instance type will be chosen automatically

Type: string

Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.

FastROCS Spot Policy (fastrocs_spot_policy): Control whether spot or non-spot instances will be used for FastROCS cubes. In general spot instances are cheaper than non-spot instances and using them will reduce the cost of the floe, however spot instances can be in short supply and thus using them may increase the run time of the floe. The settings of this parameter have the following meaning. Allowed: Use both spot and non spot instances. Required: Only spot instances will be used. Preferred: Floe will preferentially use spot instances, but non-spot will be used if spot instances are in short supply. NotPreferred: Floe will preferentially use non-spot instances, but spot instances will be used if non-spot instances are in short supply. Prohibited: Only non-spot instances will be used.

Type: string

Default: Preferred

Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]

FastROCS Instance CPU Count (fastrocs_instance_cpu_count): Minimum CPU count for the FastROCS instances.

Type: integer

Default: 4

Input Fields

Input Conformers Field (input_conformers_field): Field on the input collection that holds the conformers of the molecules to be docked. If unspecified the default primary molecule field will be used.

Type: field_parameter::mol

Output Fields

Docked Score Field (docked_score_field): Field on the output hit list and raw results collection that will contain the docked score

Required

Type: field_parameter::float

Default: Chemgauss4

Docked Pose Field (docked_pose_field): Field on the output hit list and raw results collection that will hold the docked pose. If unspecified the default primary mol field will be used.

Type: field_parameter::mol

Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

Type: field_parameter::float

Design Unit Field (design_unit_field): Field on the ‘Output Design Unit(s) Dataset’ that will contain a copy of the design unit(s).

Type: field_parameter

Default: Design Unit

Design Unit ID Field (design_unit_id_field): Field on the ‘Output Design Unit(s) Dataset’ with a unique (for this run) identifier of the design unit

Required

Type: field_parameter::int

Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Field on the ‘Output Design Unit(s) Dataset’ containing a link to the design unit

Required

Type: field_parameter::link

Default: Design Unit Link

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

Type: field_parameter::string

Default: Bemis Murcko SMILES

Bemis Murcko ID Field (bemis_murcko_id_field): Output Field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus this integer identifier depends on the order the records are passed, unlike the Bemis Murcko core SMILES itself.

Type: field_parameter::int

Default: Bemis Murcko ID

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer Field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES)

Type: field_parameter::int

Default: Bemis Murcko Rank
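
The ID and rank fields described above can be illustrated with the sketch below. It takes the core SMILES as given strings (it does not compute Bemis Murcko cores), and it assumes that lower docking scores are better and that ranks start at 1. The function name is hypothetical.

```python
def assign_core_ids_and_ranks(hits):
    """hits: list of (core_smiles, score) in record order. Returns [(core_id, rank), ...]."""
    core_ids = {}
    for core, _ in hits:
        # IDs start at 1 and grow each time a new core is seen, so they depend on order.
        core_ids.setdefault(core, len(core_ids) + 1)

    ranks = [0] * len(hits)
    seen_per_core = {}
    # Assumption: a lower (more negative) score is a better score.
    for index in sorted(range(len(hits)), key=lambda i: hits[i][1]):
        core = hits[index][0]
        seen_per_core[core] = seen_per_core.get(core, 0) + 1
        ranks[index] = seen_per_core[core]

    return [(core_ids[core], rank) for (core, _), rank in zip(hits, ranks)]


hits = [("c1ccccc1", -9.0), ("c1ccncc1", -12.0), ("c1ccccc1", -11.0)]
print(assign_core_ids_and_ranks(hits))
```

Reordering the input records would change the IDs but not the ranks, which is the order dependence noted in the field description.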

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

Type: field_parameter::string

Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output Field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus this integer identifier depends on the order the records are passed, unlike the Hetero Bemis Murcko core SMILES itself.

Type: field_parameter::int

Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer Field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES)

Type: field_parameter::int

Default: Hetero Bemis Murcko Rank

Training AUC Field (training_auc_field): Field on the model record holding the AUC of the model on the training data

Required

Type: field_parameter::float

Default: Training AUC

Training AUC Raw Data Field (training_auc_raw_data_field): Field on the model record holding the raw AUC data of the model on the training data

Type: field_parameter

Default: Training AUC Raw Data

Training AUC Num Actives Field (training_auc_num_actives_field): Field on the model record holding the number of actives in the training data AUC calculation

Type: field_parameter::int

Default: Training AUC Num Actives

Training AUC Num DecoysField (training_auc_num_decoysfield): Field on the model record holding the number of decoys in the training data AUC calculation

Type: field_parameter::int

Default: Training AUC Num Decoys

Development

Test Train Failure Recovery (test_train_failure_recovery): If True an exception will be thrown by the training cube after it finishes each batch optimization. The cube machinery should recover from this by emitting the state after the batch optimization to a cycle. The purpose of this flag is to test this recovery machinery and it should never be set/adjusted by the end user.

Type: boolean

Default: False

Choices: [True, False]

V2 Temporary Collection (v2_temporary_collection):

Type: boolean

Default: False

Choices: [True, False]

Dock Mode (dock_mode): If Off this floe will run in analysis mode. It will expect the input collection to be a Raw Results collection from a Gigadock Run with the docked pose in the ‘Docked Pose’ field and the original conformers in the primary molecule field. No design unit should be given in analysis mode. In analysis mode the output will be a dataset with an AUC calculation.

Type: boolean

Default: True

Choices: [True, False]

AUC Dataset (auc_dataset): Output dataset to which to write.

Required

Type: dataset_out

Default: AUC

AUC Raw Data Field (auc_raw_data_field): Field on the output success records containing the raw AUC data as a float vector. The length of the vector is the number of actives plus one and the value of the vector is the number of inactives that score better than the given active. This field is not required. If unspecified the field will not be created on the output record, but the cube will still output the other expected fields

Type: field_parameter

Default: AUC Raw Data
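
The per-active counts described for the raw AUC data can be sketched as follows, together with the AUC they imply. This is an illustration only: it assumes lower scores are better, ignores ties, and does not reproduce the extra (actives plus one) element of the vector, whose meaning is not stated here. Function names are hypothetical.

```python
from bisect import bisect_left


def inactives_better_than_each_active(active_scores, inactive_scores):
    """For each active (best first), count the inactives with a strictly lower score."""
    inactive_sorted = sorted(inactive_scores)
    return [bisect_left(inactive_sorted, score) for score in sorted(active_scores)]


def auc_from_counts(counts, n_inactives: int) -> float:
    """AUC is the fraction of (active, inactive) pairs where the active scores better."""
    total_pairs = len(counts) * n_inactives
    return 1.0 - sum(counts) / total_pairs


actives = [-12.0, -9.0]
inactives = [-11.0, -10.0, -8.0, -7.0]
counts = inactives_better_than_each_active(actives, inactives)
print(counts)
print(auc_from_counts(counts, len(inactives)))
```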

Max Parallel for Docking (max_parallel_for_docking): The maximum number of concurrently running copies of this Cube

Type: integer

Default: 25000

Max Parallel for Non-Docking Cubes (max_parallel_for_non_docking_cubes): The maximum number of concurrently running copies of this Cube

Type: integer

Default: 2000

Disk space (MiB) for shard handling cubes (disk_space_mib_for_shard_handling_cubes): Disk space for shard handling cubes except the training cube (promoted independently).

Type: decimal

Default: 19456

Predicted Score Field (predicted_score_field): Field on the input records to be packed

Required

Type: field_parameter::float

Default: Predicted Score

Predict Score Batch Size (predict_score_batch_size): Number of molecules to accumulate before making a prediction on that set of molecules.

Required

Type: integer

Default: 1000

Fingerprint Field (fingerprint_field): Field on the records that holds a graphsim fingerprint that will be part of the feature vector. To save memory, this is stored as an OEFingerprint and added to the float feature vector only when needed by the floe.

Type: field_parameter

Default: Graphsim Fingerprint

Feature Vector Field (feature_vector_field): Field on the records that will hold the feature vector.

Type: field_parameter

Default: Feature Vector

Target Shard Size (target_shard_size): The target number of records in a shard.

0 indicates to run up to the max_shard_bytes limit per shard

Required

Type: integer

Default: 100000

Dock shard Size (dock_shard_size): Total count on the input shards to accumulate before emitting a group of shards

Required

Type: integer

Default: 1000

Linear SGD Learning Rate (linear_sgd_learning_rate): Learning rate for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD

Required

Type: decimal

Default: 0.001

Linear SGD Momentum (linear_sgd_momentum): Momentum for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD

Required

Type: decimal

Default: 0.0

Linear SGD Dampening (linear_sgd_dampening): Dampening for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD

Required

Type: decimal

Default: 0.0

Linear LBFGS Learning Rate (linear_lbfgs_learning_rate): Learning rate for the LBFGS optimizer. This parameter is ignored if the optimizer is not set to LBFGS

Required

Type: decimal

Default: 1.0

Linear LBFGS History Size (linear_lbfgs_history_size): History size for the LBFGS optimizer. This parameter is ignored if the optimizer is not set to LBFGS

Required

Type: integer

Default: 100

Min timeout for shard uploading (min_timeout_for_shard_uploading): Sets the minimum retry timeout time in seconds.

Required

Type: integer

Default: 2

Max timeout for shard uploading (max_timeout_for_shard_uploading): Sets the maximum retry timeout time in seconds.

Required

Type: integer

Default: 1800.0

Sunk time scalar for shard uploading (sunk_time_scalar_for_shard_uploading): The retry timeout of an OrionSession operation will be set to the amount of time this item of work has been processing times this scalar value.

Required

Type: decimal

Default: 0.5
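
One plausible reading of the three timeout parameters is that the retry timeout is the sunk processing time multiplied by the scalar, kept within the minimum and maximum. The sketch below encodes that reading; the clamping behavior is an assumption and the function name is hypothetical.

```python
def retry_timeout(sunk_seconds: float, scalar: float = 0.5,
                  min_timeout: float = 2.0, max_timeout: float = 1800.0) -> float:
    """Retry timeout in seconds: scaled sunk time, clamped to [min_timeout, max_timeout]."""
    # Assumption: the min and max timeouts act as clamps on the scaled value.
    return min(max(sunk_seconds * scalar, min_timeout), max_timeout)


print(retry_timeout(1.0))       # below the minimum, so the minimum applies
print(retry_timeout(600.0))     # half of ten minutes of work
print(retry_timeout(10_000.0))  # capped at the maximum
```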

Max Shard Upload Attempts (max_shard_upload_attempts): Number of attempts to make when uploading a shard

Type: integer

Default: 2

Retry dictionary for shard uploads (retry_dictionary_for_shard_uploads): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.

Type: string

Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
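
Entries in the ‘<status code>:<number of retries>’ form can be validated and converted to a lookup table as in this sketch. The function name is hypothetical and the floe's own parsing may differ.

```python
def parse_retry_entries(entries):
    """Convert ['429:1000', ...] into {429: 1000, ...}, raising ValueError on bad input."""
    retries = {}
    for entry in entries:
        status, separator, count = entry.partition(":")
        if not separator:
            raise ValueError(f"Entry must look like '<status code>:<number of retries>': {entry!r}")
        # int() raises ValueError if either part is not an integer.
        retries[int(status)] = int(count)
    return retries


defaults = ["429:1000", "460:1000", "500:1000", "502:1000", "503:1000", "504:1000"]
print(parse_retry_entries(defaults))
```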

Min timeout for shard downloading (min_timeout_for_shard_downloading): Sets the minimum retry timeout time in seconds.

Required

Type: integer

Default: 2.0

Max timeout for shard downloading (max_timeout_for_shard_downloading): Sets the maximum retry timeout time in seconds.

Required

Type: integer

Default: 1800.0

Sunk time scalar for shard downloading (sunk_time_scalar_for_shard_downloading): The retry timeout of an OrionSession operation will be set to the amount of time this item of work has been processing times this scalar value.

Required

Type: decimal

Default: 0.5

Max Shard Download Attempts (max_shard_download_attempts): Number of attempts to make when downloading a shard

Type: integer

Default: 1

Retry dictionary for shard downloading (retry_dictionary_for_shard_downloading): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.

Type: string

Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]

FastROCS Prep Mol Field (fastrocs_prep_mol_field):

Type: field_parameter::mol

Default: __FastROCSPreparedMolecule__

Enable cube timing report (time_all_cubes): If true this cube will emit timing information to the timing_data port.

Type: boolean

Default: True

Choices: [True, False]

Catch exceptions (catch_exceptions): If Off exception handling will be disabled for this cube.

Type: boolean

Default: True

Choices: [True, False]

Catch exceptions (parallel_catch_exception_methods): Specifies the methods of a parallel cube in which an exception will be caught and emitted to the exception port if the port is connected. If the exception port is connected to an exception handler this will stop the floe

Type: string

Default: [‘begin’]

Choices: [‘begin’, ‘process’, ‘end’]

Allow All Models (allow_all_models): Development parameter that if ‘On’ disables filtering the models by the size of the input

Type: boolean

Default: True

Choices: [True, False]

Verbose Model Selection (verbose_model_selection): If ‘On’ model filtering/selection will be written to the log

Type: boolean

Default: True

Choices: [True, False]