Gigadock Warp
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Gigadock
Role-based/Computational Chemist
Solution-based/Virtual-screening/DB Search/Gigadock
Task-based/Virtual Screening - Structure-Based
Description
Approximates a full Gigadock run with a mixture of FastROCS and docking.
Docks a random subset of molecules
Runs FastROCS on all input molecules using top scoring poses from the previous step as queries
Creates a feature vector for each molecule with the FastROCS Shape and Color tanimotos from the prior step, the bits of a 4K Tree fingerprint and several basic 2D properties.
Creates a regression model of the score based on the molecules docked in the first step and the feature vector. The model will be a neural net model if the number of molecules being docked is greater than 100M and a linear model if the number of molecules being docked is less than 100M and greater than 1M (the floe cannot dock fewer than 1M molecules; use the Gigadock floe in these cases).
Predicts the score of the undocked molecules with the regression model.
Docks the molecules the regression model predicts to have the best scores.
Outputs a Hit List of the top scoring docked molecules.
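The model-selection rule described in the steps above can be sketched as follows. This is only an illustration, not the floe's implementation: the function name `choose_model_type` is hypothetical, and the handling of a collection of exactly 100M molecules is an assumption because the description does not state it.

```python
def choose_model_type(num_molecules: int) -> str:
    """Return the regression model type suggested by the description above."""
    one_million = 1_000_000
    hundred_million = 100_000_000
    if num_molecules < one_million:
        # Too small for Gigadock Warp; the description points to the Gigadock floe.
        raise ValueError("fewer than 1M molecules: use the Gigadock floe instead")
    if num_molecules > hundred_million:
        return "neural net"
    # Assumption: a collection of exactly 100M molecules gets the linear model.
    return "linear"
```

For example, `choose_model_type(5_000_000)` returns `"linear"` under these assumptions.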
Promoted Parameters
Title in user interface (promoted name)
Inputs
Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid dock method (see ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’ each molecule will be docked to each design unit and the results from the best scoring design unit will be output, thus docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’ each molecule will be docked only into the design unit with the crystallographic bound ligand most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.
Type: data_source
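The cost scaling described for multiple design units can be estimated with a short sketch. The function `relative_docking_cost` is hypothetical, and treating the Hybrid increase as an additive 5% per extra design unit (rather than compounding) is an assumption; the description only says "roughly 5%".

```python
def relative_docking_cost(num_design_units: int, docking_method: str = "Fred") -> float:
    """Approximate docking cost relative to docking against one design unit."""
    # Limits from the parameter description: 10 for Hybrid, 2 otherwise.
    max_units = 10 if docking_method == "Hybrid" else 2
    if not 1 <= num_design_units <= max_units:
        raise ValueError(f"{docking_method} accepts 1 to {max_units} design units")
    if docking_method == "Hybrid":
        # Roughly 5% extra per additional design unit (assumed additive).
        return 1.0 + 0.05 * (num_design_units - 1)
    # Fred and FastFred dock every molecule to every design unit.
    return float(num_design_units)
```

Under these assumptions, two design units roughly double the cost with ‘Fred’, while ten design units with ‘Hybrid’ cost about 1.45 times a single design unit.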
Input Conformer Collection (input_conformer_collection): Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated third-party vendor docking collections can be made available to your organization upon request, at no charge, by e-mailing support@eyesopen.com (if your organization has already requested them, you will already have access to these pre-generated collections). These collections will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
Required
Type: collection_source
Outputs
Hit List Dataset (hit_list_dataset): Output dataset with the top scoring docked molecules.
Required
Type: dataset_out
Default: Gigadock Warp Hit List
FastROCS Query Poses Dataset (fastrocs_query_poses_dataset): Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top scoring poses from the initial docking of a random subset of molecules from the input collection.
Required
Type: dataset_out
Default: Gigadock Warp FastROCS Queries
Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.
Required
Type: dataset_out
Default: Gigadock Warp Design Unit
Gigadock Warp Temporary Collection (gigadock_warp_temporary_collection): Name of the collection to create.
Required
Type: collection_sink
Default: Temp Collection
Model(s) Dataset (models_dataset): Output dataset to which to write.
Required
Type: dataset_out
Default: Models
Options
Hit List Size (hit_list_size): Size of the final hit list with the top scoring docked molecules.
Required
Type: integer
Default: 10000
Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
Type: string
Default: Fred
Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]
Options: Model Featurization
Graphsim Fingerprint Features (graphsim_fingerprint_features): If specified, the bits for the specified fingerprint type will be added to the feature vector. All fingerprints have 4,096 features, except for MACCS166, which has 166.
Type: string
Default: Tree
Choices: [‘Circular’, ‘Path’, ‘Tree’, ‘MACCS166’]
2D Property Features (two_d_property_features): 2D properties to add to the feature vector. The selected properties, the squares of the properties, and all cross terms between the selected properties will be added to the feature vector.
Type: string
Default: [‘Molecular Weight’, ‘2D Polar Surface Area’, ‘XLogP’, ‘Number of Acceptors’, ‘Number of Donors’, ‘Number of Hydrogen Atoms’, ‘Number of Heavy Atoms’, ‘Number of Carbon Atoms’, ‘Number of Nitrogen Atoms’, ‘Number of Oxygen Atoms’, ‘Number of Fluorine Atoms’, ‘Number of Phosphorous Atoms’, ‘Number of Sulphur Atoms’, ‘Number of Chlorine Atoms’, ‘Number of Bromine Atoms’, ‘Number of Iodine Atoms’, ‘Number of Rotatable Bonds’, ‘Number of Bonds’, ‘Number of Heavy Bonds’, ‘Number of Single Bonds’, ‘Number of Heavy Single Bonds’, ‘Number of Double Bonds’, ‘Number of Triple Bonds’, ‘Number of Aromatic Bonds’, ‘Number of Rings’, ‘Number of Aromatic Rings’]
Choices: [‘Molecular Weight’, ‘2D Polar Surface Area’, ‘XLogP’, ‘Number of Acceptors’, ‘Number of Donors’, ‘Number of Hydrogen Atoms’, ‘Number of Heavy Atoms’, ‘Number of Carbon Atoms’, ‘Number of Nitrogen Atoms’, ‘Number of Oxygen Atoms’, ‘Number of Fluorine Atoms’, ‘Number of Phosphorous Atoms’, ‘Number of Sulphur Atoms’, ‘Number of Chlorine Atoms’, ‘Number of Bromine Atoms’, ‘Number of Iodine Atoms’, ‘Number of Rotatable Bonds’, ‘Number of Bonds’, ‘Number of Heavy Bonds’, ‘Number of Single Bonds’, ‘Number of Heavy Single Bonds’, ‘Number of Double Bonds’, ‘Number of Triple Bonds’, ‘Number of Aromatic Bonds’, ‘Number of Rings’, ‘Number of Aromatic Rings’]
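The expansion of the selected 2D properties into properties, squares, and cross terms can be sketched as below. This is an illustration only: the function names are hypothetical, "cross terms" is assumed to mean the products of each pair of distinct properties, and the ordering of the output is arbitrary.

```python
from itertools import combinations

def expand_properties(values):
    """Return the properties, their squares, and all pairwise cross terms."""
    squares = [v * v for v in values]
    # Assumption: one cross term per unordered pair of distinct properties.
    cross_terms = [a * b for a, b in combinations(values, 2)]
    return list(values) + squares + cross_terms

def expanded_length(num_properties: int) -> int:
    """Number of features produced by expand_properties for n properties."""
    return 2 * num_properties + num_properties * (num_properties - 1) // 2
```

Under this assumption, the 26 default properties would contribute 26 + 26 + 325 = 377 values to the feature vector.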
Number of FastROCS Feature Query Poses (number_of_fastrocs_feature_query_poses): Number of top scoring poses from the training set to use to create FastROCS features. Each molecule will be overlaid onto each top scoring pose and the shape and color Tanimoto added to the feature vector. Thus the number of FastROCS feature values will be twice the number of poses selected here, since each pose has two values (shape and color Tanimoto).
Required
Type: integer
Default: 100
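As a rough illustration of how these counts add up, the sketch below computes the number of FastROCS features and a total feature vector length. The helper names are hypothetical, the count of two values per pose applies to the ‘shape tanimoto and color tanimoto’ mode described above, and the assumption that the feature vector is a simple concatenation of the FastROCS values, the fingerprint bits, and the 2D property features is not confirmed by this page.

```python
# Bit counts stated in the 'Graphsim Fingerprint Features' description.
FINGERPRINT_BITS = {"Circular": 4096, "Path": 4096, "Tree": 4096, "MACCS166": 166}

def fastrocs_feature_count(num_query_poses: int) -> int:
    """One shape and one color Tanimoto value per query pose."""
    return 2 * num_query_poses

def feature_vector_length(num_query_poses=100, fingerprint="Tree", num_2d_features=0):
    """Assumed layout: FastROCS values + fingerprint bits + 2D property features."""
    return (fastrocs_feature_count(num_query_poses)
            + FINGERPRINT_BITS[fingerprint]
            + num_2d_features)
```

With the default of 100 query poses and the Tree fingerprint, this gives 200 FastROCS values plus 4,096 fingerprint bits before any 2D property features are counted.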
FastROCS Feature Mode (fastrocs_feature_mode): Scoring mode to use.
Type: string
Default: shape tanimoto and color tanimoto
Choices: [‘tanimoto combo’, ‘highest tanimoto combo’, ‘shape tanimoto’, ‘shape tanimoto and color tanimoto’]
Verbose FastROCS (verbose_fastrocs): If ‘On’ timing information for the FastROCS feature calculation will be written to the log.
Type: boolean
Default: False
Choices: [True, False]
Number of Graphsim Tanimoto Feature Query Poses (number_of_graphsim_tanimoto_feature_query_poses): Number of top scoring poses from the training set to use to create Graphsim Tanimoto features. These features will be the Tanimoto of the molecule being docked to each of the top scoring poses from the training set. The type of fingerprint used to create the Tanimoto is determined by the ‘Graphsim Tanimoto Feature Fingerprint Type’ parameter.
Required
Type: integer
Default: 0
Graphsim Tanimoto Feature Fingerprint Type (graphsim_tanimoto_feature_fingerprint_type): The type of fingerprint used to calculate the Tanimoto for the Graphsim Tanimoto Features. This parameter is ignored if ‘Number of Graphsim Tanimoto Feature Query Poses’ is set to 0.
Type: string
Default: Tree
Choices: [‘Circular’, ‘Path’, ‘Tree’]
Options: Model Training
Number of Training Stages (number_of_training_stages): Number of training stages to run. The first stage creates a training set by docking a random subset of molecules and then creates a model. The second stage runs the model on the molecules from the first stage that were not selected for training, selects the top molecules by the predicted score, docks those selected molecules, adds those new docked molecules to the training data from the previous stage and creates a new updated model. Subsequent training stages work analogously to the second stage except.
Required
Type: integer
Default: 1
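The staged training loop described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not the floe's code: each molecule has a single numeric feature, the model is a one-variable least-squares line, `dock` is a caller-supplied stand-in for docking, lower scores are treated as better, and all names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def staged_training(features, dock, num_stages, num_initial, num_per_stage, seed=0):
    """Return (model, docked) after num_stages rounds of dock-then-fit."""
    rng = random.Random(seed)
    # Stage 1: dock a random subset and fit the first model.
    docked = {i: dock(i) for i in rng.sample(range(len(features)), num_initial)}
    model = fit_line([features[i] for i in docked], list(docked.values()))
    for _ in range(num_stages - 1):
        slope, intercept = model
        undocked = [i for i in range(len(features)) if i not in docked]
        # Assumption: a lower predicted score is better, so sort ascending.
        undocked.sort(key=lambda i: slope * features[i] + intercept)
        for i in undocked[:num_per_stage]:
            docked[i] = dock(i)
        # Refit on the enlarged training set.
        model = fit_line([features[i] for i in docked], list(docked.values()))
    return model, docked
```

Each additional stage adds `num_per_stage` newly docked molecules to the training data, which is why more stages increase docking cost as well as training cost in this sketch.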
Fraction Train (fraction_train): The fraction of the input molecules that will be docked to create the training data for the score regression model. Increasing this value will increase the cost of the floe and decrease the minimum number of input molecules the floe requires to run (at the default value of 0.01 the minimum is 934600). Legal values for this parameter are between 0.001 and 0.1.
Required
Type: decimal
Default: 0.01
Final Dock Fraction (final_dock_fraction): The number of top scoring molecules from FastROCS that are passed to the final docking step is equal to this fraction of the size of the input collection(s). Increasing this value will increase the cost of the floe. When docking fewer than ~100M molecules it is recommended that this value be set to 0.08. The legal values for this parameter are between 0.01 and 0.1.
Required
Type: decimal
Default: 0.04
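To get a feel for how ‘Fraction Train’ and ‘Final Dock Fraction’ translate into docking work, the following sketch estimates how many molecules are actually docked. It assumes a single training stage and ignores FastROCS and model training cost; the function name `docking_budget` is hypothetical.

```python
def docking_budget(num_molecules, fraction_train=0.01, final_dock_fraction=0.04):
    """Estimate the number of molecules docked for training and in the final step."""
    # Legal ranges taken from the parameter descriptions above.
    if not 0.001 <= fraction_train <= 0.1:
        raise ValueError("fraction_train must be between 0.001 and 0.1")
    if not 0.01 <= final_dock_fraction <= 0.1:
        raise ValueError("final_dock_fraction must be between 0.01 and 0.1")
    training = round(num_molecules * fraction_train)
    final = round(num_molecules * final_dock_fraction)
    return {"training": training, "final": final, "total": training + final}
```

With the defaults, a collection of one billion molecules would have about 10 million molecules docked for training and 40 million in the final step, roughly 5% of the collection.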
Batch Size (batch_size): Size of the training data, in number of molecules, that will be loaded onto the GPU at one time. If this value is set to zero the batch size will be dynamically assigned based on the amount of GPU memory.
Type: integer
Default: 0
Target Fraction GPU Memory (target_fraction_gpu_memory): The target fraction of GPU memory to allocate to a single batch of training data. Should be less than the available memory, as the optimizer also needs to fit into memory. This parameter is only used when ‘Batch Size’ is 0 (i.e., dynamic setting of batch_size).
Type: decimal
Default: 0.75
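One plausible way a dynamic batch size could follow from the target memory fraction is sketched below. The actual formula used by the floe is not given on this page, so everything here is an assumption: the function name, the idea that each feature is stored as a 4-byte float, and the omission of any per-batch overhead.

```python
def dynamic_batch_size(gpu_memory_bytes, num_features,
                       target_fraction=0.75, bytes_per_value=4):
    """Number of molecules whose feature vectors fit in the target memory share."""
    # Assumption: one float32 value per feature, no other per-molecule overhead.
    bytes_per_molecule = num_features * bytes_per_value
    budget = int(gpu_memory_bytes * target_fraction)
    return max(1, budget // bytes_per_molecule)
```

For a hypothetical 16 GiB GPU and 4,096 features per molecule, this estimate gives a batch of 786,432 molecules.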
Verbose Training (verbose_training): If true the training cube will write details of the training to the floe log.
Type: boolean
Default: True
Choices: [True, False]
Cycle Time for the training cube to avoid 12h limit (cycle_time_for_the_training_cube_to_avoid_12h_limit): This cube maintains an estimate of the total time the cube will consume after the next epoch calculation. If that estimated time exceeds this value in hours, the cube will save the state of the current calculation and emit it to the ‘partial’ output port. The partial output can then be cycled back into the ‘intake’ port of this cube to continue the calculation. The purpose of this mechanism is to avoid the 12h limit. For parallel cubes this value should be set to a little less than 12h. For serial cubes it can be set arbitrarily high, as they are not subject to the 12h limit.
Required
Type: decimal
Default: 3650.0
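The save-and-cycle mechanism can be illustrated with a small sketch. It is not the cube's code: the function and argument names are hypothetical, the state is a plain dictionary, and the estimate of the time after the next epoch is assumed to be the elapsed time plus the average epoch time so far.

```python
def run_epochs(num_epochs, train_epoch, cycle_time_hours, clock, state=None):
    """Train until done, or stop early if the next epoch would exceed the cycle time."""
    state = state or {"epoch": 0}
    start = clock()
    epochs_this_cycle = 0
    while state["epoch"] < num_epochs:
        elapsed = clock() - start
        if epochs_this_cycle:
            # Estimated total time after one more epoch (assumed: average so far).
            estimate = elapsed + elapsed / epochs_this_cycle
            if estimate > cycle_time_hours * 3600:
                # Corresponds to emitting the saved state on the 'partial' port.
                return "partial", state
        train_epoch(state)
        state["epoch"] += 1
        epochs_this_cycle += 1
    return "complete", state
```

A caller would feed a returned `"partial"` state back into `run_epochs` until it reports `"complete"`, mirroring the cycle from the ‘partial’ port back to the ‘intake’ port.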
Collapse Small Training Batches (collapse_small_training_batches): If True when the entire set of training data can fit in the GPU memory then all steps will be run in a single batch, rather than in separate batches.
Type: boolean
Default: True
Choices: [True, False]
Options: Linear Model Configuration
Enable Linear Model (enable_linear_model): If On, everything sent to the intake port will be sent to the true port; otherwise everything will be sent to the false port.
Required
Type: boolean
Default: True
Choices: [True, False]
Linear Epochs (linear_epochs): Minimum number of epochs the linear optimization will run for. This number of epochs may be exceeded in order to satisfy the number of batches required.
Type: integer
Default: 2
Linear Batches (linear_batches): Minimum number of batches the linear optimization will run for. This number of batches may be exceeded in order to satisfy the number of epochs required.
Type: integer
Default: 25
Linear Steps (linear_steps): Number of steps the linear optimization will run for. The floe will first figure out how many batches need to be run and then divide the total number of steps among them. How the steps are divided among the batches depends on the setting of the ‘Linear Step Scaling Power’ parameter.
Type: integer
Default: 20000
Linear Step Scaling Power (linear_step_scaling_power): This parameter controls how the number of steps specified by ‘Linear Steps’ is divided among the optimization batches. If set to 0, the steps will be evenly divided among the batches. If set to a higher value, more steps will be run in the earlier batches. The higher the value, the more the steps will be distributed towards the earlier batches.
Type: decimal
Default: 2
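The exact weighting behind the step scaling power is not given on this page. The sketch below shows one hypothetical scheme that matches the described behavior: with `B` batches, batch `i` (counting from 0) gets a weight of `(B - i) ** power`, so a power of 0 divides the steps evenly and larger powers shift steps towards earlier batches. The function name and the weighting are assumptions.

```python
def divide_steps(total_steps, num_batches, scaling_power):
    """Split total_steps across batches, favoring early batches as the power grows."""
    # Hypothetical weighting: earlier batches get larger weights when power > 0.
    weights = [(num_batches - i) ** scaling_power for i in range(num_batches)]
    total_weight = sum(weights)
    exact = [total_steps * w / total_weight for w in weights]
    steps = [int(x) for x in exact]
    # Hand leftover steps to the batches with the largest fractional parts.
    leftover = total_steps - sum(steps)
    by_remainder = sorted(range(num_batches),
                          key=lambda i: exact[i] - steps[i], reverse=True)
    for i in by_remainder[:leftover]:
        steps[i] += 1
    return steps
```

With 20,000 steps and 25 batches, a power of 0 gives 800 steps per batch, while a power of 2 under this scheme concentrates most steps in the first few batches.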
Linear Optimizer (linear_optimizer): Method used to optimize the linear model
Required
Type: string
Default: Adam
Choices: [‘Adam’, ‘SGD’, ‘LBFGS’]
Linear Loss Function (linear_loss_function): Loss function to use in the optimization
Required
Type: string
Default: MSELoss
Choices: [‘MSELoss’]
Linear Neural Net Nodes (linear_neural_net_nodes): The number of nodes in the intermediate linear layer. This parameter can be specified multiple times to increase the number of layers. If this parameter is not specified linear regression will be configured (there will be no activation function in this circumstance).
Type: integer
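The effect of specifying this parameter zero, one, or several times can be pictured as a list of layer shapes. The helper below is a hypothetical illustration; it assumes the model ends in a single output (the predicted score) and says nothing about how the floe actually builds its layers.

```python
def layer_shapes(num_features, hidden_nodes=()):
    """Return (inputs, outputs) for each linear layer, ending in one score output."""
    sizes = [num_features, *hidden_nodes, 1]
    return list(zip(sizes[:-1], sizes[1:]))
```

With no hidden nodes the result is a single layer such as `[(4296, 1)]`, which corresponds to plain linear regression; passing `[100, 100]` yields three layers with an activation function applied between them.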
Linear Neural Net Activation Function (linear_neural_net_activation_function): Type of activation function to use between all linear layers. This parameter is ignored if no values are supplied to the ‘Linear Neural Net Nodes’ parameter.
Type: string
Default: ReLU
Choices: [‘ReLU’, ‘ReLU6’, ‘Sigmoid’, ‘ELU’, ‘LeakyReLU (neg slope 0.01)’, ‘LeakyReLU (neg slope 0.1)’]
Linear Adam Optimizer Loss Fxn (linear_adam_optimizer_loss_fxn): Learning rate for the Adam optimizer. This parameter is ignored if the optimizer is not set to Adam.
Required
Type: decimal
Default: 0.001
Options: Neural Net Configuration
Enable Neural Net Model (enable_neural_net_model): If ‘On’ a neural net score regression model(s) will be created. If ‘Off’ no neural net model will be created and the setting of the other parameters in this parameter group will be effectively ignored during the floe run.
Required
Type: boolean
Default: True
Choices: [True, False]
Neural Net Nodes (neural_net_nodes): Number of nodes in each layer of the neural net. Enter multiple values to create a multi layered neural net.
Type: integer
Default: [100, 100]
Neural Net Activation Function (neural_net_activation_function): Activation function used between each layer of the neural net.
Type: string
Default: ReLU
Choices: [‘ReLU’, ‘ReLU6’, ‘Sigmoid’, ‘ELU’, ‘LeakyReLU (neg slope 0.01)’, ‘LeakyReLU (neg slope 0.1)’]
Number of Neural Net Models (number_of_neural_net_models): The number of neural net models to create. Each model will be optimized from a random start. Training cost increases linearly with the number of models.
Type: integer
Default: 1
Neural Net Epochs (neural_net_epochs): Minimum number of epochs the neural net optimization will run for. This number of epochs may be exceeded in order to satisfy the number of batches required (see ‘Neural Net Batches’ parameter).
Type: integer
Default: 2
Neural Net Batches (neural_net_batches): Minimum number of batches the neural net optimization will run for. This number of batches may be exceeded in order to satisfy the number of epochs required (see ‘Neural Net Epochs’ parameter).
Type: integer
Default: 25
Neural Net Steps (neural_net_steps): Number of steps the neural net optimization will run for. The floe will first figure out how many batches need to be run and then divide the total number of steps among them. How the steps are divided among the batches depends on the setting of the ‘Neural Net Step Scaling Power’ parameter.
Type: integer
Default: 20000
Neural Net Step Scaling Power (neural_net_step_scaling_power): This parameter controls how the number of steps specified by ‘Neural Net Steps’ is divided among the optimization batches. If set to 0, the steps will be evenly divided among the batches. If set to a higher value, more steps will be run in the earlier batches. The higher the value, the more the steps will be distributed towards the earlier batches.
Type: decimal
Default: 2
Neural Net Optimizer (neural_net_optimizer): Method used to optimize the neural net model
Required
Type: string
Default: Adam
Choices: [‘Adam’, ‘SGD’, ‘LBFGS’]
Neural Net Loss Function (neural_net_loss_function): Loss function to use in the optimization
Required
Type: string
Default: MSELoss
Choices: [‘MSELoss’]
Neural Net Adam Optimizer Loss Fxn (neural_net_adam_optimizer_loss_fxn): Learning rate for the Adam optimizer. This parameter is ignored if the optimizer is not set to Adam.
Required
Type: decimal
Default: 0.001
Options: Hardware
Training Instance Disk Space (training_instance_disk_space): Required disk space on the machine(s) that will do model training. If this value is set too low for the number of input molecules, the floe will fail quickly with an error indicating the required setting. Higher values may result in longer run times because there will be fewer AWS GPU instances with the required amount of disk space, and these may be in short supply on AWS. The total required disk space can be reduced by reducing the fraction of molecules that will be used as training data (see the ‘Options: Model Training -> Fraction Train’ parameter).
Type: decimal
Default: 3355443.2
Training Instance Types (training_instance_types): Instance type used for model training. Note that the instance must have a local SSD drive.
Type: string
Default: !cdns
GPU Count (gpu_count): Minimum required number of GPUs on the AWS instance used to train the regression model(s)
Type: decimal
Default: 4
Choices: [1, 2, 4, 8]
Training Cube Shared Memory (MiB) (training_cube_shared_memory_mib): Amount of shared memory (MiB) for cube doing the model training.
Type: decimal
Default: 2048
Training Cube RAM (MiB) (training_cube_ram_mib): Amount of memory (MiB) for the cube doing the model training.
Type: decimal
Default: 28672
Training Cube Spot Policy (training_cube_spot_policy): Spot policy for the model training cube
Type: string
Default: Prohibited
Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
Training Cube Except on Failure (training_cube_except_on_failure): If true the training cube will throw an exception if it fails to train or has no training data
Type: boolean
Default: True
Choices: [True, False]
FastROCS Instance Types (fastrocs_instance_types): Instance type used by FastROCS. If unspecified an instance type will be chosen automatically
Type: string
Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.
FastROCS Spot Policy (fastrocs_spot_policy): Control whether spot or non-spot instances will be used for FastROCS cubes. In general spot instances are cheaper than non-spot instances and using them will reduce the cost of the floe, however spot instances can be in short supply and thus using them may increase the run time of the floe. The settings of this parameter have the following meaning. Allowed: Use both spot and non spot instances. Required: Only spot instances will be used. Preferred: Floe will preferentially use spot instances, but non-spot will be used if spot instances are in short supply. NotPreferred: Floe will preferentially use non-spot instances, but spot instances will be used if non-spot instances are in short supply. Prohibited: Only non-spot instances will be used.
Type: string
Default: Preferred
Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
FastROCS Instance CPU Count (fastrocs_instance_cpu_count): Minimum CPU count for the FastROCS instances.
Type: integer
Default: 4
Input Fields
Input Conformers Field (input_conformers_field): Field on the input collection that holds the conformers of the molecules to be docked. If unspecified the default primary molecule field will be used.
Type: field_parameter::mol
Output Fields
Docked Score Field (docked_score_field): Field on the output hit list and raw results collection that will contain the docked score
Required
Type: field_parameter::float
Default: Chemgauss4
Docked Pose Field (docked_pose_field): Field on the output hit list and raw results collection that will hold the docked pose. If unspecified the default primary mol field will be used.
Type: field_parameter::mol
Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Design Unit Field (design_unit_field): Field on the ‘Output Design Unit(s) Dataset’ that will contain a copy of the design unit(s).
Type: field_parameter
Default: Design Unit
Design Unit ID Field (design_unit_id_field): Field on the ‘Output Design Unit(s) Dataset’ with a unique (for this run) identifier of the design unit
Required
Type: field_parameter::int
Default: Design Unit ID
Design Unit Link Field (design_unit_link_field): Field on the ‘Output Design Unit(s) Dataset’ containing a link to the design unit
Required
Type: field_parameter::link
Default: Design Unit Link
Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Bemis Murcko SMILES
Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus this integer ID depends on the order in which the records are passed, unlike the Bemis Murcko core SMILES itself.
Type: field_parameter::int
Default: Bemis Murcko ID
Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES)
Type: field_parameter::int
Default: Bemis Murcko Rank
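The ID and rank fields described above can be mimicked with a short sketch. It assumes the core SMILES strings arrive in hit-list order (best score first), so the rank within a family is simply how many molecules with the same core have been seen so far; the function name is hypothetical and the core SMILES are taken as given rather than computed.

```python
def annotate_cores(core_smiles_in_hit_order):
    """Return (core_id, rank_within_core) for each hit, in the order given."""
    ids = {}
    seen_counts = {}
    annotated = []
    for core in core_smiles_in_hit_order:
        # A new core gets the next ID, starting at 1; a known core keeps its ID.
        core_id = ids.setdefault(core, len(ids) + 1)
        seen_counts[core] = seen_counts.get(core, 0) + 1
        annotated.append((core_id, seen_counts[core]))
    return annotated
```

Reordering the input changes which core receives ID 1, which illustrates why the integer ID depends on record order while the core SMILES does not.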
Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Hetero Bemis Murcko
Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus this integer ID depends on the order in which the records are passed, unlike the Hetero Bemis Murcko core SMILES itself.
Type: field_parameter::int
Default: Hetero Bemis Murcko ID
Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES)
Type: field_parameter::int
Default: Hetero Bemis Murcko Rank
Training AUC Field (training_auc_field): Field on the model record holding the AUC of the model on the training data
Required
Type: field_parameter::float
Default: Training AUC
Training AUC Raw Data Field (training_auc_raw_data_field): Field on the model record holding the raw AUC data of the model on the training data
Type: field_parameter
Default: Training AUC Raw Data
Training AUC Num Actives Field (training_auc_num_actives_field): Field on the model record holding the number of actives in the training data AUC calculation
Type: field_parameter::int
Default: Training AUC Num Actives
Training AUC Num DecoysField (training_auc_num_decoysfield): Field on the model record holding the number of decoys in the training data AUC calculation
Type: field_parameter::int
Default: Training AUC Num Decoys
Development
Test Train Failure Recovery (test_train_failure_recovery): If True, an exception will be thrown by the training cube after it finishes each batch optimization. The cube machinery should recover from this by emitting the state after the batch optimization to a cycle. The purpose of this flag is to test this recovery machinery, and it should never be set/adjusted by the end user.
Type: boolean
Default: False
Choices: [True, False]
V2 Temporary Collection (v2_temporary_collection):
Type: boolean
Default: False
Choices: [True, False]
Dock Mode (dock_mode): If Off this floe will run in analysis mode. It will expect the input collection to be a Raw Results collection from a Gigadock Run with the docked pose in the ‘Docked Pose’ field and the original conformers in the primary molecule field. No design unit should be given in analysis mode. In analysis mode the output will be a dataset with an AUC calculation.
Type: boolean
Default: True
Choices: [True, False]
AUC Dataset (auc_dataset): Output dataset to which to write.
Required
Type: dataset_out
Default: AUC
AUC Raw Data Field (auc_raw_data_field): Field on the output success records containing the raw AUC data as a float vector. The length of the vector is the number of actives plus one, and each value of the vector is the number of inactives that score better than the given active. This field is not required. If unspecified, the field will not be created on the output record, but the cube will still output the other expected fields.
Type: field_parameter
Default: AUC Raw Data
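The per-active counts described for the raw AUC data, and an AUC derived from them, can be sketched as follows. The description says the stored vector has one more entry than the number of actives but does not say what the extra entry holds, so this sketch returns only the per-active counts. Lower scores are assumed to be better, ties are ignored, and the function name is hypothetical.

```python
def auc_from_scores(active_scores, decoy_scores):
    """Return (per-active counts of better-scoring decoys, AUC)."""
    # For each active (best first), count the decoys with a strictly better score.
    counts = [sum(1 for d in decoy_scores if d < a) for a in sorted(active_scores)]
    # Each counted decoy is one active/decoy pair ranked the wrong way round.
    auc = 1.0 - sum(counts) / (len(active_scores) * len(decoy_scores))
    return counts, auc
```

For actives scoring -10 and -5 against decoys scoring -12, -7, -3, and -1, the counts are `[1, 2]` and the AUC is 0.625.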
Max Parallel for Docking (max_parallel_for_docking): The maximum number of concurrently running copies of this Cube
Type: integer
Default: 25000
Max Parallel for Non-Docking Cubes (max_parallel_for_non_docking_cubes): The maximum number of concurrently running copies of this Cube
Type: integer
Default: 2000
Disk space (MiB) for shard handling cubes (disk_space_mib_for_shard_handling_cubes): Disk space for shard handling cubes except the training cube (promoted independently).
Type: decimal
Default: 19456
Predicted Score Field (predicted_score_field): Field on the input records to be packed
Required
Type: field_parameter::float
Default: Predicted Score
Predict Score Batch Size (predict_score_batch_size): Number of molecules to accumulate before making a prediction on that set of molecules.
Required
Type: integer
Default: 1000
Fingerprint Field (fingerprint_field): Field on the records that holds a Graphsim fingerprint that will be part of the feature vector. To save memory, this is stored as an OEFingerprint and added to the float feature vector only when needed by the floe.
Type: field_parameter
Default: Graphsim Fingerprint
Feature Vector Field (feature_vector_field): Field on the records that will hold the feature vector.
Type: field_parameter
Default: Feature Vector
Target Shard Size (target_shard_size): The target number of records in a shard. 0 indicates to run up to the max_shard_bytes limit per shard.
Required
Type: integer
Default: 100000
Dock shard Size (dock_shard_size): Total count on the input shards to accumulate before emitting a group of shards
Required
Type: integer
Default: 1000
Linear SGD Learning Rate (linear_sgd_learning_rate): Learning rate for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD
Required
Type: decimal
Default: 0.001
Linear SGD Momentum (linear_sgd_momentum): Momentum for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD
Required
Type: decimal
Default: 0.0
Linear SGD Dampening (linear_sgd_dampening): Dampening for the SGD optimizer. This parameter is ignored if the optimizer is not set to SGD
Required
Type: decimal
Default: 0.0
Linear LBFGS Learning Rate (linear_lbfgs_learning_rate): Learning rate for the LBFGS optimizer. This parameter is ignored if the optimizer is not set to LBFGS
Required
Type: decimal
Default: 1.0
Linear LBFGS History Size (linear_lbfgs_history_size): History size for the LBFGS optimizer. This parameter is ignored if the optimizer is not set to LBFGS
Required
Type: integer
Default: 100
Min timeout for shard uploading (min_timeout_for_shard_uploading): Sets the minimum retry timeout time in seconds.
Required
Type: integer
Default: 2
Max timeout for shard uploading (max_timeout_for_shard_uploading): Sets the maximum retry timeout time in seconds.
Required
Type: integer
Default: 1800.0
Sunk time scalar for shard uploading (sunk_time_scalar_for_shard_uploading): The retry timeout of an OrionSession operation will be set to the amount of time this item of work has been processing times this scalar value.
Required
Type: decimal
Default: 0.5
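Taken together, the minimum timeout, maximum timeout, and sunk time scalar suggest a retry timeout that grows with processing time but stays within bounds. The sketch below assumes the scaled value is simply clamped between the minimum and maximum; that clamping, and the function name, are assumptions rather than documented behavior.

```python
def retry_timeout(seconds_processing, sunk_time_scalar=0.5,
                  min_timeout=2.0, max_timeout=1800.0):
    """Retry timeout in seconds: processing time times the scalar, kept within bounds."""
    scaled = seconds_processing * sunk_time_scalar
    # Assumption: values outside [min_timeout, max_timeout] are clamped.
    return min(max_timeout, max(min_timeout, scaled))
```

With the defaults, an item of work that has been processing for 100 seconds would get a 50 second retry timeout under this assumption.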
Max Shard Upload Attempts (max_shard_upload_attempts): Number of attempts to make when uploading a shard
Type: integer
Default: 2
Retry dictionary for shard uploads (retry_dictionary_for_shard_uploads): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.
Type: string
Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
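Entries in the ‘<status code>:<number of retries>’ form can be parsed and validated with a few lines. This helper is a hypothetical illustration of the format only; it is not part of the floe or of any Orion API.

```python
def parse_retry_entries(entries):
    """Convert 'status:retries' strings into a {status_code: retries} dictionary."""
    retries = {}
    for entry in entries:
        status_code, _, count = entry.partition(":")
        # int() raises ValueError if either part is missing or not an integer.
        retries[int(status_code)] = int(count)
    return retries
```

Parsing the default value yields a dictionary that maps each of the status codes 429, 460, 500, 502, 503, and 504 to 1000 retries.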
Min timeout for shard downloading (min_timeout_for_shard_downloading): Sets the minimum retry timeout time in seconds.
Required
Type: integer
Default: 2.0
Max timeout for shard downloading (max_timeout_for_shard_downloading): Sets the maximum retry timeout time in seconds.
Required
Type: integer
Default: 1800.0
Sunk time scalar for shard downloading (sunk_time_scalar_for_shard_downloading): The retry timeout of an OrionSession operation will be set to the amount of time this item of work has been processing times this scalar value.
Required
Type: decimal
Default: 0.5
Max Shard Download Attempts (max_shard_download_attempts): Number of attempts to make when downloading a shard
Type: integer
Default: 1
Retry dictionary for shard downloading (retry_dictionary_for_shard_downloading): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.
Type: string
Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
FastROCS Prep Mol Field (fastrocs_prep_mol_field):
Type: field_parameter::mol
Default: __FastROCSPreparedMolecule__
Enable cube timing report (time_all_cubes): If true this cube will emit timing information to the timing_data port.
Type: boolean
Default: True
Choices: [True, False]
Catch exceptions (catch_exceptions): If Off exception handling will be disabled for this cube.
Type: boolean
Default: True
Choices: [True, False]
Catch exceptions (parallel_catch_exception_methods): Specifies the methods of a parallel cube in which an exception will be caught and emitted to the exception port, if the port is connected. If the exception port is connected to an exception handler, this will stop the floe.
Type: string
Default: [‘begin’]
Choices: [‘begin’, ‘process’, ‘end’]
Allow All Models (allow_all_models): Development parameter that if ‘On’ disables filtering the models by the size of the input
Type: boolean
Default: True
Choices: [True, False]
Verbose Model Selection (verbose_model_selection): If ‘On’ model filtering/selection will be written to the log
Type: boolean
Default: True
Choices: [True, False]