4.0.2 September 2023
Filter Collection can now take multiple input collections of the same type (for example, FastROCS or Gigadock).
Fixed issue with Gigadock Warp hanging when processing extremely large collections.
Fixed failure in FastROCS Plus if ROCS rescoring is turned off and dock rescoring is not turned on.
Fixed issue with Prepare Giga Collections intermittently failing at the beginning of the floe when given a large input file.
Added missing Gigadock Warp documentation.
4.0.0 July 2023 (Orion Floe 2023.1 Release)
The Gigadock Warp floe has been upgraded to use an AI model. It now costs less than the previous version and produces hit lists more similar to the hit list the Gigadock floe would produce given the same input. See Gigadock Warp Details for more information.
The parameter Options: Re-scoring -> Number of Molecules to Re-score can now be set as high as 100M (the previous limit was 10M).
Gigadock Warp now requires 1M or more input molecules to dock in order to ensure it has enough training data to build an AI model of the score.
Options: Advanced -> Random Dock Fraction
Options: Advanced -> Final Dock Fraction
Options: Advanced -> Number of FastROCS Queries
Options: Advanced -> Cluster FastROCS Queries
Fixed an issue in FastROCS Plus where the consensus hit lists incorrectly had an extra field named “Pareto Dominance Rank”, in addition to the expected field “Pareto Rank”.
3.4.5 December 2022 (Orion 2022.4 Release)
FastROCS Plus now supports any type of Shape Query in the ROCS Re-scoring step. The core FastROCS screen is still restricted to simpler Shape Queries.
FastROCS Plus now supports searching by shape only. That is, Shape Tanimoto, Ref Shape Tversky, and Fit Shape Tversky are now additionally supported as similarity types.
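For reference, shape-only similarity measures of this kind are conventionally computed from volume overlaps. The sketch below uses the standard Tanimoto and Tversky forms. The function names, the 0.95/0.05 weights, and the reading of Ref/Fit as "which molecule's self-overlap carries the dominant weight" are illustrative assumptions, not values taken from the floe.

```python
def shape_tanimoto(self_ref: float, self_fit: float, overlap: float) -> float:
    """Tanimoto = O / (I_ref + I_fit - O), where I_* are self-overlap volumes."""
    return overlap / (self_ref + self_fit - overlap)


def ref_shape_tversky(self_ref: float, self_fit: float, overlap: float,
                      alpha: float = 0.95, beta: float = 0.05) -> float:
    """Tversky with the dominant weight (alpha) on the reference self-overlap.

    The default weights are assumptions chosen for illustration only.
    """
    return overlap / (alpha * self_ref + beta * self_fit)


def fit_shape_tversky(self_ref: float, self_fit: float, overlap: float,
                      alpha: float = 0.95, beta: float = 0.05) -> float:
    """Tversky with the dominant weight (alpha) on the fit self-overlap."""
    return overlap / (alpha * self_fit + beta * self_ref)
```

With these forms, a small fit molecule fully contained in a large reference scores low on Ref Shape Tversky but high on Fit Shape Tversky, which is why the asymmetric measures are useful for substructure-like shape matches.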
3.3.4 September 2022
Collection based cubes now validate that downloaded shard data is the correct size, and retry the download if it is not. This protects against an extremely remote possibility that shard reads could lose data or hang the floe.
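The validate-and-retry pattern described above can be sketched as follows. This is a minimal illustration, not the cubes' actual implementation: the `download` callable, `expected_size`, and `max_attempts` are hypothetical names, and the real retry policy is not stated in these notes.

```python
from typing import Callable


def fetch_validated(download: Callable[[], bytes], expected_size: int,
                    max_attempts: int = 3) -> bytes:
    """Download shard data, retrying until its size matches expected_size.

    `download` is a hypothetical zero-argument callable returning the shard bytes.
    """
    for _ in range(max_attempts):
        data = download()
        if len(data) == expected_size:
            return data  # size check passed
    # Every attempt returned data of the wrong size: fail loudly instead of hanging
    raise IOError(f"shard size mismatch after {max_attempts} attempts")
```

Raising after a bounded number of attempts means a persistently bad read surfaces as an error rather than silently losing data or stalling the floe.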
Cluster Poses now properly clusters poses when ‘Options -> Single Conformer/Pose Input’ is switched from its default value of ‘On’ to ‘Off’.
The parameter ‘Options: Advanced -> Final Dock Fraction’ in the Gigadock Warp floe can no longer be set higher than 0.1. (There was an accidental regression in 3.3.0.) The default value of 0.08 is unchanged.
3.3.0 June 2022 (Orion 2022.2 release)
New Floe: Multi Query 2D Similarity.
FastROCS Plus now automatically outputs two additional datasets by default, ‘FastROCS Novelty Hit List’ and ‘ROCS Novelty Hit List’. These hit lists contain molecules from the FastROCS and the ROCS Re-scoring computations respectively that tend to have high 3D and low 2D similarity.
The Filter Collection and Prepare Giga Collections floes can now optionally take a set of known molecules and then filter the collection molecules based on their 2D similarity to the known molecules (see the ‘Options: Known Molecules’ parameter group in both these floes).
The Prepare Giga Collections floe now optionally accepts a dataset as input.
FastROCS Plus output hit lists now include a field with the 2D Tanimoto similarity to the query.
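As a reminder of what a 2D Tanimoto value measures, the sketch below computes it for two fingerprints represented as sets of on-bit indices. The fingerprint type used by the floe is not stated in these notes, so the set representation and the function name are assumptions made for illustration.

```python
from typing import Set


def tanimoto_2d(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto = |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union
```

A value near 1 indicates nearly identical 2D fingerprints, and a value near 0 indicates little shared 2D structure.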
Prepare Giga Collections no longer loses ~0.07% of molecules from the Gigadock collection when preparing large collections (the FastROCS Collection was unaffected by this bug).
Prepare Giga Collections no longer creates an empty Gigadock collection if fewer than 1000 molecules are prepared.
Gigadock Warp output molecules now have explicit rather than implicit hydrogens.
Prepare Giga Collections will now fail if run with no input.
Fixed an out of memory issue in the FastROCS Plus floe that could occur in rare circumstances.
3.2.0 April 2022 (Orion 2022.1 release)
The Prepare Giga Collections floe now by default adds a field “Enantiomer Title” to the output collections. It contains the title of the molecule with a postfix index that identifies enantiomers of input molecules with unspecified stereochemistry that were stereochemically enumerated.
The FastROCS Plus floe now has an option to output a collection of up to 10M of the top scoring FastROCS molecules.
Fixed a bug where the FastROCS Plus floe could fail due to an out of memory error if Options: Advanced: Number of Molecules to Re-score is set to a value much higher than the default.
Fixed an issue where Prepare Giga Collections could hang at the end of the floe waiting for the docking collection to close. This issue only affected preparation of large collections (e.g., billions of molecules) when using CPU rather than GPU Omega.
Fixed an issue in FastROCS Plus that caused more than one docked pose per molecule to be used when ‘Options: Advanced -> Query Conformer Generation Mode’ was set to ‘dock’.
Fixed an issue in the documentation of floes in the Large Scale Floes package where the Python names of parameters were not correct. (The Python name of a parameter is the name used when launching a job with ocli.)
3.1.7 Dec 2021 (Orion 2021.2.1 release)
This release contains a new floe, Gigadock Warp, that approximates a full Gigadock run using a combination of Docking and FastROCS. Gigadock Warp is ~8-10x less costly to run than a full Gigadock job and, when docking billions of molecules, recovers 70% of the molecules that the full Gigadock job places in its top 10K hit list.
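A recovery figure of this kind can be read as the overlap between two top-N hit lists. The following sketch shows one way such a fraction could be computed. The function name and the use of molecule identifiers in ranked lists are assumptions, and these notes do not describe how the 70% value was actually measured.

```python
from typing import Sequence


def top_n_recovery(reference_ranked: Sequence[str],
                   approx_ranked: Sequence[str], n: int) -> float:
    """Fraction of the reference top-n found in the approximate top-n.

    Both inputs are molecule identifiers ordered from best to worst score.
    """
    ref_top = set(reference_ranked[:n])
    if not ref_top:
        return 0.0
    approx_top = set(approx_ranked[:n])
    return len(ref_top & approx_top) / len(ref_top)
```

For example, with n = 10000 a return value of 0.7 would mean 7,000 of the reference run's top 10K molecules also appear in the approximate run's top 10K.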
For a tutorial on running Gigadock Warp, see Dock Ten Million Molecules with Gigadock Warp and Analysis with Freeform Consensus.
GPU instances on Amazon Web Services (AWS) have been in high demand recently. This can result in long run times for floes that use GPUs, i.e., FastROCS Plus, Batch FastROCS, Prepare Giga Collections and Gigadock Warp. To help reduce the chance of encountering this issue, these floes now use by default older AWS GPU instances that are generally more available but slightly less cost efficient (~25% more GPU cost). These floes now expose parameters that allow specifying the more efficient, but sometimes less available, instances when a floe is run. See the Changes section of these release notes for details on individual floes.
New Floe: Pareto Frontier Consensus finds the best records in an input dataset based on two or more numeric values in the dataset using a Pareto Frontier analysis.
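As a rough illustration of a Pareto Frontier analysis, the sketch below keeps every record that no other record dominates. It assumes that higher values are better for every field and that exact ties are all kept. The floe's actual score orientation and tie handling are not described here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every value and better in one.

    Assumes higher is better for every value.
    """
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_frontier(records: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the records that are not dominated by any other record."""
    return [r for r in records
            if not any(dominates(other, r) for other in records)]
```

This pairwise check is quadratic in the number of records, which is fine for a sketch but would need a smarter algorithm for very large datasets.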
The Batch FastROCS floe now by default prefixes the name of each output dataset with the name of the input query it is associated with.
The Batch FastROCS floe now has an additional output dataset for each query containing the cluster heads of the output hit list. Full clustering information is still present in the primary hit lists.
The FastROCS Plus floe no longer outputs the Similarity Combo, Color Similarity and Shape Similarity fields. These fields were duplicates of the output fields Tanimoto Combo, Color Tanimoto, and Shape Tanimoto (or fields with Tversky instead of Tanimoto if Tversky scoring is used), which were and continue to be output.
The GigaDock floe has been renamed the Gigadock floe.
The Giga Docking Collection to Hi-res FastROCS Collection has been renamed the Gigadock Collection to Hi-res FastROCS Collection floe.
The FastROCS Plus (Keywords: Shape, Docking, Consensus, Collection, Virtual Screening) floe has been renamed the FastROCS Plus floe.
The Batch FastROCS (Keywords: Shape, Collection, Virtual Screening) floe has been renamed the Batch FastROCS floe.
The FastROCS Plus floe now uses AWS g3 GPU instances by default (previously it used g4dn.2xlarge). Parameters for choosing the AWS GPU instance to use are now exposed (see the GPU Hardware parameter group).
The Batch FastROCS floe now uses AWS g3 GPU instances by default (previously it used g4dn.xlarge). Parameters for choosing the AWS GPU instance to use are now exposed (see the GPU Hardware parameter group).
The Prepare Giga Collections floe now uses CPU Omega by default (previously it used GPU Omega). This can be adjusted with the Conformer Generation Settings -> Use GPU Omega setting.
When processing a tab or comma separated file Prepare Giga Collections will now, by default, name the field on the output giga docking collection ‘Molecule’ rather than using the column title from the csv/tsv file (see the new Format Specific Settings -> Molecule Field Name For TSV/CSV parameter). This change was made because in practice the column is often named ‘SMILES’ and this field name propagates to the docked pose field of the output hit lists of the Gigadock floe, where ‘SMILES’ is not an appropriate name.
Fixed a bug in the Batch FastROCS floe that caused it not to write out the fields with the standard names for ROCS scores, i.e., Tanimoto Combo, Color Tanimoto, and Shape Tanimoto, or Tversky Combo, Color Tversky, and Shape Tversky when Tversky similarity is selected. The floe was previously writing out fields with Similarity in place of Tanimoto or Tversky in the field names.
In the FastROCS Plus floe, fixed the description of the parameter Options: Advanced -> Query Conformer Generation Modes, which incorrectly referred to ‘freeform’ as an available conformer generation mode.
Fixed a bug in the FastROCS Plus floe that caused it to ignore all design units supplied to the Inputs -> Design Unit(S) (Optional) parameter if Options: Advanced -> ROCS Re-Scoring Mode was set to ‘Off’.
Fixed a bug in Filter Collection that caused it to lose compression on the molecules of FastROCS collections. In practice this resulted in a functional but slightly larger than necessary output collection when processing FastROCS collections (these collections were not functional in the older deprecated Multi-Query Ligand-Based Virtual Screening with FastROCS and SubROCS floe).
Suppressed rare XlogP calculation warnings for individual atoms when running Prepare Giga Collections and Filter Collection. Although rare, these warnings tended to flood the log with thousands of unactionable messages when processing billions of molecules.
3.0.3 Oct 2021 (hotfix)
Fixed a bug that could cause docking to fail when running in multi-receptor Hybrid mode.
Removed a development parameter that was accidentally exposed in the 3.0.2 release.
3.0.2 Jun 2021 (Orion 2021.1 release)
The functionality of the “Multi-Query Ligand-Based Virtual Screening with FastROCS and SubROCS” floe from the “OpenEye Ligand-Based Virtual Screening” package has been absorbed into the Batch FastROCS floe in this package.
All floes now have much improved parameter organization in the floe launch UI.
FastROCS Plus: FastROCS search with optional re-scoring and consensus scoring of the best FastROCS molecules with Docking and ROCS.
Batch FastROCS: FastROCS search with ROCS re-score of the best FastROCS molecules. A separate hit list is produced for each query.
FreeForm Pose: Calculates the Freeform Delta G of docked poses with optional Delta G filtering of poses.
Cluster Poses: Clusters docked poses, or alternatively ROCS overlays, based on their 3D similarity.
Sample Collection: Converts a random sample of a FastROCS or GigaDocking collection into a Dataset.
Gigadock Collection to Hi-res FastROCS Collection: Converts a Giga Docking Collection into a FastROCS collection with a max of 200 conformers per molecule, as opposed to a max of 10 conformers per molecule in a standard FastROCS collection.
Gigadock now supports multiple receptors/design units.
Prepare Giga Collections now supports GPU Omega (enabled by default) which can reduce floe cost by up to 40%.
Prepare Giga Collections now retains the original SMILES string when processing SMILES files or comma/tab separated files containing SMILES.
Fixed issue where floes could not read collections created by very old versions of Orion.
2.0.6 April 2021 (hotfix)
2.0.0 Nov 2020 (Orion 2020.3 release)
Parameters for setting the output fields for the docked molecule and score have been temporarily removed along with the parameter specifying the input molecule field. These parameters are very rarely needed and have been set by users not fully understanding their impact. These parameters are planned to return in a future release once the FLOE UI has been improved to provide better clarity on when to set them.
No new features in this release.
Added duplicate checking to ‘Prepare Giga Collections’. The check is only applied to sequential molecules.
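Checking only sequential molecules means a duplicate is caught when it directly follows its twin in the input, but not when other molecules separate the two. A minimal sketch of that behaviour is shown below, assuming each molecule is reduced to a comparable key such as a SMILES string. How the floe actually decides that two molecules are identical is not stated here.

```python
from itertools import groupby
from typing import Iterable, List


def drop_sequential_duplicates(keys: Iterable[str]) -> List[str]:
    """Collapse runs of identical adjacent keys into a single entry.

    Non-adjacent repeats are kept, since only neighbours are compared.
    """
    return [key for key, _run in groupby(keys)]
```

The advantage of this approach is that it needs constant memory, which matters when streaming billions of molecules.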
Fixed issue with molecules from a dataset (not collection) being docked by the ‘Giga Dock’ floe being lost rather than being sent to the Restart Collection if the cost threshold was exceeded.
Fixed issue with ‘Prepare Giga Collections’ not deleting its temporary collection.
Better ordering of parameters (better categorization is coming in a future release).
Improved the efficiency of the Giga-Dock floe by about 15%.
Protected the giga-docking floe against the unlikely case of a 12h timeout when docking an individual shard of molecules (a full docking collection consists of ~1M shards) to a gigantic active site (>~2000 cubic Angstroms).
0.1.0 August 2020 (Orion 2020.2 release)
Initial release of Large Scale Floes package.
This package includes the functionality of the OpenEye Giga Docking Floes and OpenEye Scientific Floes packages, both of which have been deprecated. (Exception: The torsion scanning floes from OpenEye Scientific Floes were moved into the OpenEye QM Floes package.)
The Giga Dock (FRED) and Giga Dock (HYBRID) floes from OpenEye Giga Docking Floes have been merged into a single floe, Giga Docking, which has a parameter to select the docking mode (i.e., FRED or HYBRID).
The Step 1, 2 & 3 collection preparation floes from OpenEye Scientific Floes have been merged into a single Prepare Giga Collections floe.
The collection preparation floe filters are now significantly more configurable. Some of the new optional filtering options are:
Filtering by SMARTS pattern (either exclude or require)
Filtering with a custom OEFilter file
Random filtering. This allows the creation of collections that contain a random subset of the input molecules for testing purposes.
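To illustrate the random filtering option in the list above, the sketch below keeps each molecule independently with a fixed probability. The `keep_fraction` and `seed` names are hypothetical, and the floe's actual sampling scheme is not described in these notes.

```python
import random
from typing import Iterable, List, Optional


def random_subset(molecules: Iterable[str], keep_fraction: float,
                  seed: Optional[int] = None) -> List[str]:
    """Keep each molecule independently with probability keep_fraction."""
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    return [mol for mol in molecules if rng.random() < keep_fraction]
```

Because each molecule is decided independently, the input can be streamed without knowing its total size, at the cost of the output size only approximating `keep_fraction` times the input size.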
The collection preparation floe now accepts a wider variety of input formats, including tar and zip archive formats.
A Filter Collection floe has been added that allows for filtering of an existing collection.
The Giga Docking floe has a new docking mode, Fast-FRED. Fast-FRED uses a simpler scoring function and samples slightly less in the initial stages of docking while using the same Chemgauss4 scoring function for final optimization and scoring. This docking mode can be 2-4x faster (and hence less costly) than standard FRED. Results appear similar to standard FRED, but as of this release the quality of results from this docking mode has not been thoroughly validated.
An issue with the collection preparation floes duplicating some input molecules has been addressed.