Filter Collection
Description
Creates a new collection from an existing Giga Docking or FastROCS collection that is filtered by molecular property (or randomly).
Filtering can be done at random, with SMARTS patterns, with OEFilter, or by many basic molecular properties. All filters of this floe are turned off by default, so the output collection will be a copy of the input collection unless one or more of the filters is enabled.
See also
This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.
Details
Title : Filter Collection
Tags : Large Scale Floes Collections Preparation Filtering SMARTS molprop oefilter
Python Name : #05_filter_collection
Parameters
Inputs
Input Collection
An input collection to filter.
Type : collection_source
Required : True
Python Name : input_collection
Outputs
Filtered Collection Name
Name of the collection to create.
Type : collection_sink
Required : True
Default : Filtered Collection
Python Name : output_collection_name

Temporary Filtered Collection Name
This collection will be created by this floe for internal use during the run and will be automatically deleted when the run finishes.
Type : collection_sink
Required : False
Default : Temporary Filtered Collection (Filter Collection)
Python Name : temporary_filtered_collection_name
Options
Dry Run
If ‘On’, molecules will be read and passed through the filtering cubes normally but will not be passed to the collection creation cubes. This makes the floe inexpensive and fast to run; however, no output collection will be generated. With this switch on you can see how many molecules will be read from your input file(s)/collection(s) and how many will pass the configured filters (examine the filtering cube port counts), and make adjustments quickly. Once the best filtering settings are determined, you can run the floe with those settings and this option turned off.
Type : boolean
Required : True
Default : False
Choices : True, False
Python Name : switch

Keep this fraction
The fraction of the input molecules that will be retained using a random selection criterion. Set this value to less than one to create an output collection containing a random subset of the input collection.
Type : decimal
Required : True
Default : 1.0
Range : 0.0 to 1.0
Python Name : random_retain_probability
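The random selection behind ‘Keep this fraction’ can be pictured with a short Python sketch. This is only an illustration: it assumes each molecule is kept independently with the given probability, which the Python name random_retain_probability suggests but this page does not confirm. The function name keep_random_fraction and the seed argument are hypothetical and are not part of the floe.

```python
import random


def keep_random_fraction(molecules, keep_fraction, seed=None):
    """Yield each molecule independently with probability keep_fraction.

    Hypothetical helper; assumes per-molecule random retention.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    for mol in molecules:
        # rng.random() returns a float in [0.0, 1.0), so a fraction of 1.0
        # keeps every molecule and a fraction of 0.0 keeps none.
        if rng.random() < keep_fraction:
            yield mol
```

Under this assumption the size of the output varies slightly from run to run; a fraction of 0.01 applied to one million molecules would give roughly, not exactly, ten thousand molecules.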
Filtering: Basic Properties
Max molecular weight
Molecules with molecular weight greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high molecular weight.
Type : decimal
Required : False
Min Value : 0.0
Python Name : mw_max

Min molecular weight
Molecules with molecular weight less than this value will be filtered out. If unspecified this cube will not filter out molecules with low molecular weight.
Type : decimal
Required : False
Min Value : 0.0
Python Name : mw_min

Max rotatable bond count
Molecules with rotatable bond count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high rotatable bond count.
Type : integer
Required : False
Min Value : 0
Python Name : rot_bond_max

Min rotatable bond count
Molecules with rotatable bond count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low rotatable bond count.
Type : integer
Required : False
Min Value : 0
Python Name : rot_bond_min

Max count undefined atom stereo
Molecules with count undefined atom stereo greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high count undefined atom stereo.
Type : integer
Required : False
Min Value : 0
Python Name : atom_stereo_max

Max count undefined bond stereo
Molecules with count undefined bond stereo greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high count undefined bond stereo.
Type : integer
Required : False
Min Value : 0
Python Name : bond_stereo_max

Max acceptor count
Molecules with acceptor count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high acceptor count.
Type : integer
Required : False
Min Value : 0
Python Name : acc_max

Min acceptor count
Molecules with acceptor count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low acceptor count.
Type : integer
Required : False
Min Value : 0
Python Name : acc_min

Max donor count
Molecules with donor count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high donor count.
Type : integer
Required : False
Min Value : 0
Python Name : don_max

Min donor count
Molecules with donor count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low donor count.
Type : integer
Required : False
Min Value : 0
Python Name : don_min

Max topological polar surface area
Molecules with topological polar surface area greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high topological polar surface area.
Type : decimal
Required : False
Python Name : tpsa_max

Min topological polar surface area
Molecules with topological polar surface area less than this value will be filtered out. If unspecified this cube will not filter out molecules with low topological polar surface area.
Type : decimal
Required : False
Python Name : tpsa_min

Max xlogp
Molecules with xlogp greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high xlogp.
Type : decimal
Required : False
Python Name : xlogp_max

Min xlogp
Molecules with xlogp less than this value will be filtered out. If unspecified this cube will not filter out molecules with low xlogp.
Type : decimal
Required : False
Python Name : xlogp_min

Max formal charge
Molecules with formal charge greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high formal charge.
Type : integer
Required : False
Python Name : charge_max

Min formal charge
Molecules with formal charge less than this value will be filtered out. If unspecified this cube will not filter out molecules with low formal charge.
Type : integer
Required : False
Python Name : charge_min

Max aromatic ring count
Molecules with aromatic ring count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high aromatic ring count.
Type : integer
Required : False
Min Value : 0
Python Name : aro_max

Min aromatic ring count
Molecules with aromatic ring count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low aromatic ring count.
Type : integer
Required : False
Min Value : 0
Python Name : aro_min
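All of the basic property parameters follow the same rule: a molecule is filtered out when a property falls strictly below a specified minimum or strictly above a specified maximum, and an unspecified bound is ignored. The following Python sketch illustrates that rule on precomputed property values. The function passes_bounds, the dictionary layout, and the property keys are hypothetical; the floe computes the properties itself, and this is not its implementation.

```python
def passes_bounds(properties, bounds):
    """Return True if every specified bound is satisfied.

    properties: dict mapping a property name to its value, e.g. {"mw": 320.4}
    bounds: dict mapping a property name to a (minimum, maximum) pair;
            None in either position means that bound is unspecified.
    """
    for name, (low, high) in bounds.items():
        value = properties[name]
        # Values equal to a bound pass; only strictly lower or strictly
        # higher values are filtered out.
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Example: molecular weight between 250 and 500, at most 10 rotatable bonds.
example_bounds = {"mw": (250.0, 500.0), "rot_bond": (None, 10)}
example_result = passes_bounds({"mw": 320.4, "rot_bond": 5}, example_bounds)
```

With an empty bounds dictionary every molecule passes, which mirrors the floe's default of leaving all filters off.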
Filtering: SMARTS
Required SMARTS
If one or more SMARTS patterns are supplied to this parameter, then every molecule passed to this cube must match at least one of these SMARTS patterns or it will be filtered out. This check is skipped if no SMARTS patterns are supplied to this cube.
Type : string
Required : False
Accepts Multiple Values
Python Name : required_smarts

Excluded SMARTS
Every molecule that matches any of the SMARTS patterns supplied to this parameter will be filtered out.
Type : string
Required : False
Accepts Multiple Values
Python Name : excluded_smarts
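The way the two SMARTS parameters combine can be sketched in a few lines of Python. This sketch only shows the pass/fail logic described above. The function passes_smarts is hypothetical, and the actual substructure matching is delegated to a caller-supplied matches(mol, pattern) function, which in practice would come from a cheminformatics toolkit.

```python
def passes_smarts(mol, required, excluded, matches):
    """Apply the required/excluded SMARTS logic to one molecule.

    mol: the molecule, in whatever form the matcher expects
    required, excluded: lists of SMARTS pattern strings (may be empty)
    matches: hypothetical callable (mol, pattern) -> bool doing the
             actual substructure match
    """
    # The required check is skipped when no required patterns are given;
    # otherwise the molecule must match at least one of them.
    if required and not any(matches(mol, p) for p in required):
        return False
    # A match to any excluded pattern removes the molecule.
    if any(matches(mol, p) for p in excluded):
        return False
    return True
```

In this sketch a molecule that matches both a required pattern and an excluded pattern is filtered out.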
Filtering: OEFilter
OEFilter Type
Type : string
Required : True
Default : None
Choices : BlockBuster, Lead, Drug, PAINS, None
Python Name : oefilter_type

Filter Rules
Optional rules for OEFilter (see https://docs.eyesopen.com/toolkits/python/molproptk/filter_files.html). These rules will be added to the OEFilter rules. Select None for ‘OEFilter Type’ if you want to replace rather than add rules.
Type : file_in
Required : False
Python Name : filter_in
Filtering: 2D Similarity to Known Molecules
These parameters allow molecules being prepared to be filtered by their 2D Tanimoto similarity to one or more known molecules (commonly the known molecules would be actives from a project). To use this filter, a dataset of known molecules must be passed to the ‘Known Molecules’ parameter and at least one of the ‘Filter Out Tanimotos Higher Than’ and ‘Filter Out Tanimotos Lower Than’ parameters must be specified (if neither is specified, all molecules will pass this filter). If there are multiple known molecules, the highest Tanimoto between the molecule being prepared and any of the known molecules is used to filter it. The ‘Options->Dry Run’ parameter can be set to ‘On’ to cheaply test how changing the parameters in this group affects the number of molecules being filtered out.
Known Molecules
If this parameter is specified, each molecule being prepared will be assigned a single 2D Tanimoto value equal to its highest 2D Tanimoto to any molecule in this dataset (or datasets). The prepared molecule will then be filtered by comparing this value to the setting of the ‘Filter Out Tanimotos Higher Than’ and/or ‘Filter Out Tanimotos Lower Than’ parameters. WARNING: A significant filtering compute cost can be incurred, even in ‘Dry Run’ mode, if a large number of molecules are passed to this parameter (for a 1 billion molecule collection and 10K known molecules, the filtering portion of the cost will typically be about $20).
Type : data_source
Required : False
Python Name : known_molecules

Filter Out Tanimotos Higher Than
If specified, molecules with a 2D Tanimoto higher than this value will be filtered out. Use this parameter if you want to remove molecules that are similar in 2D space to any of the known molecules.
Type : decimal
Required : False
Python Name : filter_out_tanimotos_higher_than

Filter Out Tanimotos Lower Than
If specified, molecules with a 2D Tanimoto lower than this value will be filtered out. Use this parameter if you want to remove molecules that are different in 2D space from all of the known molecules.
Type : decimal
Required : False
Python Name : filter_out_tanimotos_lower_than

Known Molecules 2D Fingerprint Method
The 2D fingerprint method used to compute the Tanimotos for the known molecules filter.
Type : string
Required : False
Default : Circular
Choices : Circular, Path, Tree
Python Name : known_molecules_2d_fingerprint_method

Use Virtual Screening 2D Fingerprint Variant
If ‘On’, the virtual screening variant of the selected 2D fingerprint will be used for the known molecules filter. The virtual screening variant treats certain functional groups identically regardless of their pKa state, e.g., protonated and unprotonated carboxylic acids.
Type : boolean
Required : False
Default : True
Choices : True, False
Python Name : use_virtual_screening_2d_fingerprint_variant

Known Molecule Tanimoto Field
If this parameter is specified, the 2D Tanimoto used for known molecule filtering for each processed molecule will be placed in the output collection in a field of this name. If unspecified, the Tanimoto value will not be stored in the output collection.
Type : field_parameter::float
Required : False
Python Name : known_molecule_tanimoto_field

Maximum Number of Filtering Cubes
Maximum number of cubes to use for filtering. Increasing this value can improve the runtime in cases where a large number of known molecules are supplied. This value can only be set above the default value of 250 if the number of molecules passed to ‘Filtering: 2D Similarity To Known Molecules -> Known Molecules’ times the setting of ‘Options->Keep This Fraction’ is greater than 10000.
Type : integer
Required : False
Default : 500
Range : 1 to 5000
Python Name : maximum_number_of_filtering_cubes
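The known-molecule filter described in this group reduces to a maximum over pairwise Tanimoto values followed by two threshold checks. The Python sketch below illustrates that logic on fingerprints represented as sets of on-bit indices. It is an assumption-laden illustration: the function names are hypothetical, the set representation stands in for the Circular, Path, or Tree fingerprints the floe computes, and returning 0.0 for two empty fingerprints is a convention chosen here rather than documented behavior.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_tanimoto(fp, known_fps):
    """Highest Tanimoto between fp and any known fingerprint."""
    return max(tanimoto(fp, known) for known in known_fps)


def passes_known_filter(fp, known_fps, higher_than=None, lower_than=None):
    """Return True if the molecule survives the known-molecule filter."""
    # With no known molecules or no thresholds, every molecule passes.
    if not known_fps or (higher_than is None and lower_than is None):
        return True
    score = max_tanimoto(fp, known_fps)
    if higher_than is not None and score > higher_than:
        return False  # too similar to a known molecule
    if lower_than is not None and score < lower_than:
        return False  # too different from every known molecule
    return True
```

For example, the fingerprints {1, 2, 3} and {2, 3, 4} share two bits out of four distinct bits, giving a Tanimoto of 0.5, so a ‘Filter Out Tanimotos Higher Than’ setting of 0.4 would remove that molecule while a setting of 0.5 would keep it.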
Input Fields
These parameters specify the fields on the input datasets and/or collections that these floes read data from. Note that parameters identifying a molecule field are special. If left empty, the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge.
Known Molecule Field
Field on the known molecules dataset holding the known molecules. If unspecified, the default primary molecule on the record will be used.
Type : field_parameter::mol
Required : False
Python Name : known_molecule_field