Filter Collection

Description

Creates a new collection from an existing Giga Docking or FastROCS collection that is filtered by molecular property (or randomly).

Filtering can be done at random, with SMARTS patterns, with OEFilter, or by many basic molecular properties. All filters in this floe are turned off by default, so the output collection will be a copy of the input collection unless one or more filters are enabled.

See also

This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.

Details

Title : Filter Collection
Tags : Large Scale Floes Collections Preparation Filtering SMARTS molprop oefilter
Python Name : #05_filter_collection

Parameters

Inputs

  • Input Collection An input collection to filter.
    Type : collection_source
    Required : True
    Python Name : input_collection

Outputs

  • Filtered Collection Name Name of the collection to create
    Type : collection_sink
    Required : True
    Default : Filtered Collection
    Python Name : output_collection_name
  • Temporary Filtered Collection Name This collection will be created by this floe for internal use during the run and will be automatically deleted when the run finishes.
    Type : collection_sink
    Required : False
    Default : Temporary Filtered Collection (Filter Collection)
    Python Name : temporary_filtered_collection_name

Options

  • Dry Run If ‘On’, molecules will be read and passed through the filtering cubes normally but will not be passed to the collection creation cubes. This makes the Floe inexpensive and fast to run, but no output collection will be generated. With this switch on you can see how many molecules will be read from your input file(s)/collection(s) and how many will pass the configured filters (examine the filtering Cube port counts), and make adjustments quickly. Once the best filtering settings are determined, you can run the Floe with those settings and this option turned off.
    Type : boolean
    Required : True
    Default : False
    Choices :True, False
    Python Name : switch
  • Keep this fraction The fraction of the input molecules that will be retained using a random selection criterion. Set this value to less than one to create an output collection containing a random subset of the molecules from the input collection.
    Type : decimal
    Required : True
    Default : 1.0
    Range : 0.0 to 1.0
    Python Name : random_retain_probability
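
The random selection can be pictured with a minimal sketch. It assumes each molecule is kept independently with the given probability, which the Python name suggests but this page does not confirm; the function name and the SMILES strings are illustrative only.

```python
import random


def keep_random_subset(molecules, retain_probability, seed=None):
    """Keep each molecule independently with the given probability (assumed behavior)."""
    if not 0.0 <= retain_probability <= 1.0:
        raise ValueError("retain_probability must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # rng.random() lies in [0.0, 1.0), so 1.0 keeps everything and 0.0 keeps nothing.
    return [mol for mol in molecules if rng.random() < retain_probability]


# Illustrative input: a handful of SMILES strings standing in for a collection.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
subset = keep_random_subset(smiles, 0.5, seed=7)
```

Under this assumption the size of the output is only approximately the requested fraction of the input, and the approximation improves as the collection grows.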

Filtering: Basic Properties

  • Max molecular weight Molecules with molecular weight greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high molecular weight.
    Type : decimal
    Required : False
    Min Value : 0.0
    Python Name : mw_max
  • Min molecular weight Molecules with molecular weight less than this value will be filtered out. If unspecified this cube will not filter out molecules with low molecular weight.
    Type : decimal
    Required : False
    Min Value : 0.0
    Python Name : mw_min
  • Max rotatable bond count Molecules with rotatable bond count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high rotatable bond count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : rot_bond_max
  • Min rotatable bond count Molecules with rotatable bond count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low rotatable bond count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : rot_bond_min
  • Max count undefined atom stereo Molecules with a count of undefined atom stereo greater than this value will be filtered out. If unspecified this cube will not filter out molecules with a high count of undefined atom stereo.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : atom_stereo_max
  • Max count undefined bond stereo Molecules with a count of undefined bond stereo greater than this value will be filtered out. If unspecified this cube will not filter out molecules with a high count of undefined bond stereo.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : bond_stereo_max
  • Max acceptor count Molecules with acceptor count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high acceptor count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : acc_max
  • Min acceptor count Molecules with acceptor count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low acceptor count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : acc_min
  • Max donor count Molecules with donor count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high donor count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : don_max
  • Min donor count Molecules with donor count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low donor count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : don_min
  • Max topological polar surface area Molecules with topological polar surface area greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high topological polar surface area.
    Type : decimal
    Required : False
    Python Name : tpsa_max
  • Min topological polar surface area Molecules with topological polar surface area less than this value will be filtered out. If unspecified this cube will not filter out molecules with low topological polar surface area.
    Type : decimal
    Required : False
    Python Name : tpsa_min
  • Max xlogp Molecules with xlogp greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high xlogp.
    Type : decimal
    Required : False
    Python Name : xlogp_max
  • Min xlogp Molecules with xlogp less than this value will be filtered out. If unspecified this cube will not filter out molecules with low xlogp.
    Type : decimal
    Required : False
    Python Name : xlogp_min
  • Max formal charge Molecules with formal charge greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high formal charge.
    Type : integer
    Required : False
    Python Name : charge_max
  • Min formal charge Molecules with formal charge less than this value will be filtered out. If unspecified this cube will not filter out molecules with low formal charge.
    Type : integer
    Required : False
    Python Name : charge_min
  • Max aromatic ring count Molecules with aromatic ring count greater than this value will be filtered out. If unspecified this cube will not filter out molecules with high aromatic ring count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : aro_max
  • Min aromatic ring count Molecules with aromatic ring count less than this value will be filtered out. If unspecified this cube will not filter out molecules with low aromatic ring count.
    Type : integer
    Required : False
    Min Value : 0
    Python Name : aro_min
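
All of the basic property parameters follow the same min/max pattern, which can be sketched as below. This is an illustration of the documented semantics, not the floe's implementation; the helper name, the property values, and the limits are hypothetical, and `None` stands for a parameter left unspecified.

```python
def passes_range(value, minimum=None, maximum=None):
    """Return True if value survives an optional min/max filter pair."""
    if minimum is not None and value < minimum:
        return False  # below the minimum: filtered out
    if maximum is not None and value > maximum:
        return False  # above the maximum: filtered out
    return True  # values equal to a limit pass, as do all values when no limit is set


# Hypothetical precomputed properties for one molecule.
props = {"mw": 342.4, "rot_bond": 5, "xlogp": 2.1}
# (min, max) limits per property; None means the parameter was left unspecified.
limits = {"mw": (250.0, 500.0), "rot_bond": (None, 8), "xlogp": (None, None)}

# A molecule is kept only if it passes every enabled filter.
keep = all(passes_range(props[name], lo, hi) for name, (lo, hi) in limits.items())
```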

Filtering: SMARTS

  • Required SMARTS If one or more SMARTS patterns are supplied to this parameter, every molecule passed to this cube must match at least one of them or it will be filtered out. This check is skipped if no SMARTS patterns are supplied.
    Type : string
    Required : False
    Accepts Multiple Values
    Python Name : required_smarts
  • Excluded SMARTS Every molecule that matches any of the SMARTS patterns supplied to this parameter will be filtered out.
    Type : string
    Required : False
    Accepts Multiple Values
    Python Name : excluded_smarts
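
The way the two SMARTS parameters combine can be sketched as follows. The matching step is passed in as a caller-supplied function, because the sketch does not use a cheminformatics toolkit; the `substring_match` stand-in is hypothetical and performs plain substring search on SMILES strings, which is not real SMARTS matching.

```python
def passes_smarts(mol, required, excluded, matches):
    """Apply the required/excluded pattern rules to one molecule.

    matches(mol, pattern) is a caller-supplied predicate (hypothetical here).
    """
    # The required check is skipped when no required patterns are supplied.
    if required and not any(matches(mol, p) for p in required):
        return False
    # A match to any excluded pattern removes the molecule.
    if any(matches(mol, p) for p in excluded):
        return False
    return True


def substring_match(mol, pattern):
    # Stand-in only: substring search on a SMILES string, NOT real SMARTS matching.
    return pattern in mol


kept = passes_smarts("CC(=O)O", ["C(=O)O"], ["Cl"], substring_match)
dropped = passes_smarts("CCCl", [], ["Cl"], substring_match)
```

In practice the `matches` argument would wrap a substructure search from a real toolkit.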

Filtering: OEFilter

  • OEFilter Type
    Type : string
    Required : True
    Default : None
    Choices :BlockBuster, Lead, Drug, PAINS, None
    Python Name : oefilter_type
  • Filter Rules Optional rules for OEFilter (see https://docs.eyesopen.com/toolkits/python/molproptk/filter_files.html). These rules will be added to the OEFilter rules. Select None for ‘OEFilter Type’ if you want to replace rather than add rules.
    Type : file_in
    Required : False
    Python Name : filter_in

Filtering: 2D Similarity to Known Molecules

These parameters allow molecules being prepared to be filtered by their 2D Tanimoto similarity to one or more known molecules (commonly actives from a project). To use this filter, a dataset of known molecules must be passed to the ‘Known Molecules’ parameter and at least one of the ‘Filter Out Tanimotos Higher Than’ and ‘Filter Out Tanimotos Lower Than’ parameters must be specified (if neither is specified, all molecules will pass this filter). If there are multiple known molecules, each molecule being prepared is filtered on its highest Tanimoto to any of the known molecules. The ‘Options->Dry Run’ parameter can be set to ‘On’ to cheaply test how changing the parameters in this group affects the number of molecules being filtered out.

  • Known Molecules If this parameter is specified, each molecule being prepared will be assigned a single 2D Tanimoto value equal to its highest 2D Tanimoto to any molecule in the supplied dataset(s). The prepared molecule will then be filtered by comparing this value to the setting of the ‘Filter Out Tanimotos Higher Than’ and/or ‘Filter Out Tanimotos Lower Than’ parameters. WARNING: A significant filtering compute cost can be incurred, even in ‘Dry Run’ mode, if a large number of molecules are passed to this parameter (for a 1 billion molecule collection and 10K known molecules, the filtering portion of the cost will typically be about $20).
    Type : data_source
    Required : False
    Python Name : known_molecules
  • Filter Out Tanimotos Higher Than If specified, molecules with a 2D Tanimoto higher than this value will be filtered out. Use this parameter if you want to remove molecules that are similar in 2D space to any of the known molecules.
    Type : decimal
    Required : False
    Python Name : filter_out_tanimotos_higher_than
  • Filter Out Tanimotos Lower Than If specified, molecules with a 2D Tanimoto lower than this value will be filtered out. Use this parameter if you want to remove molecules that are different in 2D space from all of the known molecules.
    Type : decimal
    Required : False
    Python Name : filter_out_tanimotos_lower_than
  • Known Molecules 2D Fingerprint Method The 2D Fingerprint method used to compute the Tanimotos for the known molecules filter.
    Type : string
    Required : False
    Default : Circular
    Choices :Circular, Path, Tree
    Python Name : known_molecules_2d_fingerprint_method
  • Use Virtual Screening 2D Fingerprint Variant If ‘On’, the virtual screening variant of the selected 2D fingerprint will be used for the known molecules filter. The virtual screening variant treats certain functional groups identically regardless of their pKa state, e.g., protonated and unprotonated carboxylic acids.
    Type : boolean
    Required : False
    Default : True
    Choices :True, False
    Python Name : use_virtual_screening_2d_fingerprint_variant
  • Known Molecule Tanimoto Field If this parameter is specified the 2D Tanimoto used for known molecule filtering for each processed molecule will be placed in the output collection in a field of this name. If unspecified the Tanimoto value will not be stored in the output collection.
    Type : field_parameter::float
    Required : False
    Python Name : known_molecule_tanimoto_field
  • Maximum Number of Filtering Cubes Maximum number of cubes to use for filtering. Increasing this value can improve the runtime in cases where a large number of known molecules are supplied. This value can only be set above the default value of 250 if the number of molecules passed to ‘Filtering: 2D Similarity To Known Molecules -> Known Molecules’ times the setting of ‘Options->Keep This Fraction’ is greater than 10000.
    Type : integer
    Required : False
    Default : 500
    Range : 1 to 5000
    Python Name : maximum_number_of_filtering_cubes
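
The known molecules filter described in this group can be sketched as follows. The sketch represents each fingerprint as a set of on-bit indices, which is an assumption made for illustration; the floe itself uses Circular, Path, or Tree fingerprints, and the function names, bit sets, and thresholds below are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def passes_known_filter(fp, known_fps, higher_than=None, lower_than=None):
    """Filter on the highest Tanimoto to any known molecule.

    Returns (passed, best_tanimoto). With no thresholds set, everything passes.
    """
    # Assumption: an empty set of known molecules yields a Tanimoto of 0.0.
    best = max((tanimoto(fp, known) for known in known_fps), default=0.0)
    if higher_than is not None and best > higher_than:
        return False, best  # too similar to a known molecule
    if lower_than is not None and best < lower_than:
        return False, best  # too different from every known molecule
    return True, best


# Hypothetical fingerprints: two known molecules and one candidate.
known = [{1, 2, 3, 4}, {10, 11}]
candidate = {1, 2, 3, 5}

too_similar = passes_known_filter(candidate, known, higher_than=0.5)
similar_enough = passes_known_filter(candidate, known, lower_than=0.3)
```

Here the candidate shares three of five distinct bits with the first known molecule, so its highest Tanimoto is 0.6: it is removed by a ‘Higher Than’ threshold of 0.5 and kept by a ‘Lower Than’ threshold of 0.3.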

Input Fields

These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special. If left empty, the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for a star on its field badge.

  • Known Molecule Field Field on the known molecules dataset holding the known molecules. If unspecified the default Primary molecule on the record will be used.
    Type : field_parameter::mol
    Required : False
    Python Name : known_molecule_field
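
When planning a run, it can help to collect the intended settings under the Python Names listed on this page and check them for obvious mistakes first. The sketch below is one hypothetical way to do that; the values, the `check_settings` helper, and the idea of treating a minimum above its maximum as a problem are illustrative assumptions, and the sketch does not show how parameters are submitted to a floe.

```python
# Hypothetical settings keyed by the Python Names documented on this page.
settings = {
    "random_retain_probability": 0.1,
    "mw_min": 250.0,
    "mw_max": 500.0,
    "rot_bond_max": 8,
    "oefilter_type": "BlockBuster",
    "filter_out_tanimotos_higher_than": 0.8,
}

# Min/max parameter pairs and OEFilter choices taken from the listings above.
MIN_MAX_PAIRS = [
    ("mw_min", "mw_max"), ("rot_bond_min", "rot_bond_max"),
    ("acc_min", "acc_max"), ("don_min", "don_max"),
    ("tpsa_min", "tpsa_max"), ("xlogp_min", "xlogp_max"),
    ("charge_min", "charge_max"), ("aro_min", "aro_max"),
]
OEFILTER_CHOICES = {"BlockBuster", "Lead", "Drug", "PAINS", "None"}


def check_settings(values):
    """Return a list of likely mistakes; an empty list means none were found."""
    problems = []
    fraction = values.get("random_retain_probability", 1.0)
    if not 0.0 <= fraction <= 1.0:
        problems.append("random_retain_probability is outside 0.0 to 1.0")
    for low, high in MIN_MAX_PAIRS:
        # A minimum above its maximum would leave no value that can pass both filters.
        if low in values and high in values and values[low] > values[high]:
            problems.append(f"{low} is greater than {high}")
    if values.get("oefilter_type", "None") not in OEFILTER_CHOICES:
        problems.append("oefilter_type is not one of the documented choices")
    return problems


problems = check_settings(settings)
```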