Pareto Frontier Consensus

Description

This floe creates a consensus list using a Pareto Frontier method (a.k.a. Pareto Dominance). The consensus assigns a Pareto Dominance Rank to each record and outputs only those records whose rank does not exceed a specified maximum value (see the ‘Pareto Dominance Max Rank’ parameter). The Pareto Dominance Rank is based on the values in specified fields on the input dataset(s) (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ parameters) and is equal to the number of other records that have a better value in every field.
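
The rank definition above can be illustrated with a minimal Python sketch. It is not the floe's implementation: records are modeled as plain dictionaries, and the function name `pareto_dominance_ranks` and the field names `sim` and `energy` are hypothetical. "Better in every field" is read here as strictly better in all consensus fields.

```python
def pareto_dominance_ranks(records, high_fields=(), low_fields=()):
    """Return one rank per record: the number of other records that are
    strictly better in every consensus field (illustrative sketch only)."""

    def oriented(record):
        # Negate low-preferred fields so that "larger is better" everywhere.
        return ([record[f] for f in high_fields]
                + [-record[f] for f in low_fields])

    vectors = [oriented(r) for r in records]
    ranks = []
    for i, mine in enumerate(vectors):
        rank = sum(
            1
            for j, other in enumerate(vectors)
            if j != i and all(o > m for m, o in zip(mine, other))
        )
        ranks.append(rank)
    return ranks


# Hypothetical data: high similarity and low energy are preferred.
records = [
    {"sim": 0.9, "energy": -8.0},
    {"sim": 0.8, "energy": -9.5},
    {"sim": 0.7, "energy": -7.0},
    {"sim": 0.5, "energy": -6.0},
]
ranks = pareto_dominance_ranks(records, high_fields=["sim"], low_fields=["energy"])
print(ranks)  # [0, 0, 2, 3]
```

The first two records each win on one field, so neither is beaten in every field and both receive rank 0. The pairwise comparison is quadratic in the number of records, which keeps the sketch short but would be slow on large datasets.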

Details

Title : Pareto Frontier Consensus
Tags : Munging Consensus Analysis
Python Name : #12_pareto_frontier

Parameters

Inputs

The input to this floe is one or more datasets and two or more numeric (float or integer) fields to be used for the consensus. Each field should be supplied to either ‘Consensus Field(s) with High Values Preferred’ or ‘Consensus Field(s) with Low Values Preferred’, depending on whether the consensus should consider higher or lower values of the field to be better. The floe will fail unless at least two fields in total are specified. Records that are missing any of the specified fields are discarded; the floe fails only if every record in the supplied datasets is missing a field.

  • Input Dataset The dataset(s) to read records from
    Type : data_source
    Required : True
    Python Name : input_dataset
  • Consensus Field(s) with High Values Preferred Integer and/or Float field(s) on the input dataset(s) to use in the consensus and for which high values are preferred (e.g., a value of 5 should be considered ‘better’ than a value of -1). Tanimoto similarities are an example of real-world data in which high values (i.e., higher similarity) are preferred. Multiple fields can be specified for this parameter.
    Type : string
    Required : False
    Accepts Multiple Values
    Python Name : consensus_fields_with_high_values_preferred
  • Consensus Field(s) with Low Values Preferred Integer and/or Float field(s) on the input dataset(s) to use in the consensus and for which lower values are preferred (e.g., a value of -1 should be considered ‘better’ than a value of 5). Binding energies are an example of real-world data in which lower values are typically preferred (i.e., the lower the binding energy, the better the binding). Multiple fields can be specified for this parameter.
    Type : string
    Required : False
    Accepts Multiple Values
    Python Name : consensus_fields_with_low_values_preferred
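
The input rules described for this section (at least two consensus fields in total, and records with missing fields discarded) can be sketched as a small pre-processing step. This is an assumption-laden illustration rather than the floe's code: records are modeled as dictionaries, a missing field is modeled as an absent key or a `None` value, and `select_usable_records` is a hypothetical helper name.

```python
def select_usable_records(records, high_fields=(), low_fields=()):
    """Validate the field selection and drop records with missing fields
    (illustrative sketch; not the floe's actual implementation)."""
    fields = list(high_fields) + list(low_fields)
    if len(fields) < 2:
        raise ValueError("at least two consensus fields in total are required")

    # Keep only records that carry a value for every consensus field.
    usable = [r for r in records if all(r.get(f) is not None for f in fields)]
    if not usable:
        raise ValueError("every record is missing at least one consensus field")
    return usable


# Hypothetical data: the second record lacks the 'energy' field.
records = [
    {"sim": 0.9, "energy": -8.0},
    {"sim": 0.8},
    {"sim": 0.7, "energy": -7.0},
]
usable = select_usable_records(records, high_fields=["sim"], low_fields=["energy"])
print(len(usable))  # 2
```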

Outputs

  • Output Dataset Name of the consensus output dataset
    Type : dataset_out
    Required : True
    Default : Pareto Frontier Consensus
    Python Name : output_dataset

Options

  • Pareto Dominance Max Rank This is the maximum allowed Pareto Dominance Rank a record may have and still make it onto the consensus output. The Pareto Dominance Rank of a given record is the number of other records that have a better value for every one of the fields used in the consensus (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ input parameters).
    Type : integer
    Required : False
    Default : 4
    Range : 0 to 10
    Python Name : pareto_dominance_max_rank
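
As a rough illustration of how the maximum rank acts as a cutoff, the sketch below filters already-ranked records. It assumes the cutoff is inclusive (a record whose rank equals the maximum is kept), which is how "maximum allowed" is read here. The helper name `filter_by_max_rank` and the `name` field are hypothetical; the default rank field name and the 0 to 10 range are taken from this page.

```python
def filter_by_max_rank(ranked_records, max_rank=4,
                       rank_field="Pareto Dominance Rank"):
    """Keep records whose rank is at most max_rank (illustrative sketch;
    the inclusive comparison is an assumption)."""
    if not 0 <= max_rank <= 10:
        raise ValueError("max_rank must be between 0 and 10")
    return [r for r in ranked_records if r[rank_field] <= max_rank]


# Hypothetical ranked records.
ranked = [
    {"name": "a", "Pareto Dominance Rank": 0},
    {"name": "b", "Pareto Dominance Rank": 5},
    {"name": "c", "Pareto Dominance Rank": 4},
]
kept = filter_by_max_rank(ranked)
print([r["name"] for r in kept])  # ['a', 'c']
```

Setting the maximum rank to 0 keeps only records that no other record beats in every field.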

Output Fields

These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special. If a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.

  • Pareto Dominance Rank Field Integer field on the output dataset holding the Pareto Dominance Rank of the record. A rank of 0 is the lowest and the best rank. The highest rank that will appear in the output is determined by the setting of the ‘Pareto Dominance Max Rank’ parameter.
    Type : field_parameter::int
    Required : True
    Default : Pareto Dominance Rank
    Python Name : pareto_dominance_rank_field