Pareto Frontier Consensus
Description
This floe creates a consensus list using a Pareto Frontier method (a.k.a. Pareto Dominance). The consensus assigns a Pareto Dominance Rank to each record and outputs only those records whose rank is less than or equal to a specified maximum value (see the ‘Pareto Dominance Max Rank’ parameter). The Pareto Dominance Rank is based on the values in specified fields of the input dataset(s) (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ parameters) and is equal to the number of other records that have a better value in every one of those fields.
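The ranking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the floe's implementation: it assumes each record's consensus values have already been collected into a tuple and oriented so that higher is always better, and the function name `pareto_dominance_ranks` is hypothetical. Following the definition given here, another record only counts against a record if it is strictly better in every field, so ties do not raise a rank; how the floe itself handles ties is not stated in this description.

```python
from typing import List, Sequence


def pareto_dominance_ranks(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each record, the number of other records that are
    strictly better in every field.

    Assumes every score tuple is already oriented so that higher is better.
    """
    ranks: List[int] = []
    for i, own in enumerate(scores):
        count = 0
        for j, other in enumerate(scores):
            if i == j:
                continue  # a record is never compared against itself
            # 'other' counts against 'own' only if it is strictly better in every field
            if all(o > s for o, s in zip(other, own)):
                count += 1
        ranks.append(count)
    return ranks


# Three records, two fields, higher is better in both.
example = [(0.9, 0.8), (0.5, 0.4), (0.7, 0.9)]
print(pareto_dominance_ranks(example))  # → [0, 2, 0]
```

The pairwise comparison is quadratic in the number of records, which is fine for illustration but would be slow for very large datasets.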
Details
Title : Pareto Frontier Consensus
Tags : Munging Consensus Analysis
Python Name : #12_pareto_frontier
Parameters
Inputs
The input to this floe is one or more datasets and two or more numeric (float or integer) fields to be used for the consensus. Each field should be supplied to either the ‘Consensus Field(s) with High Values Preferred’ or the ‘Consensus Field(s) with Low Values Preferred’ parameter, depending on whether the consensus should consider higher or lower values of that field to be better. The floe will fail if fewer than two fields are specified in total across both parameters. Records with a missing value in any consensus field are discarded; the floe fails only if every record in the supplied datasets has a missing field.
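The input rules above (at least two fields in total, records with missing values discarded, failure only when nothing remains) can be sketched as a preprocessing step. This is a hypothetical helper, not part of the floe's API: records are assumed to be plain Python dicts with `None` marking a missing value, and the field names `tanimoto` and `energy` are illustrative only. Low-preferred values are negated so that every field can afterwards be compared with "higher is better".

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record model: field name -> numeric value, None if missing.
Record = Dict[str, Optional[float]]


def prepare_scores(
    records: Sequence[Record],
    high_fields: Sequence[str],
    low_fields: Sequence[str],
) -> Tuple[List[Record], List[Tuple[float, ...]]]:
    """Drop records with missing values and orient every field so higher is better."""
    if len(high_fields) + len(low_fields) < 2:
        raise ValueError("at least two consensus fields are required in total")

    kept: List[Record] = []
    scores: List[Tuple[float, ...]] = []
    for rec in records:
        high = [rec.get(f) for f in high_fields]
        low = [rec.get(f) for f in low_fields]
        if any(v is None for v in high + low):
            continue  # discard records with a missing consensus field
        kept.append(rec)
        # Negate low-preferred values so that a lower original value becomes "better".
        scores.append(tuple([float(v) for v in high] + [-float(v) for v in low]))

    if not kept:
        raise ValueError("every record is missing at least one consensus field")
    return kept, scores


records = [
    {"tanimoto": 0.8, "energy": -9.1},
    {"tanimoto": 0.6, "energy": None},  # missing value: discarded
    {"tanimoto": 0.7, "energy": -7.5},
]
kept, scores = prepare_scores(records, ["tanimoto"], ["energy"])
print(scores)  # → [(0.8, 9.1), (0.7, 7.5)]
```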
Input Dataset
The dataset(s) to read records from.
Type : data_source
Required : True
Python Name : input_dataset

Consensus Field(s) with High Values Preferred
Integer and/or float field(s) on the input dataset(s) to use in the consensus and for which higher values are preferred (e.g., a value of 5 should be considered ‘better’ than a value of -1). Tanimoto similarities are a real-world example of data in which high values (i.e., higher similarity) are preferred. Multiple fields can be specified for this parameter.
Type : string
Required : False
Accepts Multiple Values
Python Name : consensus_fields_with_high_values_preferred

Consensus Field(s) with Low Values Preferred
Integer and/or float field(s) on the input dataset(s) to use in the consensus and for which lower values are preferred (e.g., a value of -1 should be considered ‘better’ than a value of 5). Binding energies are a real-world example of data in which lower values are typically preferred (i.e., the lower the binding energy, the better the binding). Multiple fields can be specified for this parameter.
Type : string
Required : False
Accepts Multiple Values
Python Name : consensus_fields_with_low_values_preferred
Outputs
Output Dataset
Name of the consensus output dataset.
Type : dataset_out
Required : True
Default : Pareto Frontier Consensus
Python Name : output_dataset
Options
Pareto Dominance Max Rank
The maximum Pareto Dominance Rank a record may have and still make it onto the consensus output. The Pareto Dominance Rank of a given record is the number of other records that have a better value in every one of the fields used in the consensus (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ input parameters).
Type : integer
Required : False
Default : 4
Range : 0 to 10
Python Name : pareto_dominance_max_rank
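The effect of the max rank setting can be illustrated with a small filter. The function name `filter_by_max_rank` and the (record, rank) pair representation are assumptions made for this sketch; the range check and default mirror the parameter's documented Range (0 to 10) and Default (4). Setting the max rank to 0 keeps only records that no other record beats in every field.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def filter_by_max_rank(
    ranked: Sequence[Tuple[T, int]], max_rank: int = 4
) -> List[Tuple[T, int]]:
    """Keep (record, rank) pairs whose rank does not exceed max_rank."""
    # Mirror the documented parameter range of 0 to 10.
    if not 0 <= max_rank <= 10:
        raise ValueError("max_rank must be between 0 and 10")
    # A rank equal to max_rank is still allowed onto the output.
    return [(rec, rank) for rec, rank in ranked if rank <= max_rank]


ranked = [("A", 0), ("B", 2), ("C", 5)]
print(filter_by_max_rank(ranked, max_rank=2))  # → [('A', 0), ('B', 2)]
```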
Output Fields
These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special: if a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule field of a dataset can be identified in the UI by the star on its field badge. CAUTION: If these parameters are modified, the same modifications must also be applied to the input fields of downstream floes that read fields written by this floe. A downstream floe that does not support specifying the input field may not work properly with the output of this floe if these settings are modified.
Pareto Dominance Rank Field
Integer field on the output dataset holding the Pareto Dominance Rank of the record. A rank of 0 is the lowest and best rank. The highest rank that can appear in the output is determined by the setting of the ‘Pareto Dominance Max Rank’ parameter.
Type : field_parameter::int
Required : True
Default : Pareto Dominance Rank
Python Name : pareto_dominance_rank_field
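As a rough illustration of how the rank ends up on each output record, the hypothetical helper `write_rank_field` below stores it under a configurable field name that defaults to ‘Pareto Dominance Rank’. Records are modelled as plain dicts rather than Orion dataset records, which is an assumption of this sketch. If the field name is changed, any downstream step must read the rank from that same name, which is the point of the caution in the Output Fields section.

```python
from typing import Any, Dict, List, Sequence


def write_rank_field(
    records: Sequence[Dict[str, Any]],
    ranks: Sequence[int],
    field_name: str = "Pareto Dominance Rank",
) -> List[Dict[str, Any]]:
    """Return copies of the records with each rank stored under field_name."""
    if len(records) != len(ranks):
        raise ValueError("records and ranks must have the same length")
    out: List[Dict[str, Any]] = []
    for rec, rank in zip(records, ranks):
        copy = dict(rec)  # leave the input record unchanged
        copy[field_name] = int(rank)
        out.append(copy)
    return out


records = [{"id": "A"}, {"id": "B"}]
print(write_rank_field(records, [0, 3]))
# → [{'id': 'A', 'Pareto Dominance Rank': 0}, {'id': 'B', 'Pareto Dominance Rank': 3}]
```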