Pareto Frontier Consensus

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Gigadock

  • Product-based/FastROCS

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/Analysis

  • Solution-based/Virtual-screening/Analysis/Consensus

  • Task-based/Data Science/Clustering

Description

This floe creates a consensus list using a Pareto Frontier method (a.k.a. Pareto Dominance). The consensus assigns a Pareto Dominance Rank to each record and outputs only those records whose rank does not exceed a specified maximum value (see the ‘Pareto Dominance Max Rank’ parameter). The Pareto Dominance Rank is based on the values in specified fields on the input dataset(s) (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ parameters) and is equal to the number of other records that have a better value in every one of those fields.
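
The rank definition above can be illustrated with a minimal Python sketch. It is not the floe's implementation: the function name, the record layout (plain dictionaries), and the example field names `tanimoto` and `energy` are hypothetical, and it assumes that ‘better’ means strictly better, so that tied values do not count toward dominance.

```python
def pareto_dominance_ranks(records, high_fields=(), low_fields=()):
    """Count, for each record, the other records that are better in every field.

    Illustrative only; 'better' is assumed to mean strictly better.
    """
    if not high_fields and not low_fields:
        raise ValueError("at least one consensus field is required")

    # Orient every value so that larger is always better:
    # low-values-preferred fields are negated.
    def oriented(record):
        return [record[f] for f in high_fields] + [-record[f] for f in low_fields]

    vectors = [oriented(r) for r in records]
    ranks = []
    for i, mine in enumerate(vectors):
        # A record's rank is the number of other records that beat it everywhere.
        rank = sum(
            1
            for j, other in enumerate(vectors)
            if j != i and all(o > m for m, o in zip(mine, other))
        )
        ranks.append(rank)
    return ranks


# Hypothetical example data: higher similarity and lower energy are preferred.
example_records = [
    {"name": "A", "tanimoto": 0.90, "energy": -9.0},
    {"name": "B", "tanimoto": 0.80, "energy": -8.0},
    {"name": "C", "tanimoto": 0.95, "energy": -5.0},
    {"name": "D", "tanimoto": 0.50, "energy": -4.0},
]
example_ranks = pareto_dominance_ranks(
    example_records, high_fields=["tanimoto"], low_fields=["energy"]
)
print(example_ranks)  # [0, 1, 0, 3]
```

In this example, records A and C both receive rank 0 because neither is beaten in both fields by any other record; B is beaten in both fields only by A, and D by all three others.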

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Dataset (input_dataset): The dataset(s) to read records from

  • Required

  • Type: data_source

Consensus Field(s) with High Values Preferred (consensus_fields_with_high_values_preferred): Integer and/or float fields on the input dataset to use in the consensus and for which higher values are preferred (e.g., a value of 5 should be considered ‘better’ than a value of -1). Tanimoto similarities are an example of real-world data in which high values (i.e., higher similarity) are preferred. Multiple fields can be specified for this parameter.

  • Type: string

Consensus Field(s) with Low Values Preferred (consensus_fields_with_low_values_preferred): Integer and/or float fields on the input dataset to use in the consensus and for which lower values are preferred (e.g., a value of -1 should be considered ‘better’ than a value of 5). Binding energies are an example of real-world data in which lower values are typically preferred (i.e., the lower the binding energy, the better the binding). Multiple fields can be specified for this parameter.

  • Type: string

Outputs

Output Dataset (output_dataset): Name of the consensus output dataset

  • Required

  • Type: dataset_out

  • Default: Pareto Frontier Consensus

Options

Pareto Dominance Max Rank (pareto_dominance_max_rank): The maximum Pareto Dominance Rank a record may have and still appear in the consensus output. The Pareto Dominance Rank of a given record is the number of other records that have a better value in every one of the fields used in the consensus (see the ‘Consensus Field(s) with High Values Preferred’ and ‘Consensus Field(s) with Low Values Preferred’ input parameters).

  • Type: integer

  • Default: 4
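
The effect of this cutoff can be sketched as a simple filter over records that already carry a rank. The function name and record layout are hypothetical; the default of 4 and the field name ‘Pareto Dominance Rank’ are the defaults listed on this page. The sketch assumes the cutoff is inclusive (a record whose rank equals the maximum is kept), which is how ‘maximum allowed’ is read here and is not confirmed by the floe itself.

```python
def filter_by_max_rank(ranked_records, max_rank=4, rank_field="Pareto Dominance Rank"):
    """Keep only records whose rank is at most max_rank.

    Illustrative only; an inclusive comparison is an assumption.
    """
    return [r for r in ranked_records if r[rank_field] <= max_rank]


# Hypothetical records that have already been assigned a rank.
ranked = [
    {"name": "A", "Pareto Dominance Rank": 0},
    {"name": "B", "Pareto Dominance Rank": 4},
    {"name": "C", "Pareto Dominance Rank": 5},
]
kept = filter_by_max_rank(ranked)
print([r["name"] for r in kept])  # ['A', 'B']
```

Lowering the maximum narrows the output toward the Pareto frontier itself; under the same inclusive assumption, a maximum of 0 would keep only records that no other record beats in every field.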

Output Fields

Pareto Dominance Rank Field (pareto_dominance_rank_field): Integer field on the output dataset that holds the Pareto Dominance Rank of the record. A rank of 0 is the lowest and best rank. The highest rank that can appear in the output is determined by the ‘Pareto Dominance Max Rank’ parameter.

  • Required

  • Type: field_parameter::int

  • Default: Pareto Dominance Rank