Focused Library - Molecule Input

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Generative Design/Reaction-based Libraries

  • Task-based/Library Prep & Design/Reaction-based Enumeration

  • Task-based/Virtual Screening - Structure-Based

  • Role-based/Medicinal Chemist

Description

This floe applies reactions from the Reaction & Reagent Database to the input lead molecule and generates an output dataset of products.

Required Inputs:

Both the Reaction & Reagent Database and an input lead molecule dataset are required.

Required Outputs:

The name of an output dataset should be specified, as the Output Data parameter is “On” by default. See the discussion of prospective runs below.

Optional Activities:

The Molecule ID Field should generally match the source of the input lead molecules for the Reaction & Reagent Database file. For example, when ZINC is the source, zinc_id is the standard structure ID field.

Enabling the DB Listing option will generate a reaction directory floe report from the input Reaction & Reagent Database. This is the same directory that the Reaction & Reagent Database - Directory Listing Floe provides.

The Filter Output boolean enables or disables the type of molecule filter selected by Mol Filter.

Compute Molecule Properties offers a small set of pre-selected properties that can be computed on the generated products. Removing all the properties from the list disables this activity.

For prospective and trial runs, setting the Output Data, Output Failures, and Output Specific Failures booleans to “Off” provides counts of the floe outputs without creating any dataset(s). This is useful for validating the input options against a specific input lead molecule dataset before a capture run that generates the output dataset(s).
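
The difference between a prospective run and a capture run comes down to three booleans. The sketch below collects them in a plain dictionary keyed by the promoted names listed under Promoted Parameters. The helper name output_switches and its keep_failures argument are hypothetical, and how the values are handed to the floe (user interface or otherwise) is not covered here.

```python
from typing import Dict

# Promoted names come from the parameter list; the presets are illustrative.
COUNT_ONLY: Dict[str, bool] = {
    "outdata": False,
    "outfails": False,
    "outprodfails": False,
}


def output_switches(prospective: bool, keep_failures: bool = False) -> Dict[str, bool]:
    """Return the three output booleans for a prospective or a capture run."""
    if prospective:
        # Prospective run: report counts only, write no datasets.
        return dict(COUNT_ONLY)
    # Capture run: always write products; failure datasets are optional.
    return {
        "outdata": True,
        "outfails": keep_failures,
        "outprodfails": keep_failures,
    }
```

With keep_failures left at its default, the capture-run values match the documented defaults (True, False, False).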

The Check Valences option controls how valence issues in the generated products are handled (ignored, fixed, or rejected), and the Strict Valences option controls whether any illegal valence in a product results in its rejection from the output products.

The Strict Classification option controls whether lead molecules are classified according to both the required and disallowed chemical features (defined by the Reaction & Reagent Database reactions) or simply by the required features. Turning the strict option “Off” may generate alternate (or even surprising) products due to reactions at undesirable sites.

General Considerations

There are three alternative ways to run this floe:

  1. Allow the lead molecules to be automatically classified as to their reagent types.

  2. Provide a specific reagent ID (or IDs, as a space-delimited list) as the reagent type for the lead molecule with validation.

  3. Provide a specific reagent ID (or IDs) as the reagent type for the lead molecule without validation.

Approach #1: The reagent classifier from the provided Reaction & Reagent Database input is used to identify the reagent types for the lead molecules on the fly.

Approach #2: The user asserts that the provided reagent ID (or IDs) matches the chemistry of the lead molecule. The reagent classifier from the Reaction & Reagent Database is used to certify that assertion, and only the lead molecules that match the specified reagent chemistry ID(s) are sent downstream for processing as those reagents.

Approach #3: The user asserts that a provided reagent ID (or IDs) matches the chemistry of the lead molecule. No validation of the classification is attempted, and the lead molecules are used in the provided context of that specific reaction without restriction. If the provided ID(s) are incorrect, or the lead molecule does not correspond to the provided ID(s), a large number of reaction failures should generally be expected.
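
The three approaches differ only in whether reagent IDs are requested and whether they are verified. The following is a minimal sketch of that decision logic, not the floe's implementation. The function assign_reagent_types and the toy classifier are hypothetical; a real run uses the classifier from the Reaction & Reagent Database. Falling back to automatic classification when no IDs are requested and verification is off is an assumption.

```python
from typing import Callable, Iterable, List, Set


def assign_reagent_types(
    lead: str,
    requested: Iterable[str],
    verify: bool,
    classify: Callable[[str], Set[str]],
) -> List[str]:
    """Return the reagent types a lead molecule would be processed as."""
    requested = list(requested)
    if not requested:
        # Approach 1: nothing requested, so classify on the fly.
        # (Assumed to apply regardless of the verify switch.)
        return sorted(classify(lead))
    if verify:
        # Approach 2: keep only requested IDs that the classifier confirms.
        found = classify(lead)
        return [rid for rid in requested if rid in found]
    # Approach 3: trust the requested IDs without validation.
    return requested


def toy_classifier(lead: str) -> Set[str]:
    """Illustrative lookup table standing in for the real classifier."""
    table = {"Nc1ccccc1": {"Buchwald-Hartwig:Amines", "SnAr1:Amines"}}
    return table.get(lead, set())
```

An empty result under Approach 2 corresponds to a lead molecule that is not sent downstream. Under Approach 3 a mismatched ID is passed through unchanged, which is where the reaction failures mentioned above would come from.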

Promoted Parameters

Title in user interface (promoted name)

Inputs

Lead Molecule Dataset (lead_molecule): A dataset containing the lead molecule(s) to be transformed by reactions from the Reaction & Reagent Database. The dataset is assumed to contain ONE lead molecule, because the floe amplifies each input into many products, but the input limit can be altered in the Advanced Focused Library Options tab. Small input datasets are generally expected.

  • Required

  • Type: data_source

Reaction & Reagent Database (rxndb): The name of the Reaction & Reagent Database to use.

  • Required

  • Type: file_in

Outputs

Output Dataset (output): Output dataset containing generated products.

  • Required

  • Type: dataset_out

  • Default: Reaction_products

Output Data (outdata): If OFF, just counts records, but does not output them.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

General Failures (failures): Output dataset containing input failures and reagents that failed to react.

  • Required

  • Type: dataset_out

  • Default: Input_failures

Output Failures (outfails): If OFF, just counts records, but does not output them.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Specific Product Failures (prodfailures): Output dataset containing specific reagent combinations that failed to react.

  • Required

  • Type: dataset_out

  • Default: Product_failures

Output Specific Failures (outprodfails): If OFF, just counts records, but does not output them.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Focused Library Options

Reactions or Reagents (queryclass): A list of reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type or presumed to be this reagent type if the Verify Classifications switch is OFF.

  • Type: string

  • Default: []

  • Choices: [‘3-nitrile-pyridine’, ‘3-nitrile-pyridine:Diones_2_4’, ‘Buchwald-Hartwig’, ‘Buchwald-Hartwig:Amines’, ‘Buchwald-Hartwig:Halides_aryl’, ‘Buchwald_cross_coupling1’, ‘Buchwald_cross_coupling1:Amines’, ‘Buchwald_cross_coupling1:Aryl_halides’, ‘Buchwald_cross_coupling2’, ‘Buchwald_cross_coupling2:Amines’, ‘Buchwald_cross_coupling2:Aryl_halides’, ‘Ester_hydrolysis-amide_synthesis1’, ‘Ester_hydrolysis-amide_synthesis1:Amines’, ‘Ester_hydrolysis-amide_synthesis1:Esters’, ‘Ester_hydrolysis-amide_synthesis2’, ‘Ester_hydrolysis-amide_synthesis2:Amines’, ‘Ester_hydrolysis-amide_synthesis2:Esters’, ‘Grignard_alcohol’, ‘Grignard_alcohol:Halides_alkyl’, ‘Grignard_alcohol:Ketones_aldehydes’, ‘Grignard_carbonyl’, ‘Grignard_carbonyl:Halides_alkyl_aryl’, ‘Grignard_carbonyl:Nitriles’, ‘Heck_non-terminal_vinyl’, ‘Heck_non-terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_non-terminal_vinyl:Non_terminal_vinyls’, ‘Heck_terminal_vinyl’, ‘Heck_terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_terminal_vinyl:Terminal_vinyls’, ‘Huisgen_disubst-alkyne’, ‘Huisgen_disubst-alkyne:Alkyl_halides_alcohols’, ‘Huisgen_disubst-alkyne:Alkynes_disubstituted’, ‘Mitsunobu_imide’, ‘Mitsunobu_imide:Acetylacetamides’, ‘Mitsunobu_imide:Alcohols_primary_secondary’, ‘Mitsunobu_phenol’, ‘Mitsunobu_phenol:Alcohols_primary_secondary’, ‘Mitsunobu_phenol:Phenols’, ‘Mitsunobu_sulfonamide’, ‘Mitsunobu_sulfonamide:Alcohols_primary_secondary’, ‘Mitsunobu_sulfonamide:Sulfonamides’, ‘Mitsunobu_tetrazole_1’, ‘Mitsunobu_tetrazole_1:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_1:Tetrazoles’, ‘Mitsunobu_tetrazole_2’, ‘Mitsunobu_tetrazole_2:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_2:Tetrazoles’, ‘N-alkylation1’, ‘N-alkylation1:Amines’, ‘N-alkylation1:Benzyl_halides’, ‘N-alkylation2’, ‘N-alkylation2:Amines’, ‘N-alkylation2:Benzyl_halides’, ‘N-arylation_heterocycles’, ‘N-arylation_heterocycles:Boronic_acids_aryl’, ‘N-arylation_heterocycles:Pyrrole_like_nitrogens’, ‘Negishi’, ‘Negishi:Alkyl_halides_primary1’, 
‘Negishi:Alkyl_halides_primary2’, ‘Niementowski_quinazoline’, ‘Niementowski_quinazoline:Amides_primary’, ‘Niementowski_quinazoline:Aminobenzoic_acids’, ‘O-alkylation’, ‘O-alkylation:Benzyl_halides’, ‘O-alkylation:Phenols’, ‘O-biarylation’, ‘O-biarylation:Aryl_bromides’, ‘O-biarylation:Phenols’, ‘Pictet-Spengler’, ‘Pictet-Spengler:Aldehydes’, ‘Pictet-Spengler:Beta_amino_benzenes’, ‘Reductive_amination1’, ‘Reductive_amination1:Aldehydes’, ‘Reductive_amination1:Amines’, ‘Reductive_amination2’, ‘Reductive_amination2:Aldehydes’, ‘Reductive_amination2:Amines’, ‘Schotten-Baumann_amide’, ‘Schotten-Baumann_amide:Amines’, ‘Schotten-Baumann_amide:Carboxylic_acids’, ‘SnAr1’, ‘SnAr1:Amines’, ‘SnAr1:Heterohalides’, ‘SnAr2’, ‘SnAr2:Amines’, ‘SnAr2:Heterohalides’, ‘Sonogashira’, ‘Sonogashira:Alkynes’, ‘Sonogashira:Bromo_iodo_vinyls_aryls’, ‘Stille’, ‘Stille:Bromo_iodo_vinyls_aryls’, ‘Stille:Halides_aryl’, ‘Suzuki_cross_coupling’, ‘Suzuki_cross_coupling:Aryl_bromides’, ‘Suzuki_cross_coupling:Suzuki_boronics’, ‘Wittig’, ‘Wittig:Alkyl_halides_primary’, ‘Wittig:Ketones_aldehydes’, ‘benzimidazole_derivatives_aldehyde’, ‘benzimidazole_derivatives_aldehyde:Aldehydes’, ‘benzimidazole_derivatives_aldehyde:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Carboxylic_acids’, ‘benzofuran’, ‘benzofuran:Alkynes’, ‘benzofuran:Halophenols’, ‘benzothiazole’, ‘benzothiazole:Aldehydes’, ‘benzothiazole:Aro_6_thiamines’, ‘benzothiophene’, ‘benzothiophene:Alkynes’, ‘benzothiophene:Halomethiols’, ‘benzoxazole_arom-aldehyde’, ‘benzoxazole_arom-aldehyde:Aminophenols’, ‘benzoxazole_arom-aldehyde:Benzaldehydes’, ‘benzoxazole_carboxylic-acid’, ‘benzoxazole_carboxylic-acid:Aminophenols’, ‘benzoxazole_carboxylic-acid:Carboxylic_acids’, ‘decarboxylative_coupling’, ‘decarboxylative_coupling:Carbonyl_benzoic_acids’, ‘decarboxylative_coupling:Halides_aryl’, ‘heteroaromatic_nuc_sub’, 
‘heteroaromatic_nuc_sub:Amines’, ‘heteroaromatic_nuc_sub:Halo_aryls_activated’, ‘imidazole’, ‘imidazole:Alpha_halo_ketones’, ‘imidazole:Aryl_amidines_guanidines’, ‘indole’, ‘indole:Alkynes’, ‘indole:Haloanilines’, ‘nucl_sub_aromatic_ortho_nitro’, ‘nucl_sub_aromatic_ortho_nitro:Amines’, ‘nucl_sub_aromatic_ortho_nitro:Ortho_nitro_halides’, ‘nucl_sub_aromatic_para_nitro’, ‘nucl_sub_aromatic_para_nitro:Amines’, ‘nucl_sub_aromatic_para_nitro:Para_nitro_halides’, ‘oxadiazole’, ‘oxadiazole:Carboxylic_acids’, ‘oxadiazole:Nitriles’, ‘phthalazinone’, ‘phthalazinone:Hydrazines’, ‘phthalazinone:Ketobenzoic_acids’, ‘piperidine_indole’, ‘piperidine_indole:Indoles’, ‘piperidine_indole:Piperidines’, ‘pyrazole’, ‘pyrazole:Diones_2_4’, ‘pyrazole:Hydrazines’, ‘spiro-chromanone’, ‘spiro-chromanone:Ketophenols’, ‘spiro-chromanone:Piperadone_ketones’, ‘sulfon_amide’, ‘sulfon_amide:Amines’, ‘sulfon_amide:Sulfonyl_chlorides’, ‘tetrazole_connect_regioisomere_1’, ‘tetrazole_connect_regioisomere_1:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_1:Nitriles’, ‘tetrazole_connect_regioisomere_2’, ‘tetrazole_connect_regioisomere_2:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_2:Nitriles’, ‘tetrazole_terminal’, ‘tetrazole_terminal:Nitriles’, ‘thiazole’, ‘thiazole:Alpha_halo_ketones’, ‘thiazole:Thioamides’, ‘triaryl-imidazole’, ‘triaryl-imidazole:Aro_ethane_diones’, ‘triaryl-imidazole:Aroaldehydes’, ‘urea’, ‘urea:Amines’, ‘urea:Isocyanates’]
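
The choices above appear to follow a pattern: a bare name selects a reaction, and a name of the form Reaction:Reagent selects one reagent class of that reaction. That reading is inferred from the list itself and is not stated elsewhere in this document. Under that assumption, a small hypothetical helper can split an entry into its parts:

```python
from typing import NamedTuple, Optional


class Query(NamedTuple):
    reaction: str
    reagent: Optional[str]  # None when the entry names a whole reaction


def parse_query(entry: str) -> Query:
    """Split 'Reaction' or 'Reaction:Reagent' at the first colon."""
    reaction, sep, reagent = entry.partition(":")
    return Query(reaction, reagent if sep else None)
```

For example, parse_query("Buchwald-Hartwig:Amines") gives the reaction Buchwald-Hartwig with the reagent class Amines, while parse_query("Negishi") gives a reaction with no reagent class.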

Custom Reactions or Reagents (customqueryclass): A list of custom reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type or presumed to be this reagent type if the Verify Classifications switch is OFF. Any specification here supersedes any selection specified by the Reactions or Reagents above.

  • Type: string
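
The precedence between the two parameters can be sketched as follows. The function effective_query is hypothetical, and treating the custom string as space-delimited is an assumption based on the space-delimited ID list mentioned under General Considerations.

```python
from typing import List


def effective_query(selected: List[str], custom: str) -> List[str]:
    """Return the custom entries if any are given, else the list selection."""
    custom_ids = custom.split()  # assumption: whitespace-delimited entries
    return custom_ids if custom_ids else list(selected)
```

A blank custom string leaves the list selection in effect. Any non-blank custom string replaces it entirely rather than being merged with it.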

Reaction Applied (rxnid): Name of the string field to identify the reaction.

  • Type: string

  • Default: ReactionId

Output Mol Field (outmol): Output molecule field.

  • Type: field_parameter::mol

Annotate Mol (outmolsmi): Name of the string field for the input molecule SMILES or blank to suppress.

  • Type: string

  • Default: OriginalMol

Annotate Rxn (outrxnsmi): Name of the string field for the reaction SMILES or blank to suppress.

  • Type: string

  • Default: Reaction

Strict Valences (strictval): If ON, only output products with valid valences.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Check Valences (checkval): How to handle valence issues for the generated products.

  • Required

  • Type: string

  • Default: ignore

  • Choices: [‘ignore’, ‘fix’, ‘reject’]

SMILES Dedupe (dedupesmi): If ON, performs a deduplication of the product SMILES.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

SMILES Dedupe Memory (dedupesmimem): Product deduplication may require significant memory resources. Specify the desired amount in MB.

  • Type: decimal

  • Default: 10240
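
As an illustration of why deduplication is memory-hungry, here is a minimal sketch that keeps the first occurrence of each product SMILES. It is not the floe's implementation. It assumes the SMILES strings are already canonical, so string equality is enough, and the function name dedupe_smiles is hypothetical.

```python
from typing import Iterable, Iterator, Set


def dedupe_smiles(products: Iterable[str]) -> Iterator[str]:
    """Yield each product SMILES once, in first-seen order."""
    seen: Set[str] = set()
    for smiles in products:
        if smiles not in seen:
            # Every unique SMILES stays in memory for the whole run.
            seen.add(smiles)
            yield smiles
```

The set of seen strings grows with the number of unique products, which is one plausible reason a separate memory setting exists for this step.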

DB Listing (listing): Generate a directory listing of the input Reaction & Reagent Database.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Focused Library Filtering Options

Filter Output (filtering): Enable molecule filtering of the generated products (see type specified by Mol Filter).

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Mol Filter (mol_filter_type): Type of molecule filter to apply to the generated analogs.

  • Type: string

  • Default: BlockBuster

  • Choices: [‘Lead’, ‘Drug’, ‘BlockBuster’, ‘BlockBuster+PAINS’, ‘PAINS’, ‘Custom’]

Custom Filter File (mol_filter_file): A custom filter file resource to load.

  • Type: file_in

Mol Filter Summary Report (filter_summary): If ON, will generate a summary report of the rules that filtered molecules.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Focused Library Property Generation

Compute Molecule Properties (mol_props): Which molecule properties to calculate.

  • Type: string

  • Default: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]

  • Choices: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]

Reagent Processing Options

Maximum Reagents (maxreagents): Maximum number of reagents to process.

  • Type: integer

  • Default: 100

Allow Functional Group Conversions (funcgroups): If ON, allows functional group translations for reagents in the database.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Sample Reagents (samplereagents): Sample this percentage of the total reagents available, limited by the (optional) Maximum Reagents total.

  • Type: integer
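
The interaction between Sample Reagents and Maximum Reagents can be sketched as a percentage sample followed by a cap. The order of the two steps, the floor rounding, and the seeded random sampling are all assumptions. The function name sample_reagents is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_reagents(
    reagents: Sequence[str],
    percent: Optional[int] = None,
    max_reagents: Optional[int] = 100,
    seed: int = 0,
) -> List[str]:
    """Sample a percentage of the reagents, then cap the total."""
    pool = list(reagents)
    if percent is not None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        keep = len(pool) * percent // 100  # assumption: round down
        pool = random.Random(seed).sample(pool, keep)
    if max_reagents is not None:
        pool = pool[:max_reagents]  # cap applies after sampling
    return pool
```

With 1000 reagents and a 20 percent sample, 200 reagents are drawn and the default cap of 100 trims the result to 100.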

Advanced Focused Library Options

Lead Molecule Minimum Records (rec_min): The minimum number of lead molecule records allowed (default: 1). Input lead molecule datasets that do not meet this threshold will terminate the floe. Use 0 to suppress validation.

  • Type: integer

  • Default: 1

Lead Molecule Maximum Records (rec_max): The maximum number of lead molecule records allowed (default: 1). Input lead molecule datasets that exceed this threshold will terminate the floe. Use 0 to suppress validation.

  • Type: integer

  • Default: 1
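
The two record-count limits behave like a simple range check in which 0 switches a bound off. A minimal sketch of that rule follows. The function name check_record_count is hypothetical, and raising an exception stands in for terminating the floe.

```python
def check_record_count(n_records: int, rec_min: int = 1, rec_max: int = 1) -> None:
    """Raise ValueError if the record count falls outside the allowed range."""
    # A limit of 0 suppresses that side of the validation.
    if rec_min and n_records < rec_min:
        raise ValueError(f"{n_records} records, fewer than the minimum of {rec_min}")
    if rec_max and n_records > rec_max:
        raise ValueError(f"{n_records} records, more than the maximum of {rec_max}")
```

With the defaults, only a dataset of exactly one record passes. Setting rec_max to 0 accepts any number of records at or above the minimum.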

Deprotecting Group Definitions (queryfngroupdefs): Optional name of a file resource containing reaction definitions to provide the deprotecting group transformation(s) of interest. If unspecified, the transformations from the database will be used.

  • Type: file_in

Deprotecting Groups (queryfngroups): A selectable list of specific deprotecting transformations from Deprotecting Group Definitions or from the database to apply to the lead molecule(s). If Deprotecting Group Definitions is unspecified, the database transformations are assumed to provide the functional group and/or deprotecting group transforms.

  • Type: string

  • Default: []

  • Choices: [‘All’, ‘Acid_Reduction’, ‘Alcohol_FullOxidation’, ‘Alcohol_PartialOxidation’, ‘Aldehyde_Reduction’, ‘AlkylHalide2Alcohol’, ‘AlkylHalide2Azide’, ‘AlkylHalide2Thiol’, ‘Amide_Reduction’, ‘Aniline_Diazotization’, ‘ArylHalide_Amination’, ‘ArylHalide_Borylation’, ‘Azide2Amine’, ‘Diazo2Cyano’, ‘Diazo2Halide’, ‘Diazo2Phenol’, ‘Ester_FullReduction’, ‘Ester_Hydrolysis’, ‘Ester_PartialReduction’, ‘Nitrile2Amide’, ‘Nitrile_Hydrolysis’, ‘Nitro_Reduction’]

Custom Deprotecting Groups (customqueryfngroups): A blank-delimited list of specific deprotecting transformations from Deprotecting Group Definitions or from the database to apply to the lead molecule(s). If Deprotecting Group Definitions is unspecified, the database transformations are assumed to provide the functional group and/or deprotecting group transforms. This specification always supersedes any list selections from Deprotecting Groups.

  • Type: string

Verify Classifications (verifyclass): If ON, each input molecule will be classified as to its reagent type and verified against the requested list of reactions or reagents. If OFF, and the requested list of reactions or reagents is not empty, the specified reagent types will be assumed without verification.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Classifier Memory Limit (classifiermem): The memory limit for the reaction classifier. It may need to be increased for large R&R Databases.

  • Required

  • Type: decimal

  • Default: 10240

Strict Classification (strictfeatures): If ON, all required and disallowed chemistry features are validated for classification of the input structures. If OFF, only required features are considered for reagent classifications.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Molecule ID Field (molid): Name of the string field for the molecule ID.

  • Type: field_parameter::string

Molecule SMILES Field (molsmi): Name of the string field for the molecule SMILES.

  • Type: field_parameter::string

Retain Input Dataset Fields (keepfields): If ON, copies the input data record. If OFF, discards all fields except the structure (which will change) and sends it downstream for processing.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Verbosity (verbosity): Sets the output logging verbosity.

  • Type: string

  • Default: warning

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]