Focused Library - Molecule Input
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Hit to Lead/Generative Design/Reaction-based Libraries
Task-based/Library Prep & Design/Reaction-based Enumeration
Task-based/Virtual Screening - Structure-Based
Role-based/Medicinal Chemist
Description
This floe will apply reactions to the input lead molecule, generating an output dataset of products.
Required Inputs:
Both the Reaction & Reagent Database and an input lead molecule dataset are required.
Required Outputs:
The name of an output dataset should be specified, as the Output Data parameter is “On” by default. See the discussion of prospective runs below.
Optional Activities:
The Molecule ID Field should generally match the source of the input lead molecules for the Reaction & Reagent Database file. In the case of ZINC as the source, zinc_id is the standard structure ID field.
Enabling the DB Listing option will generate a reaction directory floe report from the input Reaction & Reagent Database. This is the same directory that the Reaction & Reagent Database - Directory Listing Floe provides.
A boolean (Filter Output) enables or disables the specific type of molecule filter selected by Mol Filter.
There is a small set of pre-selected properties, Compute Molecule Properties, that can be computed on the generated products, or this activity can be disabled by removing all the properties from the list.
For prospective and trial activities, the Output Data, Output Failures, and Output Specific Failures booleans, when set to “Off”, will provide counts of the outputs from the floe without creating dataset(s). This is useful for validating the input options against a specific input lead molecule dataset prior to running a capture run to generate output dataset(s).
The Check Valences option controls whether products with valence issues are rejected, allowed, or fixed. The Strict Valences option controls whether any illegal valence in a product results in its rejection from the output products.
The Strict Classification option controls whether lead molecules are classified according to both the required and disallowed chemical features (defined by the Reaction & Reagent Database reactions) or simply by the required features. Turning the Strict Classification option “Off” may generate alternate (or even surprising) products due to reactions at undesirable sites.
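The prospective-run settings described above can be sketched as a small parameter set keyed by the promoted names listed later on this page (outdata, outfails, outprodfails). This is only an illustration: the helper name prospective_settings is hypothetical, and how the values are submitted to Orion is not shown here.

```python
# Hypothetical helper: only the promoted names (outdata, outfails,
# outprodfails) come from this page; the function itself is illustrative.
def prospective_settings(capture: bool) -> dict:
    """Return the three output switches for a count-only or capture run."""
    return {
        "outdata": capture,       # Output Data
        "outfails": capture,      # Output Failures
        "outprodfails": capture,  # Output Specific Failures
    }


# Prospective (trial) run: records are counted, no datasets are written.
trial_run = prospective_settings(capture=False)

# Capture run: all three output datasets are written.
capture_run = prospective_settings(capture=True)
```

In practice a capture run may only need Output Data switched on; the failure datasets are optional and default to “Off”.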
General Considerations
There are three alternate ways to run this floe:
1. Allow the lead molecules to be automatically classified as to their reagent types.
2. Provide a specific reagent ID (or IDs, as a space-delimited list) as the reagent type for the lead molecule with validation.
3. Provide a specific reagent ID (or IDs) as the reagent type for the lead molecule without validation.
Approach #1: The reagent classifier from the provided Reaction & Reagent Database input is used to identify the reagent types for the lead molecules on the fly.
Approach #2: The user asserts that the provided reagent ID (or IDs) matches the chemistry of the lead molecule. The reagent classifier from the Reaction & Reagent Database is used to certify that assertion, and only the lead molecules that match the specified reagent chemistry ID(s) are sent downstream for processing as those reagents.
Approach #3: The user asserts that a provided reagent ID (or IDs) matches the chemistry of the lead molecule. No validation of the classification is attempted, and the lead molecules are used in the provided context of that specific reaction without restriction. If the provided ID(s) are incorrect, or the lead molecule does not correspond to the provided ID(s), a large number of reaction failures should generally be expected.
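One way to read the three approaches is as a function of two settings: whether any reagent IDs were supplied (Reactions or Reagents, or its custom counterpart) and whether Verify Classifications is on. The sketch below encodes that reading. The function name run_approach is hypothetical, and treating an empty selection as approach #1 regardless of the verify switch is an assumption rather than documented behavior.

```python
from typing import List


def run_approach(reagent_ids: List[str], verify: bool) -> int:
    """Return 1, 2, or 3 for the approach implied by the settings."""
    if not reagent_ids:
        # Nothing asserted by the user: classify lead molecules on the fly.
        # (Assumed to hold whether or not verification is switched on.)
        return 1
    # Reagent IDs supplied: validated (approach 2) or taken on trust (approach 3).
    return 2 if verify else 3


print(run_approach([], verify=True))                           # 1
print(run_approach(["Buchwald-Hartwig:Amines"], verify=True))  # 2
print(run_approach(["Buchwald-Hartwig:Amines"], verify=False)) # 3
```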
Promoted Parameters
Title in user interface (promoted name)
Inputs
Lead Molecule Dataset (lead_molecule): A dataset containing the lead molecule(s) to be transformed by reactions from the reaction & reagent database. This dataset is assumed to contain ONE lead molecule, because the floe amplifies each input into many products, but the input limit can be altered in the [Advanced Focused Library Options] tab. Small input datasets are generally expected.
Required
Type: data_source
Reaction & Reagent Database (rxndb): The name of the reaction & reagent database to use
Required
Type: file_in
Outputs
Output Dataset (output): Output dataset containing generated products
Required
Type: dataset_out
Default: Reaction_products
Output Data (outdata): If OFF, just counts records, but does not output them
Required
Type: boolean
Default: True
Choices: [True, False]
General Failures (failures): Output dataset containing input failures and reagents that failed to react
Required
Type: dataset_out
Default: Input_failures
Output Failures (outfails): If OFF, just counts records, but does not output them
Required
Type: boolean
Default: False
Choices: [True, False]
Specific Product Failures (prodfailures): Output dataset containing specific reagent combinations that failed to react
Required
Type: dataset_out
Default: Product_failures
Output Specific Failures (outprodfails): If OFF, just counts records, but does not output them
Required
Type: boolean
Default: False
Choices: [True, False]
Focused Library Options
Reactions or Reagents (queryclass): A list of reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type, or presumed to be this reagent type if the Verify Classifications switch is OFF.
Type: string
Default: []
Choices: [‘3-nitrile-pyridine’, ‘3-nitrile-pyridine:Diones_2_4’, ‘Buchwald-Hartwig’, ‘Buchwald-Hartwig:Amines’, ‘Buchwald-Hartwig:Halides_aryl’, ‘Buchwald_cross_coupling1’, ‘Buchwald_cross_coupling1:Amines’, ‘Buchwald_cross_coupling1:Aryl_halides’, ‘Buchwald_cross_coupling2’, ‘Buchwald_cross_coupling2:Amines’, ‘Buchwald_cross_coupling2:Aryl_halides’, ‘Ester_hydrolysis-amide_synthesis1’, ‘Ester_hydrolysis-amide_synthesis1:Amines’, ‘Ester_hydrolysis-amide_synthesis1:Esters’, ‘Ester_hydrolysis-amide_synthesis2’, ‘Ester_hydrolysis-amide_synthesis2:Amines’, ‘Ester_hydrolysis-amide_synthesis2:Esters’, ‘Grignard_alcohol’, ‘Grignard_alcohol:Halides_alkyl’, ‘Grignard_alcohol:Ketones_aldehydes’, ‘Grignard_carbonyl’, ‘Grignard_carbonyl:Halides_alkyl_aryl’, ‘Grignard_carbonyl:Nitriles’, ‘Heck_non-terminal_vinyl’, ‘Heck_non-terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_non-terminal_vinyl:Non_terminal_vinyls’, ‘Heck_terminal_vinyl’, ‘Heck_terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_terminal_vinyl:Terminal_vinyls’, ‘Huisgen_disubst-alkyne’, ‘Huisgen_disubst-alkyne:Alkyl_halides_alcohols’, ‘Huisgen_disubst-alkyne:Alkynes_disubstituted’, ‘Mitsunobu_imide’, ‘Mitsunobu_imide:Acetylacetamides’, ‘Mitsunobu_imide:Alcohols_primary_secondary’, ‘Mitsunobu_phenol’, ‘Mitsunobu_phenol:Alcohols_primary_secondary’, ‘Mitsunobu_phenol:Phenols’, ‘Mitsunobu_sulfonamide’, ‘Mitsunobu_sulfonamide:Alcohols_primary_secondary’, ‘Mitsunobu_sulfonamide:Sulfonamides’, ‘Mitsunobu_tetrazole_1’, ‘Mitsunobu_tetrazole_1:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_1:Tetrazoles’, ‘Mitsunobu_tetrazole_2’, ‘Mitsunobu_tetrazole_2:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_2:Tetrazoles’, ‘N-alkylation1’, ‘N-alkylation1:Amines’, ‘N-alkylation1:Benzyl_halides’, ‘N-alkylation2’, ‘N-alkylation2:Amines’, ‘N-alkylation2:Benzyl_halides’, ‘N-arylation_heterocycles’, ‘N-arylation_heterocycles:Boronic_acids_aryl’, ‘N-arylation_heterocycles:Pyrrole_like_nitrogens’, ‘Negishi’, ‘Negishi:Alkyl_halides_primary1’, 
‘Negishi:Alkyl_halides_primary2’, ‘Niementowski_quinazoline’, ‘Niementowski_quinazoline:Amides_primary’, ‘Niementowski_quinazoline:Aminobenzoic_acids’, ‘O-alkylation’, ‘O-alkylation:Benzyl_halides’, ‘O-alkylation:Phenols’, ‘O-biarylation’, ‘O-biarylation:Aryl_bromides’, ‘O-biarylation:Phenols’, ‘Pictet-Spengler’, ‘Pictet-Spengler:Aldehydes’, ‘Pictet-Spengler:Beta_amino_benzenes’, ‘Reductive_amination1’, ‘Reductive_amination1:Aldehydes’, ‘Reductive_amination1:Amines’, ‘Reductive_amination2’, ‘Reductive_amination2:Aldehydes’, ‘Reductive_amination2:Amines’, ‘Schotten-Baumann_amide’, ‘Schotten-Baumann_amide:Amines’, ‘Schotten-Baumann_amide:Carboxylic_acids’, ‘SnAr1’, ‘SnAr1:Amines’, ‘SnAr1:Heterohalides’, ‘SnAr2’, ‘SnAr2:Amines’, ‘SnAr2:Heterohalides’, ‘Sonogashira’, ‘Sonogashira:Alkynes’, ‘Sonogashira:Bromo_iodo_vinyls_aryls’, ‘Stille’, ‘Stille:Bromo_iodo_vinyls_aryls’, ‘Stille:Halides_aryl’, ‘Suzuki_cross_coupling’, ‘Suzuki_cross_coupling:Aryl_bromides’, ‘Suzuki_cross_coupling:Suzuki_boronics’, ‘Wittig’, ‘Wittig:Alkyl_halides_primary’, ‘Wittig:Ketones_aldehydes’, ‘benzimidazole_derivatives_aldehyde’, ‘benzimidazole_derivatives_aldehyde:Aldehydes’, ‘benzimidazole_derivatives_aldehyde:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Carboxylic_acids’, ‘benzofuran’, ‘benzofuran:Alkynes’, ‘benzofuran:Halophenols’, ‘benzothiazole’, ‘benzothiazole:Aldehydes’, ‘benzothiazole:Aro_6_thiamines’, ‘benzothiophene’, ‘benzothiophene:Alkynes’, ‘benzothiophene:Halomethiols’, ‘benzoxazole_arom-aldehyde’, ‘benzoxazole_arom-aldehyde:Aminophenols’, ‘benzoxazole_arom-aldehyde:Benzaldehydes’, ‘benzoxazole_carboxylic-acid’, ‘benzoxazole_carboxylic-acid:Aminophenols’, ‘benzoxazole_carboxylic-acid:Carboxylic_acids’, ‘decarboxylative_coupling’, ‘decarboxylative_coupling:Carbonyl_benzoic_acids’, ‘decarboxylative_coupling:Halides_aryl’, ‘heteroaromatic_nuc_sub’, 
‘heteroaromatic_nuc_sub:Amines’, ‘heteroaromatic_nuc_sub:Halo_aryls_activated’, ‘imidazole’, ‘imidazole:Alpha_halo_ketones’, ‘imidazole:Aryl_amidines_guanidines’, ‘indole’, ‘indole:Alkynes’, ‘indole:Haloanilines’, ‘nucl_sub_aromatic_ortho_nitro’, ‘nucl_sub_aromatic_ortho_nitro:Amines’, ‘nucl_sub_aromatic_ortho_nitro:Ortho_nitro_halides’, ‘nucl_sub_aromatic_para_nitro’, ‘nucl_sub_aromatic_para_nitro:Amines’, ‘nucl_sub_aromatic_para_nitro:Para_nitro_halides’, ‘oxadiazole’, ‘oxadiazole:Carboxylic_acids’, ‘oxadiazole:Nitriles’, ‘phthalazinone’, ‘phthalazinone:Hydrazines’, ‘phthalazinone:Ketobenzoic_acids’, ‘piperidine_indole’, ‘piperidine_indole:Indoles’, ‘piperidine_indole:Piperidines’, ‘pyrazole’, ‘pyrazole:Diones_2_4’, ‘pyrazole:Hydrazines’, ‘spiro-chromanone’, ‘spiro-chromanone:Ketophenols’, ‘spiro-chromanone:Piperadone_ketones’, ‘sulfon_amide’, ‘sulfon_amide:Amines’, ‘sulfon_amide:Sulfonyl_chlorides’, ‘tetrazole_connect_regioisomere_1’, ‘tetrazole_connect_regioisomere_1:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_1:Nitriles’, ‘tetrazole_connect_regioisomere_2’, ‘tetrazole_connect_regioisomere_2:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_2:Nitriles’, ‘tetrazole_terminal’, ‘tetrazole_terminal:Nitriles’, ‘thiazole’, ‘thiazole:Alpha_halo_ketones’, ‘thiazole:Thioamides’, ‘triaryl-imidazole’, ‘triaryl-imidazole:Aro_ethane_diones’, ‘triaryl-imidazole:Aroaldehydes’, ‘urea’, ‘urea:Amines’, ‘urea:Isocyanates’]
Custom Reactions or Reagents (customqueryclass): A list of custom reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type, or presumed to be this reagent type if the Verify Classifications switch is OFF. Any specification here supersedes any selection specified by the ‘Reactions or Reagents’ above.
Type: string
Reaction Applied (rxnid): Name of the string field to identify the reaction
Type: string
Default: ReactionId
Output Mol Field (outmol): Output molecule field
Type: field_parameter::mol
Annotate Mol (outmolsmi): Name of the string field for the input molecule SMILES or blank to suppress
Type: string
Default: OriginalMol
Annotate Rxn (outrxnsmi): Name of the string field for the reaction SMILES or blank to suppress
Type: string
Default: Reaction
Strict Valences (strictval): If On, only output products with valid valences
Type: boolean
Default: True
Choices: [True, False]
Check Valences (checkval): How to handle valence issues for the generated products
Required
Type: string
Default: reject
Choices: [‘reject’, ‘allow’, ‘fix’]
SMILES Dedupe (dedupesmi): If ON, performs a deduplication of the product SMILES
Required
Type: boolean
Default: True
Choices: [True, False]
SMILES Dedupe Memory (dedupesmimem): Product deduplication may require significant memory resources; specify the desired amount in MB
Type: decimal
Default: 10240
DB Listing (listing): Generate a directory listing of the input reaction & reagent database
Type: boolean
Default: False
Choices: [True, False]
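The choices listed for Reactions or Reagents follow a visible naming pattern: a bare name such as ‘Negishi’ and a colon-separated form such as ‘Negishi:Alkyl_halides_primary1’. The sketch below shows how a selection could be resolved and split on that pattern. Both function names are hypothetical, reading the part after the colon as a reagent class of the named reaction is an inference from the choices list, and treating the custom value as whitespace-delimited is an assumption.

```python
from typing import List, Tuple


def effective_selection(queryclass: List[str], customqueryclass: str) -> List[str]:
    """Custom entries, when present, supersede the list selection."""
    custom = customqueryclass.split()  # assumed whitespace-delimited
    return custom if custom else list(queryclass)


def split_selection(entries: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate bare reaction names from 'Reaction:Reagent' entries."""
    reactions: List[str] = []
    reagents: List[Tuple[str, str]] = []
    for entry in entries:
        name, sep, reagent = entry.partition(":")
        if sep:
            reagents.append((name, reagent))
        else:
            reactions.append(name)
    return reactions, reagents


selected = effective_selection(
    ["Negishi"],
    "Suzuki_cross_coupling Suzuki_cross_coupling:Aryl_bromides",
)
reactions, reagents = split_selection(selected)
```

Here the custom string replaces the ‘Negishi’ list selection entirely, mirroring the statement that a custom specification supersedes the selection list.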
Focused Library Filtering Options
Filter Output (filtering): Enable molecule filtering of the generated products (see type specified by [Mol Filter])
Required
Type: boolean
Default: True
Choices: [True, False]
Mol Filter (mol_filter_type): Type of molecule filter to apply to the generated analogs
Type: string
Default: BlockBuster
Choices: [‘Lead’, ‘Drug’, ‘BlockBuster’, ‘BlockBuster+PAINS’, ‘PAINS’, ‘Custom’]
Custom Filter File (mol_filter_file): A custom filter file resource to load
Type: file_in
Mol Filter Summary Report (filter_summary): If ON, generates a summary report of the rules that filtered molecules
Type: boolean
Default: False
Choices: [True, False]
Focused Library Property Generation
Compute Molecule Properties (mol_props): Which molecule properties to calculate
Type: string
Default: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]
Choices: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]
Reagent Processing Options
Maximum Reagents (maxreagents): Maximum number of reagents to process
Type: integer
Default: 100
Allow Functional Group Conversions (funcgroups): If ON, allows functional group translations for reagents in the database.
Type: boolean
Default: True
Choices: [True, False]
Sample Reagents (samplereagents): Sample this percentage of the total reagents available, limited by the (optional) [Maximum Reagents] total
Type: integer
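The interaction between Sample Reagents and Maximum Reagents can be illustrated with a little arithmetic: take the requested percentage of the available reagents, then cap the result at the maximum. The sketch below is an assumption about how the two limits combine. The function name is hypothetical, and rounding the sampled count up is a guess, since the rounding rule is not stated here.

```python
from typing import Optional


def reagents_to_process(
    total: int,
    sample_percent: Optional[int] = None,
    max_reagents: Optional[int] = 100,
) -> int:
    """Estimate how many reagents would be processed."""
    count = total
    if sample_percent is not None:
        # Integer ceiling of total * percent / 100 (rounding up is assumed).
        count = (total * sample_percent + 99) // 100
    if max_reagents is not None:
        count = min(count, max_reagents)
    return count


print(reagents_to_process(5000, sample_percent=10))  # 500 sampled, capped at 100
print(reagents_to_process(500, sample_percent=10))   # 50
```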
Advanced Focused Library Options
Lead Molecule Minimum Records (rec_min): The minimum number of lead molecule records allowed (default: 1). Input lead molecule datasets that do not meet this threshold will terminate the floe. Use 0 to suppress validation.
Type: integer
Default: 1
Lead Molecule Maximum Records (rec_max): The maximum number of lead molecule records allowed (default: 1). Input lead molecule datasets that exceed this threshold will terminate the floe. Use 0 to suppress validation.
Type: integer
Default: 1
Deprotecting Group Definitions (queryfngroupdefs): Optional name of a file resource containing reaction definitions to provide the deprotecting group transformation(s) of interest. If unspecified, the transformations from the database will be used.
Type: file_in
Deprotecting Groups (queryfngroups): A selectable list of specific deprotecting transformations from [Deprotecting Group Definitions] or from the database to apply to the lead molecule(s). If [Deprotecting Group Definitions] is unspecified, the database transformations are assumed to provide the functional group and/or deprotecting group transforms.
Type: string
Default: []
Choices: [‘All’, ‘Acid_Reduction’, ‘Alcohol_FullOxidation’, ‘Alcohol_PartialOxidation’, ‘Aldehyde_Reduction’, ‘AlkylHalide2Alcohol’, ‘AlkylHalide2Azide’, ‘AlkylHalide2Thiol’, ‘Amide_Reduction’, ‘Aniline_Diazotization’, ‘ArylHalide_Amination’, ‘ArylHalide_Borylation’, ‘Azide2Amine’, ‘Diazo2Cyano’, ‘Diazo2Halide’, ‘Diazo2Phenol’, ‘Ester_FullReduction’, ‘Ester_Hydrolysis’, ‘Ester_PartialReduction’, ‘Nitrile2Amide’, ‘Nitrile_Hydrolysis’, ‘Nitro_Reduction’]
Custom Deprotecting Groups (customqueryfngroups): A blank-delimited list of specific deprotecting transformations from [Deprotecting Group Definitions] or from the database to apply to the lead molecule(s). If [Deprotecting Group Definitions] is unspecified, the database transformations are assumed to provide the functional group and/or deprotecting group transforms. This specification always supersedes any list selections from [Deprotecting Groups]
Type: string
Verify Classifications (verifyclass): If ON, each input molecule will be classified as to its reagent type and verified against the requested list of reactions or reagents. If OFF, and the requested list of reactions or reagents is not empty, the specified reagent types will be assumed without verification
Type: boolean
Default: True
Choices: [True, False]
Classifier Memory Limit (classifiermem): The memory limit for the reaction classifier - may need to be increased for large R&R databases
Required
Type: decimal
Default: 10240
Strict Classification (strictfeatures): If ON, all allowed and disallowed chemistry features are validated for classification of the input structures. If OFF, only required features are considered for reagent classifications
Type: boolean
Default: True
Choices: [True, False]
Molecule ID Field (molid): Name of the string field for the molecule id
Type: field_parameter::string
Molecule SMILES Field (molsmi): Name of the string field for the molecule SMILES
Type: field_parameter::string
Retain Input Dataset Fields (keepfields): If ON, copies the input datarecord. If OFF, discards all but the structure (which will change) and sends it downstream for processing
Type: boolean
Default: False
Choices: [True, False]
Verbosity (verbosity): Sets the output logging verbosity
Type: string
Default: warning
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]
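The record-count validation described for Lead Molecule Minimum Records (rec_min) and Lead Molecule Maximum Records (rec_max) can be sketched as a simple range check in which 0 disables a bound. The function name record_count_ok is hypothetical, and returning False stands in for the floe terminating.

```python
def record_count_ok(n_records: int, rec_min: int = 1, rec_max: int = 1) -> bool:
    """Return True if the input dataset size passes both thresholds."""
    if rec_min != 0 and n_records < rec_min:
        return False  # too few lead molecules: the floe would terminate
    if rec_max != 0 and n_records > rec_max:
        return False  # too many lead molecules: the floe would terminate
    return True


print(record_count_ok(1))             # True with the defaults
print(record_count_ok(5))             # False: exceeds rec_max=1
print(record_count_ok(5, rec_max=0))  # True: upper bound suppressed
```

With the defaults, only a dataset of exactly one lead molecule passes, which matches the expectation of a single-molecule input.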