Focused Library - Synthon Analogs

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Generative Design/Reaction-based Libraries

  • Task-based/Library Prep & Design/Reaction-based Enumeration

  • Task-based/Virtual Screening - Structure-Based

  • Role-based/Medicinal Chemist

Description

This floe performs a single-step retro-synthetic analysis of the input lead molecule(s) and applies the corresponding reaction transformations to generate analog libraries. All applied transforms are provided in the Reaction & Reagent Database.

Required Inputs:

Both the Reaction & Reagent Database and an input lead molecule dataset are required. Sample databases are available as File resources in the Organization Data/OpenEye Data/Generative Design Data folder as 2022_2_ZINC_5K_lowcomplexity.db and 2022_2_ZINC_5K_highinterest.db. The former samples ZINC reagents (for each reagent class in the database) with low molecular complexity values, while the latter contains ZINC reagents of high medchem interest scores.

Required Outputs:

The name of an output dataset should be specified, as the Output Data parameter is “On” by default. See the discussion of prospective runs below.

Optional Activities:

The Molecule ID Field should generally match the source of the input lead molecules for the Reaction & Reagent Database file. In the case of ZINC as the source, zinc_id is the standard structure ID field.

There is a small set of preselected properties, Compute Molecule Properties, that can be computed on the generated products, or this activity can be disabled by removing all the properties from the list.

For prospective and trial activities, the Output Data, Output Failures, and Output Specific Failures booleans, when set to “Off”, will provide counts of the outputs from the floe without creating dataset(s). This is useful for validating the input options against a specific input lead molecule dataset prior to running a capture run to generate output dataset(s).
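As a minimal sketch of the trial-then-capture workflow described above, the promoted names documented on this page (outdata, outfails, outprodfails) can be collected into two settings dictionaries. The helper name and the dictionary layout are illustrative assumptions, and how the values are passed to Orion is not shown. Enabling all three outputs for the capture run is one possible choice, not a requirement.

```python
# Hypothetical helper: builds output switches keyed by the promoted names
# documented on this page. Submitting them to Orion is outside this sketch.
def output_settings(capture: bool) -> dict:
    """Return output switches for a trial (count-only) or capture run."""
    return {
        "outdata": capture,       # Output Data
        "outfails": capture,      # Output Failures
        "outprodfails": capture,  # Output Specific Failures
    }


trial_run = output_settings(capture=False)   # counts only, no datasets created
capture_run = output_settings(capture=True)  # product and failure datasets created
```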

The Check Valences option controls whether products with valence issues are rejected, allowed, or fixed. The Strict Valences option controls whether any illegal valence in a product results in its rejection from the output products.

The Strict Classification option controls whether lead molecules are classified according to both the required and disallowed chemical features (defined by the Reaction & Reagent Database reactions) or simply by the required features. Turning the strict option “Off” may generate alternate (or even surprising) products due to reactions at undesirable sites.

The Fragmentation Size option constrains the size of the reagents generated by applying the retro reaction transformation(s): a smaller value allows smaller reagents and a larger value requires larger reagents. The value is specified as a percentage of the heavy-atom count of the input molecule.
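The arithmetic behind the Fragmentation Size constraint can be sketched as follows. The function name is hypothetical, and treating the threshold as inclusive ("at least this percentage") without rounding is an assumption about the floe's behavior, not a documented detail. The default of 40 is taken from the fragpercent parameter.

```python
def meets_fragmentation_size(fragment_heavy_atoms: int,
                             input_heavy_atoms: int,
                             fragpercent: int = 40) -> bool:
    """True if the fragment has at least `fragpercent` percent of the
    input molecule's heavy atoms (inclusive threshold is an assumption)."""
    # Integer comparison avoids floating-point rounding at the threshold.
    return fragment_heavy_atoms * 100 >= fragpercent * input_heavy_atoms


# A lead with 30 heavy atoms at the default of 40 percent requires
# fragments of 12 or more heavy atoms.
print(meets_fragmentation_size(12, 30))  # True
print(meets_fragmentation_size(11, 30))  # False
```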

General Considerations

If specific reagents or reactions are specified, the analysis of the input lead molecule will be restricted to those reactions.

If one or more reagent classes are specified and the retro-synthetic analysis of the input molecule is productive for that reaction, the unspecified reagent of the reaction is kept fixed, and the specified reagent is varied based on sampled reagents from the Reaction & Reagent Database.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Lead Molecule Dataset (lead_molecule): A dataset containing the lead molecule(s) to be transformed by reactions from the reaction & reagent database. This dataset is assumed to contain ONE lead molecule, because the floe amplifies each lead into many products, but the input limit can be altered in the [Advanced Focused Library Options] tab. Generally, small input datasets are expected.

  • Required

  • Type: data_source

Reaction & Reagent Database (rxndb): The name of the reaction & reagent database to use. Sample databases are available as File resources in the ‘Organization Data/OpenEye Data/Generative Design Data’ folder.

  • Required

  • Type: file_in

Outputs

Output Dataset (output): Output dataset containing generated products

  • Required

  • Type: dataset_out

  • Default: Reaction_products

Output Data (outdata): If OFF, just counts records, but does not output them

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

General Failures (failures): Output dataset containing input failures and reagents that failed to react

  • Required

  • Type: dataset_out

  • Default: Input_failures

Output Failures (outfails): If OFF, just counts records, but does not output them

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Specific Product Failures (prodfailures): Output dataset containing specific reagent combinations that failed to react

  • Required

  • Type: dataset_out

  • Default: Product_failures

Output Specific Failures (outprodfails): If OFF, just counts records, but does not output them

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Focused Library Options

Reactions or Reagents (queryclass): A list of reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type, or presumed to be this reagent type if the Verify Classifications switch is OFF.

  • Type: string

  • Default: []

  • Choices: [‘3-nitrile-pyridine’, ‘3-nitrile-pyridine:Diones_2_4’, ‘Buchwald-Hartwig’, ‘Buchwald-Hartwig:Amines’, ‘Buchwald-Hartwig:Halides_aryl’, ‘Buchwald_cross_coupling1’, ‘Buchwald_cross_coupling1:Amines’, ‘Buchwald_cross_coupling1:Aryl_halides’, ‘Buchwald_cross_coupling2’, ‘Buchwald_cross_coupling2:Amines’, ‘Buchwald_cross_coupling2:Aryl_halides’, ‘Ester_hydrolysis-amide_synthesis1’, ‘Ester_hydrolysis-amide_synthesis1:Amines’, ‘Ester_hydrolysis-amide_synthesis1:Esters’, ‘Ester_hydrolysis-amide_synthesis2’, ‘Ester_hydrolysis-amide_synthesis2:Amines’, ‘Ester_hydrolysis-amide_synthesis2:Esters’, ‘Grignard_alcohol’, ‘Grignard_alcohol:Halides_alkyl’, ‘Grignard_alcohol:Ketones_aldehydes’, ‘Grignard_carbonyl’, ‘Grignard_carbonyl:Halides_alkyl_aryl’, ‘Grignard_carbonyl:Nitriles’, ‘Heck_non-terminal_vinyl’, ‘Heck_non-terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_non-terminal_vinyl:Non_terminal_vinyls’, ‘Heck_terminal_vinyl’, ‘Heck_terminal_vinyl:Halide_vinyl_aryls’, ‘Heck_terminal_vinyl:Terminal_vinyls’, ‘Huisgen_disubst-alkyne’, ‘Huisgen_disubst-alkyne:Alkyl_halides_alcohols’, ‘Huisgen_disubst-alkyne:Alkynes_disubstituted’, ‘Mitsunobu_imide’, ‘Mitsunobu_imide:Acetylacetamides’, ‘Mitsunobu_imide:Alcohols_primary_secondary’, ‘Mitsunobu_phenol’, ‘Mitsunobu_phenol:Alcohols_primary_secondary’, ‘Mitsunobu_phenol:Phenols’, ‘Mitsunobu_sulfonamide’, ‘Mitsunobu_sulfonamide:Alcohols_primary_secondary’, ‘Mitsunobu_sulfonamide:Sulfonamides’, ‘Mitsunobu_tetrazole_1’, ‘Mitsunobu_tetrazole_1:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_1:Tetrazoles’, ‘Mitsunobu_tetrazole_2’, ‘Mitsunobu_tetrazole_2:Alcohols_primary_secondary’, ‘Mitsunobu_tetrazole_2:Tetrazoles’, ‘N-alkylation1’, ‘N-alkylation1:Amines’, ‘N-alkylation1:Benzyl_halides’, ‘N-alkylation2’, ‘N-alkylation2:Amines’, ‘N-alkylation2:Benzyl_halides’, ‘N-arylation_heterocycles’, ‘N-arylation_heterocycles:Boronic_acids_aryl’, ‘N-arylation_heterocycles:Pyrrole_like_nitrogens’, ‘Negishi’, ‘Negishi:Alkyl_halides_primary1’, 
‘Negishi:Alkyl_halides_primary2’, ‘Niementowski_quinazoline’, ‘Niementowski_quinazoline:Amides_primary’, ‘Niementowski_quinazoline:Aminobenzoic_acids’, ‘O-alkylation’, ‘O-alkylation:Benzyl_halides’, ‘O-alkylation:Phenols’, ‘O-biarylation’, ‘O-biarylation:Aryl_bromides’, ‘O-biarylation:Phenols’, ‘Pictet-Spengler’, ‘Pictet-Spengler:Aldehydes’, ‘Pictet-Spengler:Beta_amino_benzenes’, ‘Reductive_amination1’, ‘Reductive_amination1:Aldehydes’, ‘Reductive_amination1:Amines’, ‘Reductive_amination2’, ‘Reductive_amination2:Aldehydes’, ‘Reductive_amination2:Amines’, ‘Schotten-Baumann_amide’, ‘Schotten-Baumann_amide:Amines’, ‘Schotten-Baumann_amide:Carboxylic_acids’, ‘SnAr1’, ‘SnAr1:Amines’, ‘SnAr1:Heterohalides’, ‘SnAr2’, ‘SnAr2:Amines’, ‘SnAr2:Heterohalides’, ‘Sonogashira’, ‘Sonogashira:Alkynes’, ‘Sonogashira:Bromo_iodo_vinyls_aryls’, ‘Stille’, ‘Stille:Bromo_iodo_vinyls_aryls’, ‘Stille:Halides_aryl’, ‘Suzuki_cross_coupling’, ‘Suzuki_cross_coupling:Aryl_bromides’, ‘Suzuki_cross_coupling:Suzuki_boronics’, ‘Wittig’, ‘Wittig:Alkyl_halides_primary’, ‘Wittig:Ketones_aldehydes’, ‘benzimidazole_derivatives_aldehyde’, ‘benzimidazole_derivatives_aldehyde:Aldehydes’, ‘benzimidazole_derivatives_aldehyde:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Aro_6_diamines’, ‘benzimidazole_derivatives_carboxylic-acid/ester:Carboxylic_acids’, ‘benzofuran’, ‘benzofuran:Alkynes’, ‘benzofuran:Halophenols’, ‘benzothiazole’, ‘benzothiazole:Aldehydes’, ‘benzothiazole:Aro_6_thiamines’, ‘benzothiophene’, ‘benzothiophene:Alkynes’, ‘benzothiophene:Halomethiols’, ‘benzoxazole_arom-aldehyde’, ‘benzoxazole_arom-aldehyde:Aminophenols’, ‘benzoxazole_arom-aldehyde:Benzaldehydes’, ‘benzoxazole_carboxylic-acid’, ‘benzoxazole_carboxylic-acid:Aminophenols’, ‘benzoxazole_carboxylic-acid:Carboxylic_acids’, ‘decarboxylative_coupling’, ‘decarboxylative_coupling:Carbonyl_benzoic_acids’, ‘decarboxylative_coupling:Halides_aryl’, ‘heteroaromatic_nuc_sub’, 
‘heteroaromatic_nuc_sub:Amines’, ‘heteroaromatic_nuc_sub:Halo_aryls_activated’, ‘imidazole’, ‘imidazole:Alpha_halo_ketones’, ‘imidazole:Aryl_amidines_guanidines’, ‘indole’, ‘indole:Alkynes’, ‘indole:Haloanilines’, ‘nucl_sub_aromatic_ortho_nitro’, ‘nucl_sub_aromatic_ortho_nitro:Amines’, ‘nucl_sub_aromatic_ortho_nitro:Ortho_nitro_halides’, ‘nucl_sub_aromatic_para_nitro’, ‘nucl_sub_aromatic_para_nitro:Amines’, ‘nucl_sub_aromatic_para_nitro:Para_nitro_halides’, ‘oxadiazole’, ‘oxadiazole:Carboxylic_acids’, ‘oxadiazole:Nitriles’, ‘phthalazinone’, ‘phthalazinone:Hydrazines’, ‘phthalazinone:Ketobenzoic_acids’, ‘piperidine_indole’, ‘piperidine_indole:Indoles’, ‘piperidine_indole:Piperidines’, ‘pyrazole’, ‘pyrazole:Diones_2_4’, ‘pyrazole:Hydrazines’, ‘spiro-chromanone’, ‘spiro-chromanone:Ketophenols’, ‘spiro-chromanone:Piperadone_ketones’, ‘sulfon_amide’, ‘sulfon_amide:Amines’, ‘sulfon_amide:Sulfonyl_chlorides’, ‘tetrazole_connect_regioisomere_1’, ‘tetrazole_connect_regioisomere_1:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_1:Nitriles’, ‘tetrazole_connect_regioisomere_2’, ‘tetrazole_connect_regioisomere_2:Alkyl_bromides’, ‘tetrazole_connect_regioisomere_2:Nitriles’, ‘tetrazole_terminal’, ‘tetrazole_terminal:Nitriles’, ‘thiazole’, ‘thiazole:Alpha_halo_ketones’, ‘thiazole:Thioamides’, ‘triaryl-imidazole’, ‘triaryl-imidazole:Aro_ethane_diones’, ‘triaryl-imidazole:Aroaldehydes’, ‘urea’, ‘urea:Amines’, ‘urea:Isocyanates’]
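The choices above follow a visible naming pattern: a bare reaction name (for example ‘Wittig’) or a reaction name and reagent class joined by a colon (for example ‘Wittig:Ketones_aldehydes’). The sketch below splits an entry on that pattern. The type and function names are hypothetical, and this is only an illustration of the naming convention, not the floe's own parsing code.

```python
from typing import NamedTuple, Optional


class QueryClass(NamedTuple):
    reaction: str
    reagent_class: Optional[str]  # None when only a reaction is named


def parse_query_class(entry: str) -> QueryClass:
    """Split 'Reaction:ReagentClass' on the first colon.

    An entry without a colon is treated as a bare reaction name.
    """
    reaction, sep, reagent = entry.partition(":")
    return QueryClass(reaction, reagent if sep else None)


print(parse_query_class("Suzuki_cross_coupling:Aryl_bromides"))
print(parse_query_class("Wittig"))
```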

Custom Reactions or Reagents (customqueryclass): A list of custom reactions and/or reagents for selection of transforms. If this is a list of reagents, the input molecules will be verified against this reagent type, or presumed to be this reagent type if the Verify Classifications switch is OFF. Any specification here supersedes any selection specified by the ‘Reactions or Reagents’ above.

  • Type: string

Reaction Applied (rxnid): Name of the string field to identify the reaction

  • Type: string

  • Default: ReactionId

Output Mol Field (outmol): Output molecule field

  • Type: field_parameter::mol

Annotate Mol (outmolsmi): Name of the string field for the input molecule SMILES or blank to suppress

  • Type: string

  • Default: OriginalMol

Annotate Rxn (outrxnsmi): Name of the string field for the reaction SMILES or blank to suppress

  • Type: string

  • Default: Reaction

Strict Valences (strictval): If On, only output products with valid valences

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Strict Reagent Classification (strictreagents): If ON, use strict reagent classifications; otherwise, relax validation to only the required chemical features and suppress validations based on disallowed reagent chemistry features

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Check Valences (checkval): How to handle valence issues for the generated products

  • Required

  • Type: string

  • Default: reject

  • Choices: [‘reject’, ‘allow’, ‘fix’]

SMILES Dedupe (dedupesmi): If ON, performs a deduplication of the product SMILES

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

SMILES Dedupe Memory (dedupesmimem): Product deduplication may require significant memory resources; specify the desired amount in MB

  • Type: decimal

  • Default: 10240
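To illustrate why product deduplication can need significant memory, here is a minimal sketch of SMILES-based deduplication. It assumes the input strings are already canonical SMILES, so identical molecules have identical strings; the function name is hypothetical and the floe's actual implementation is not documented here.

```python
from typing import Iterable, Iterator


def dedupe_smiles(smiles: Iterable[str]) -> Iterator[str]:
    """Yield each SMILES string the first time it is seen, preserving order."""
    # The set holds every unique product seen so far, so memory use
    # grows with the number of distinct products.
    seen = set()
    for smi in smiles:
        if smi not in seen:
            seen.add(smi)
            yield smi


products = ["CCO", "c1ccccc1", "CCO", "CCN", "c1ccccc1"]
unique_products = list(dedupe_smiles(products))
print(unique_products)
```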

Fragmentation Size (fragpercent): For the retro reaction products, require generated fragments to be at least this percentage of the heavy atom count of the input

  • Type: integer

  • Default: 40

Focused Library Filtering Options

Filter Products (mol_filtering): Enable molecule filtering of the generated products (see type specified by [Product Filter])

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Product Filter (mol_filter_type): Type of molecule filter to apply to the generated analogs

  • Type: string

  • Default: BlockBuster

  • Choices: [‘Lead’, ‘Drug’, ‘BlockBuster’, ‘BlockBuster+PAINS’, ‘PAINS’, ‘Custom’]

Product Filter Summary Report (mol_filter_summary): If ON, will generate a summary report of the rules that filtered molecules

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Focused Library Property Generation

Compute Molecule Properties (mol_props): Which molecule properties to calculate

  • Type: string

  • Default: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]

  • Choices: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]

Reagent Functional Group Conversions

Allow Functional Group Conversions (funcgroups): If ON, allows functional group translations during input molecule classifications and reagent retrievals

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Advanced Focused Library Options

Lead Molecule Minimum Records (rec_min): The minimum number of lead molecule records allowed (default: 1). Input lead molecule datasets that do not meet this threshold will terminate the floe; use 0 to suppress validation

  • Type: integer

  • Default: 1

Lead Molecule Maximum Records (rec_max): The maximum number of lead molecule records allowed (default: 1). Input lead molecule datasets that exceed this threshold will terminate the floe; use 0 to suppress validation

  • Type: integer

  • Default: 1
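The record-count validation described by rec_min and rec_max can be sketched as a simple check in which 0 disables the corresponding limit. The function name is hypothetical, and the sketch returns a boolean rather than terminating anything; it only mirrors the rule as stated in the two parameter descriptions.

```python
def check_record_count(n_records: int, rec_min: int = 1, rec_max: int = 1) -> bool:
    """Return True if the lead dataset size passes both limits.

    A limit of 0 suppresses that side of the validation.
    """
    if rec_min != 0 and n_records < rec_min:
        return False
    if rec_max != 0 and n_records > rec_max:
        return False
    return True


print(check_record_count(1))             # True: exactly one lead, the default
print(check_record_count(3))             # False: exceeds the default maximum
print(check_record_count(3, rec_max=0))  # True: maximum check suppressed
```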

Maximum Reagents (maxreagents): Maximum number of reagents to process

  • Type: integer

  • Default: 100

Sample Reagents (samplereagents): Sample this percentage of the total reagents available, limited by the (optional) [Maximum Reagents] total

  • Type: integer
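The interaction between Sample Reagents and Maximum Reagents can be sketched as a percentage followed by a cap. The function name is hypothetical, and several details are assumptions rather than documented behavior: rounding down to a whole reagent, treating an unset percentage as "use all reagents", and treating an unset maximum as "no cap".

```python
from typing import Optional


def reagent_sample_size(total: int,
                        sample_percent: Optional[int] = None,
                        max_reagents: Optional[int] = 100) -> int:
    """Number of reagents to process for one reagent class (illustrative)."""
    # Assumption: an unset percentage means all reagents are candidates.
    n = total if sample_percent is None else total * sample_percent // 100
    # Assumption: the maximum, when set, caps the sampled count.
    if max_reagents is not None:
        n = min(n, max_reagents)
    return n


print(reagent_sample_size(5000, sample_percent=10))  # 500 sampled, capped to 100
print(reagent_sample_size(5000, sample_percent=1))   # 50, under the cap
```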

Molecule ID Field (molid): Name of the string field for the molecule id

  • Type: field_parameter::string

Molecule SMILES Field (molsmi): Name of the string field for the molecule SMILES

  • Type: field_parameter::string

Classifier Memory Limit (classifiermem): The memory limit for the reaction classifier - may need to be increased for large R&R databases

  • Required

  • Type: decimal

  • Default: 10240

Verbosity (verbosity): Sets the output logging verbosity

  • Type: string

  • Default: warning

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]