Reagent Archive - Create from tarfile

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Cheminformatician/Medicinal Chemistry Support

  • Role-based/Cheminformatician/Corporate Collection Support

Description

This floe processes an uploaded archive of reagent files for use in subsequent reaction enumerations.

General approach

  • On your local machine, create a reagent archive containing all reagent files in a single top-level directory.

  • Upload the archive file to Orion with all file processing disabled; the raw archive file is required for reagent processing.
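
As a minimal sketch of the first step, the archive can be built with Python's standard tarfile module. The function name make_reagent_archive and the choice to store the files under the directory's own name inside the archive are assumptions made for illustration, not requirements stated by the floe.

```python
import tarfile
from pathlib import Path


def make_reagent_archive(reagent_dir, archive_path):
    """Pack every regular file in reagent_dir into a gzip-compressed tar archive.

    Hypothetical helper: files are stored under a single top-level
    directory named after reagent_dir (an assumption about the layout).
    """
    reagent_dir = Path(reagent_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(reagent_dir.iterdir()):
            if path.is_file():
                # Subdirectories are skipped; only top-level files are added.
                tar.add(path, arcname=f"{reagent_dir.name}/{path.name}")
    return archive_path
```

Write the archive outside the reagent directory (for example, make_reagent_archive("reagents", "reagents.tar.gz")) so that it is not packed into itself on a second run.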

Reagent Archive Files

Each reaction in the archive must have a unique reaction identifier, for example, rxnkey. The reagent files for that reaction are named with the reaction identifier followed by a numeric ID suffix: rxnkey_1.smi, rxnkey_2.smi, and so on.

Each reagent file is processed individually to ensure it meets the minimum requirements for enumeration to products; namely, each reagent structure in each reagent file must contain at least one attachment point site. If the reagent structures already contain the required OESMILESFlag_ExtBonds attachment point annotations for the reagents, no additional processing of the reagents is required, and the reagents can be used as is. In this case, no alchemy.txt or .umr files should be present in the archive.

If the reagent files contain specific atom types to annotate the reagent attachment points, the archive must contain an alchemy.txt file which indicates which atom types correspond to which numbered attachment points. If the alchemy.txt file is provided, every reagent structure will be processed using that information.

There is additional support for reagent clipping or manipulation. If a specific reagent file has a corresponding .umr file in the archive, its reagents are assumed to need additional processing by the provided OEUniMolecularRxn SMARTS specification. That is, for rxnkey_2.smi in the presence of a rxnkey_2.umr file, an OEUniMolecularRxn will be created from the .umr contents (first line only), and each of the reagents in rxnkey_2.smi will be modified by the specified transformation. Only reagents that have a single transformation site are considered valid; if more than one site is found for the transformation, that specific reagent structure is skipped.

Care should be taken to write the OEUniMolecularRxn transformation so that it generates invalid valence site(s) from any clipped or broken bonds on fragments that are to be removed in the transformation process. There is no inherent support for removing general/clipped structural fragments from an OEUniMolecularRxn transformation if bonds are broken, so identifying them with illegal valence site(s) is a temporary workaround for this limitation.

If the input reagent itself has invalid valence site(s), or if no invalid valence site(s) are generated by the transformation, all but the largest fragment will be removed post-transformation to ensure no extra fragments are retained.

The stages of reagent archive validation are:

  • Collect all unique reaction keys.

  • Verify correct naming of reagent files for each reaction key (at least 2 and at most 4 reagent files, with reagent file numeric IDs in consecutive order).

  • Find the (optional) specification for alchemy conversions (archive-specific).

  • Collect any .umr transformations (reagent-specific).
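
The naming checks in the first two stages can be sketched in plain Python as follows. This is an illustration only, not the floe's implementation: the function name validate_reaction_names is hypothetical, and the sketch assumes that numeric IDs start at 1, which matches the rxnkey_1.smi example but is not stated explicitly.

```python
import re
from collections import defaultdict

# Reagent structure extensions listed under Supported File Formats.
STRUCTURE_EXTS = (".smi", ".ism", ".usm", ".can")

# <reaction key>_<numeric ID><extension>, e.g. rxnkey_2.smi
NAME_RE = re.compile(r"^(?P<key>.+)_(?P<idx>\d+)(?P<ext>\.\w+)$")


def validate_reaction_names(filenames):
    """Return {reaction key: sorted numeric IDs} for keys that pass the naming checks."""
    found = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match and match.group("ext") in STRUCTURE_EXTS:
            found[match.group("key")].add(int(match.group("idx")))

    valid = {}
    for key, ids in found.items():
        ordered = sorted(ids)
        # 2 to 4 reagent files, numbered consecutively (assumed to start at 1).
        if 2 <= len(ordered) <= 4 and ordered == list(range(1, len(ordered) + 1)):
            valid[key] = ordered
    return valid
```

With this sketch, a reaction that has only rxnkey_1.smi and rxnkey_3.smi is rejected because the numeric IDs are not consecutive, while alchemy.txt and .umr files are ignored by the naming check.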

For all reagent files that pass the reaction naming validations, loop over all reagents in each reagent file to:

  • verify a valid SMILES structure specification for the reagent.

  • (optional) apply any .umr transformation specification to each reagent, requiring the reagent to contain only a single transformation site, and remove any fragments generated by bond breaks in the transformation.

  • (optional) perform alchemy conversion(s) if alchemy.txt is present.

  • verify that at least one attachment site is present on the processed reagent.

All the reagent files with valid names and all reagent structures therein that meet the reagent validations are exported to a reagent enumeration collection that can be enumerated to products.

Alchemy File Format

The optional alchemy.txt file should contain a single line of the following format:

  • sym1:attachpt1[,sym2:attachpt2][,sym3:attachpt3]

For example, include Fe:1,Mo:2,W:3,Y:4 for iron, molybdenum, tungsten and yttrium attachment point specifications for R1-R4.
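
To make the format concrete, here is a small sketch of how such a line could be parsed into a mapping from atom symbol to attachment point number. The function name parse_alchemy_line and its error handling are assumptions for illustration; the floe's own parser may behave differently.

```python
def parse_alchemy_line(line):
    """Parse 'sym1:attachpt1[,sym2:attachpt2]...' into {symbol: attachment point}."""
    mapping = {}
    for entry in line.strip().split(","):
        symbol, _, attach = entry.partition(":")
        symbol = symbol.strip()
        attach = attach.strip()
        if not symbol or not attach.isdigit():
            raise ValueError(f"malformed alchemy entry: {entry!r}")
        if symbol in mapping:
            raise ValueError(f"duplicate atom symbol: {symbol!r}")
        mapping[symbol] = int(attach)
    return mapping
```

Calling parse_alchemy_line("Fe:1,Mo:2,W:3,Y:4") returns {"Fe": 1, "Mo": 2, "W": 3, "Y": 4}.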

Supported File Formats

The supported reagent structure formats are: .smi, .ism, .usm, .can.

The supported OEUniMolecularRxn SMARTS transformation extension is: .umr.

The supported archive formats are: .tar, .tar.gz, .tgz, .tar.xz, .txz, .tar.bz2, .tbz2, .zip. In general, the archive formats supported on a particular platform are those returned by Python's shutil.get_unpack_formats().
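
To see which archive extensions a given Python installation can unpack, the standard library call can be queried directly. The wrapper function supported_archive_extensions is a hypothetical convenience, and the result on your local machine may differ from what is available where the floe runs.

```python
import shutil


def supported_archive_extensions():
    """Return a sorted list of archive extensions this Python can unpack."""
    extensions = []
    # Each registered format is a (name, extensions, description) tuple.
    for _name, exts, _description in shutil.get_unpack_formats():
        extensions.extend(exts)
    return sorted(extensions)


print(supported_archive_extensions())
```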

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Archive File (archive): Name of the non-ETL’d Orion file containing the reagents to be validated and prepared for enumerations, and then loaded into a reagent collection.

  • Required

  • Type: file_in

Options

Sort Order (sort_order): Defines the ordering of the reactions in the reagent listing. The name choice sorts by reaction name; the increasing and decreasing choices sort by total enumeration product count.

  • Required

  • Type: string

  • Default: name

  • Choices: ['name', 'increasing', 'decreasing']

Verbosity (verbosity): Sets the output logging verbosity.

  • Type: string

  • Default: warning

  • Choices: ['info', 'warning', 'error', 'debug', 'ddebug']