Reagent Archive - Create from tarfile
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Role-based/Cheminformatician/Medicinal Chemistry Support
Role-based/Cheminformatician/Corporate Collection Support
Description
This floe processes an uploaded archive of reagent files for use in subsequent reaction enumerations.
General approach
Create the reagent archive on your local machine for all reagent files in a single top-level directory.
Upload the archive file to Orion and disable any processing of the file - the raw archive file is required for reagent processing.
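The local archive step could be sketched in Python as follows. This is a minimal sketch, not the floe's own tooling: the function name make_reagent_archive is hypothetical, and it assumes the archive members should sit under one top-level directory named after the source folder. Uploading the file and disabling its processing happen in Orion and are not shown.

```python
import tarfile
from pathlib import Path


def make_reagent_archive(reagent_dir, archive_path):
    """Pack every regular file directly under reagent_dir into a .tar.gz archive.

    Hypothetical helper: members are stored as '<directory name>/<file name>',
    which is an assumption about the expected archive layout.
    """
    reagent_dir = Path(reagent_dir).resolve()
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(reagent_dir.iterdir()):
            if path.is_file():
                tar.add(path, arcname=f"{reagent_dir.name}/{path.name}")
    return archive_path
```

Calling make_reagent_archive("reagents", "reagents.tar.gz") would then produce a single file to upload unprocessed.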
Reagent Archive Files
Each reaction in the archive must have a unique reaction identifier, for example, rxnkey. The reagent sets for that reaction are named with the key followed by a numeric ID suffix: rxnkey_1.smi, rxnkey_2.smi …
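To check file names against this convention before uploading, a small parser can group reagent files by reaction key. The sketch below is illustrative only: the regular expression and the function name group_reagent_files are assumptions, not the floe's implementation, and the extension list is taken from the Supported File Formats section.

```python
import re
from collections import defaultdict

# Assumed pattern: <reaction key>_<numeric ID>.<supported structure extension>
REAGENT_NAME = re.compile(r"^(?P<key>.+)_(?P<index>\d+)\.(?:smi|ism|usm|can)$")


def group_reagent_files(filenames):
    """Group reagent file names by reaction key, ordered by numeric ID.

    Names that do not match the pattern (alchemy.txt, .umr files, ...) are ignored.
    """
    groups = defaultdict(dict)
    for name in filenames:
        match = REAGENT_NAME.match(name)
        if match:
            groups[match["key"]][int(match["index"])] = name
    return {key: dict(sorted(files.items())) for key, files in groups.items()}
```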
Each reagent file is processed individually to ensure it meets the minimum requirements for enumeration to products; namely, each reagent structure in each reagent file must contain at least one attachment point site. If the reagent structures already contain the required OESMILESFlag_ExtBonds attachment point annotations, no additional processing is required, and the reagents can be used as is. In this case, no alchemy.txt or .umr files should be present in the archive.
If the reagent files use specific atom types to annotate the reagent attachment points, the archive must contain an alchemy.txt file that indicates which atom types correspond to which numbered attachment points. If the alchemy.txt file is provided, every reagent structure will be processed using that information.
There is additional support for reagent clipping or manipulation. If a specific reagent file has a corresponding .umr file in the archive, that reagent file is assumed to need additional processing by the provided OEUniMolecularRxn SMARTS specification. That is, for rxnkey_2.smi in the presence of a rxnkey_2.umr file, an OEUniMolecularRxn will be created from the .umr contents (first line only), and each of the reagents in rxnkey_2.smi will be modified by the specified transformation. Only reagents with a single transformation site are considered valid; if the transformation matches more than one site, that specific reagent structure will be skipped.
Take care to write the OEUniMolecularRxn transformation so that any clipped or broken bond leaves invalid valence site(s) on the fragments that are to be removed. An OEUniMolecularRxn transformation has no inherent support for removing general/clipped structural fragments when bonds are broken, so marking them with illegal valence site(s) is a temporary workaround for this limitation. If the input reagent itself has invalid valence site(s), or if the transformation generates no invalid valence site(s), all but the largest fragment will be removed post-transformation to ensure no extra fragments are retained.
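The pairing of a reagent file with its .umr file, and the "first line only" rule, can be illustrated with a short sketch. It only collects the transformation strings; building and applying the OEUniMolecularRxn itself requires the OpenEye toolkit and is not shown. The function name collect_umr_patterns is hypothetical, and the sketch assumes the archive has already been unpacked into a flat directory.

```python
from pathlib import Path


def collect_umr_patterns(archive_dir):
    """Map each reagent file stem (e.g. 'rxnkey_2') to the first line of its .umr file.

    Hypothetical helper: later lines of a .umr file are ignored, and empty
    .umr files are skipped.
    """
    patterns = {}
    for umr_path in sorted(Path(archive_dir).glob("*.umr")):
        lines = umr_path.read_text().splitlines()
        if lines and lines[0].strip():
            patterns[umr_path.stem] = lines[0].strip()
    return patterns
```

A reagent file such as rxnkey_2.smi would then be transformed only if "rxnkey_2" appears as a key in the returned mapping.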
The stages of reagent archive validation are:
Collect all unique reaction keys.
Verify correct naming of reagents for each reaction key (at least 2 reagent files, at most 4 reagents, and reagent file numeric IDs in consecutive order).
Find the (optional) specification for alchemy conversions (archive-specific).
Collect any .umr transformations (reagent-specific).
For all reagent files that pass the reaction naming validations, loop over all reagents in each reagent file to:
verify a valid SMILES structure specification for the reagent.
(optional) apply any .umr transformation specification to each reagent, requiring reagents to contain only a single transformation site, and remove any fragments generated by bond breaks in the .umr transformation.
(optional) perform alchemy conversion(s) if alchemy.txt is present.
verify that at least one attachment site is present on the processed reagent.
All the reagent files with valid names and all reagent structures therein that meet the reagent validations are exported to a reagent enumeration collection that can be enumerated to products.
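The naming check in the validation stages above (at least 2 and at most 4 reagent files per reaction key, with consecutive numeric IDs) could be pre-checked locally along these lines. This is a sketch under stated assumptions: the function name validate_reagent_indices is hypothetical, and numbering is assumed to start at 1, as in the rxnkey_1.smi example.

```python
def validate_reagent_indices(indices, min_files=2, max_files=4):
    """Return a list of naming problems for one reaction key (empty if valid).

    indices: the numeric ID suffixes found for the key, e.g. [1, 2, 3].
    Assumption: IDs must run 1, 2, ..., n with no gaps or duplicates.
    """
    problems = []
    found = sorted(indices)
    if len(found) < min_files:
        problems.append(f"only {len(found)} reagent file(s); need at least {min_files}")
    if len(found) > max_files:
        problems.append(f"{len(found)} reagent files; at most {max_files} allowed")
    if found != list(range(1, len(found) + 1)):
        problems.append(f"numeric IDs {found} are not consecutive starting at 1")
    return problems
```

For example, a key with files numbered 1 and 3 would be reported as non-consecutive, while 1, 2, 3 would pass.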
Alchemy File Format
The optional alchemy.txt file should contain a single line of the following format:
sym1:attachpt1[,sym2:attachpt2][,sym3:attachpt3]
For example, include Fe:1,Mo:2,W:3,Y:4 for iron, molybdenum, tungsten and yttrium attachment point specifications for R1-R4.
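A line in this format can be checked locally with a small parser before it goes into the archive. The sketch below is an illustration, not the floe's own parser; the function name parse_alchemy_line is hypothetical, and it assumes attachment points are written as plain positive integers.

```python
def parse_alchemy_line(line):
    """Parse 'sym1:attachpt1[,sym2:attachpt2]...' into {symbol: attachment point}.

    Raises ValueError for entries that lack a symbol or a numeric attachment point.
    """
    mapping = {}
    for entry in line.strip().split(","):
        symbol, _, attach = entry.partition(":")
        symbol, attach = symbol.strip(), attach.strip()
        if not symbol or not attach.isdigit():
            raise ValueError(f"malformed alchemy entry: {entry!r}")
        mapping[symbol] = int(attach)
    return mapping
```

With the example above, parse_alchemy_line("Fe:1,Mo:2,W:3,Y:4") maps Fe, Mo, W and Y to attachment points 1 through 4.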
Supported File Formats
The supported reagent structure formats are: .smi, .ism, .usm, .can.
The supported OEUniMolecularRxn SMARTS transformation extension is: .umr.
The supported archive formats are: .tar, .tar.gz, .tgz, .tar.xz, .txz, .tar.bz2, .tbz2, .zip. Generally the supported archive formats for any particular platform are those listed from shutil.get_unpack_formats().
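To see which archive extensions a given Python installation can unpack, shutil.get_unpack_formats() can be queried directly. The helper name below is hypothetical; the result reflects the local interpreter (for example, whether compression modules are available), not necessarily the Orion environment.

```python
import shutil


def supported_archive_extensions():
    """Return the sorted file extensions that shutil can unpack on this platform."""
    extensions = []
    # Each entry is a (name, extensions, description) tuple.
    for _name, exts, _description in shutil.get_unpack_formats():
        extensions.extend(exts)
    return sorted(extensions)


if __name__ == "__main__":
    print(supported_archive_extensions())
```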
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Archive File (archive): Name of the non-ETL’d Orion file containing the reagents to be validated and prepared for enumerations, and then loaded into a reagent collection.
Required
Type: file_in
Options
Sort Order (sort_order): Defines the ordering of the reactions in the reagent listing. name sorts by reaction name; increasing and decreasing sort by total enumeration product count.
Required
Type: string
Default: name
Choices: [‘name’, ‘increasing’, ‘decreasing’]
Verbosity (verbosity): Sets the output logging verbosity.
Type: string
Default: warning
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]