OEChem Examples Summary

Molecule processing

Convert molecule files

A program that converts molecules from one format to another based on the file extension. An example command would be:

Convert input.mol2 output.oeb.gz

Concatenating molecules

A program that takes an input molecule file (or files) and concatenates all the molecules into a single molecule file. An example command would be:

CatMols -i file1.oeb.gz file2.oeb.gz -o output.oeb.gz

Splitting multicomponent molecules

A program that splits multicomponent molecules into their constituent parts. The program will output the molecules to stdout as SMILES if no output file is set. The program will print the number of input and output molecules to stderr. An example command would be:

Parts2Mols -i file1.oeb.gz file2.oeb.gz -o output.oeb.gz

Extract molecules by title

A program that extracts molecules from a file based on their title. The program can take as input either a specific title or a file containing a list of titles. Example commands could be:

MolExtract -title Mol0001 -i dbase.oeb.gz -o Mol0001.oeb.gz
MolExtract -list names.txt dbase.oeb.gz output.oeb.gz

Write out unique molecules

A program that loads a database of molecules and outputs those that are unique. Two molecules are treated as duplicates if they have the same canonical isomeric SMILES (UniqMol) or the same standard InChI (UniqInChI). Example commands could be:

UniqMol dbase.oeb.gz output.oeb.gz

UniqInChI dbase.oeb.gz output.oeb.gz
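
The idea of keeping only the first molecule seen for each identifier can be sketched in plain Python. This is a minimal illustration and not the OEChem implementation: it assumes each record already carries a canonical isomeric SMILES (or InChI) string computed elsewhere, and `unique_by_key` is a hypothetical helper name.

```python
def unique_by_key(records, key):
    """Yield records whose key has not been seen before, keeping input order."""
    seen = set()
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            yield rec


# Hypothetical (title, SMILES) pairs; the SMILES are assumed to be canonical already.
mols = [("mol1", "CCO"), ("mol2", "c1ccccc1"), ("mol3", "CCO")]
unique = list(unique_by_key(mols, key=lambda rec: rec[1]))
```

Because only the keys are stored, memory use grows with the number of distinct molecules rather than with the size of the input.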

Randomize atoms of molecules

A program that randomizes the order of the atoms within the molecule file. The structure of the molecule is not changed. An example command would be:

RandomizeAtoms dbase.oeb.gz output.oeb.gz

Generate canonical SMILES

A program that generates the canonical SMILES of the molecules in the input file. It has a number of options such as -from3d, which perceives stereo from the 3D coordinates, -isomeric, which produces the canonical isomeric SMILES, and -kekule, which produces the Kekulé SMILES form. Example commands could be:

CanSmi dbase.oeb.gz output.smi
CanSmi -from3d true -i dbase.sdf -o output.smi

Filter molecules by weight or heavy atom count

A program that filters molecules by their molecular weight and/or heavy atom count. The program has four flags: -minhac, the minimum heavy atom count, -maxhac, the maximum heavy atom count, -minwt, the minimum molecular weight, and -maxwt, the maximum molecular weight. Any combination of the flags can be set. The program will output the molecules to stdout as SMILES if no output file is set. Example commands could be:

SizeFilter -minhac 10 dbase.oeb.gz
SizeFilter -minhac 10 -maxwt 300 -i dbase.oeb.gz -o output.oeb.gz

Strip salts

A program that removes all but the largest part from molecules that contain more than one part. If two or more parts are tied for the largest size, only one of them is kept. An example command would be:

StripSalts dbase.oeb.gz output.oeb.gz
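
The "keep the largest part" rule can be illustrated on dot-separated SMILES strings. This is a rough sketch and not how OEChem does it: it counts atom tokens with regular expressions instead of parsing the molecule, it assumes no ring-closure bond spans a dot, and the names `heavy_atom_count` and `largest_component` are hypothetical. Breaking ties in favour of the first part is also an assumption of the sketch.

```python
import re

# One atom token: a bracket atom, a two-letter halogen, or a one-letter organic-subset atom.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def heavy_atom_count(smiles):
    """Approximate the heavy atom count of a SMILES string by counting atom tokens."""
    count = 0
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match and match.group(1) == "H":
                continue  # explicit hydrogen such as [H] or [2H]
        count += 1
    return count


def largest_component(smiles):
    """Return the dot-separated part with the most heavy atoms (first one on ties)."""
    parts = smiles.split(".")
    return max(parts, key=heavy_atom_count)


stripped = largest_component("[Na+].CC(=O)[O-]")
```

A real implementation would work on the molecular graph, which avoids the tokenising shortcuts used here.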

Extract rings

A program that removes all non-ring atoms from the input molecules. By default, double-bonded atoms exo to a ring are included as ring atoms. This can be changed by using the flag -exo. The program will output the molecules to stdout as SMILES if no output file is set. The suffix “_rings” is appended to each molecule name. Example commands could be:

RingSubset dbase.oeb.gz
RingSubset -exo false -i dbase.oeb.gz -o output.oeb.gz

Extract molecule scaffolds

A program that removes all atoms that are neither ring atoms nor atoms in linkers between rings. By default, double-bonded atoms exo to a ring are included as ring atoms. This can be changed by using the flag -exo. Example commands could be:

ExtractScaffold dbase.oeb.gz
ExtractScaffold -exo false -i dbase.oeb.gz -o output.oeb.gz
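
One way to picture scaffold extraction is as repeated pruning of terminal atoms: once no atom with fewer than two neighbours is left, only rings and the linkers between them remain. The sketch below works on a plain adjacency dictionary rather than an OEChem molecule, ignores the -exo handling of double-bonded atoms, and uses hypothetical names (`adjacency_from_bonds`, `prune_to_scaffold`); it is not the code of the example program.

```python
def adjacency_from_bonds(bonds):
    """Build {atom: set of neighbours} from a list of (atom, atom) bonds."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def prune_to_scaffold(adjacency):
    """Repeatedly remove atoms with fewer than two neighbours; return what is left."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) < 2]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:
            continue  # already removed via another neighbour
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) < 2:
                leaves.append(nbr)
    return set(graph)


# Two three-membered rings (0-1-2 and 4-5-6) joined by linker atom 3,
# with a substituent atom 7 on the second ring.
bonds = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 6), (6, 4), (4, 7)]
scaffold = prune_to_scaffold(adjacency_from_bonds(bonds))
```

An acyclic molecule is pruned away completely, so the function returns an empty set for a simple chain.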

Extract random molecule subset

A program that extracts a randomized subset of molecules from a molecule file. The program has two flags, -p, extract a percentage of the database, or -n, extract a specific number from the database. If neither flag is set then the whole database will be randomized. Example commands could be:

RandomSample -n 1000 dbase.oeb.gz output.oeb.gz
RandomSample -p 10 -i dbase.oeb.gz -o output.oeb.gz
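
The behaviour of the two flags can be sketched with the standard `random` module. This is an illustration only: `random_subset` is a hypothetical function, the `seed` parameter is added for reproducibility, and both the rounding of the percentage and the precedence of `n` over `p` when both are given are assumptions not taken from the program.

```python
import random


def random_subset(items, n=None, p=None, seed=None):
    """Return items in random order, optionally cut to n items or p percent."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    if n is not None:
        return shuffled[:n]
    if p is not None:
        return shuffled[: round(len(shuffled) * p / 100)]
    return shuffled  # neither option set: the whole input, randomized


titles = [f"Mol{i:04d}" for i in range(50)]
by_count = random_subset(titles, n=10, seed=1)
by_percent = random_subset(titles, p=10, seed=1)
everything = random_subset(titles, seed=1)
```

Shuffling a full copy keeps the sketch short; for a database too large to hold in memory, a streaming approach such as reservoir sampling would be the usual alternative.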

Reactions

Performing a reaction

A program that uses a SMIRKS to perform reactions on single molecules. The OpenEye OEUniMolecularRxn class is designed to react every instance of the reactant pattern in the input molecules and so is useful for normalization reactions. An example command would be:

UniMolRxn '[NH2:1]>>[Nh3+:1]' dbase.oeb.gz output.oeb.gz

For reactions involving multiple reactants or multiple products, see Library generation below.

Library generation

A program that uses a SMIRKS string or an MDL reaction file to perform reactions on input molecules. The program has a number of flags.

OELibraryGen options are: -implicitH, performs the reaction with implicit hydrogens (default false), -relax, ensures unmapped atoms on the reactant side are not deleted during reaction (default false), -valence, applies automatic valence correction (default false).

SMILES generation options are: -isomeric, includes atom and bond stereochemistry in the output (default false), -unique, only include unique product canonical SMILES (default false).

Input and output options are: -reactants, a molecule file, or files, of reactants, -smirks, a SMIRKS string of the reaction or -rxn, an MDL reaction file of the reaction, and optionally, -product, the output molecule file.

The program will output the molecules to stdout as SMILES if no output file is set. Example commands could be:

LibGen -smirks '[C:1]c1c([N:2])cccc1>>[C:1]c1cc([N:2])ccc1' -reactants input.smi
LibGen -rxn reaction.rxn -reactants input.smi -product output.smi -valence -isomeric

Molecule searching

Perform substructure searches

A program that searches a molecule file using a SMARTS pattern. Flags include: -c, count the number of matches, -o, output the matches to a file, -p, the SMARTS pattern to use, -r, do the reverse search i.e. all molecules that don’t match the SMARTS. Example commands could be:

MolGrep -p 'c1ccccc1' -c dbase.oeb.gz
MolGrep -r -p 'c1ccccc1' -i dbase.oeb.gz -o output.oeb.gz

Molecule alignment

Align molecules by maximum common substructure

A program that aligns a database of molecules with a reference molecule based on the maximum common substructure (MCS) between the reference and each query. The output file consists of the input reference molecule and then each aligned database molecule. An example command would be:

MCS3DAlign ligand.pdb dbase.oeb.gz output.oeb.gz

Align molecules by clique match

A program that aligns a database of molecules with a reference molecule based on the clique of matches between the reference and each query. Clique detection is the process of finding all possible correspondences between two graphs within a set of bounds. The upper bound is the MCS and, in this example, the lower bound is a match that differs from the MCS by up to five atoms. An example command would be:

CliqueAlign ligand.pdb dbase.oeb.gz output.oeb.gz

Align molecules by SMARTS match

A program that aligns a database of molecules with a reference molecule based on the SMARTS matches between the reference and each query. An example command would be:

SMARTSAlign ligand.pdb dbase.oeb.gz output.oeb.gz 'a1aaaa1NC'

Align multi-conformer molecules

A program that calculates the RMSD between a 3D reference molecule and each conformer of multi-conformer molecules.
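
For orientation, the quantity being computed can be written out in a few lines of plain Python. This is a bare-bones sketch, not the toolkit routine: it assumes the atoms of the reference and of each conformer are listed in the same order, it performs no overlay and no symmetry handling, and `rmsd` is a hypothetical function name.

```python
import math


def rmsd(ref_coords, conf_coords):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) points."""
    if len(ref_coords) != len(conf_coords) or not ref_coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ref_coords, conf_coords):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(ref_coords))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)],  # every atom shifted by (3, 4, 0)
]
rmsds = [rmsd(reference, conf) for conf in conformers]
```

Here the first conformer gives an RMSD of 0 and the second an RMSD of 5, since every atom is displaced by a vector of length 5.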

SDF specific

Modifying SD tags

A program that can modify the SD tags on an SD molecule. The flags are: -remove, a list of property tags to be removed (non-matching tags are kept), -keep, a list of property tags to be kept (non-matching tags are removed), -clearAll, all SD tags are removed. The input and output formats are limited to those that can carry SD data, i.e. SDF, OEB, and CSV. Example commands could be:

SDFModProps -keep NAME ID_NUM -i dbase.sdf -o output.sdf
SDFModProps -clearAll -i dbase.sdf -o output.oeb.gz

Exporting SD data to a CSV file

A program that converts the tags on an SD file to CSV format. The first line of the output is a header containing the molecule title column followed by all the unique tags found in the input database. An example command would be:

SDF2CSV input.sdf output.csv
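
The layout of such a file can be sketched with the standard `csv` module. The example below is an assumption-laden illustration rather than the program itself: molecules are represented as hypothetical (title, tag dictionary) pairs, the header label `Title` and the first-seen column order are guesses, and a molecule that lacks a tag gets an empty cell.

```python
import csv
import io


def sd_data_to_csv(molecules):
    """Write one CSV row per molecule: title first, then one column per unique tag."""
    # Collect the unique tags in first-seen order across the whole database.
    tags = []
    for _title, sd_data in molecules:
        for tag in sd_data:
            if tag not in tags:
                tags.append(tag)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Title"] + tags)
    for title, sd_data in molecules:
        writer.writerow([title] + [sd_data.get(tag, "") for tag in tags])
    return out.getvalue()


molecules = [
    ("Mol0001", {"ID_NUM": "17", "LOGD": "1.2"}),
    ("Mol0002", {"ID_NUM": "42", "NAME": "aspirin"}),
]
csv_text = sd_data_to_csv(molecules)
```

Collecting the tags requires a full pass over the input before any row can be written, which is why the sketch iterates over the molecules twice.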

Adding CSV data as SD tags

A program that adds the data in a CSV file as SD tags to a database of molecules. The CSV data and molecules are paired by the molecule title. An example command would be:

CSV2SDF input.csv output.sdf

Renaming molecules by SD field

A program that renames each molecule using the value of a particular SD tag attached to that molecule. An example command would be:

SDFRename ID_NUM input.sdf output.smi

Filter molecules by SD data

A program that can filter a database of molecules based on SD tags. Any SD tag that contains numerical data can be used. Flags are: -tag, the SD tag to use, -min, the minimum value of the tag, -max, the maximum value of the tag. The input and output formats are limited to those that can carry SD data, i.e. SDF, OEB, and CSV. An example command would be:

SDFilter -tag LOGD -min -2 -max 5 -i input.sdf -o output.oeb.gz
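
The filtering rule itself is small enough to sketch in plain Python. This is not the program's code: `passes_sd_filter` is a hypothetical name, SD data are modelled as a dictionary of strings, and treating the bounds as inclusive and rejecting molecules whose tag is missing or non-numeric are assumptions.

```python
def passes_sd_filter(sd_data, tag, minimum=None, maximum=None):
    """Return True if the numeric value stored under tag lies within the given bounds."""
    try:
        value = float(sd_data[tag])
    except (KeyError, ValueError):
        return False  # assumption: a missing or non-numeric tag fails the filter
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# Hypothetical (title, SD data) records.
records = [
    ("a", {"LOGD": "-3.1"}),
    ("b", {"LOGD": "0.4"}),
    ("c", {"LOGD": "5"}),
    ("d", {"LOGD": "n/a"}),
    ("e", {}),
]
kept = [title for title, data in records if passes_sd_filter(data, "LOGD", -2, 5)]
```

Leaving either bound as `None` gives a one-sided filter, which mirrors setting only one of the two range flags.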

Molecule information

Counting molecules

A program that counts the number of molecules, or the number of conformers, in one or more input molecule files. The flag -conf is used to count conformers. The output is the total number of molecules or conformers per file, plus the overall total if more than one file is given. If conformers are counted, the average conformer count per molecule is also output. An example command would be:

MolCount -conf input1.oeb.gz input2.oeb.gz

Get molecule titles

A program that prints the molecule titles found in an input molecule database. The program will output the titles to stdout if no output file is set. Molecules with no title are shown as “untitled”. An example command would be:

GetTitles input.oeb.gz

Find minimum path in a molecule

A program that finds and outputs the minimum path in a molecule. The input can be two specific named atoms or two SMARTS matches. As SMARTS matches could hit more than one atom this could result in multiple different paths. To use atom names a helper program, printatomnames, is provided (see below). If -o is set then the atoms in the path will be output to a molecule file, otherwise just the length of the path is reported. -verbose outputs more details on the path. Example commands could be:

MinPath -i input.oeb.gz -atom1 'N1' -atom2 'S1'
MinPath -i input.oeb.gz -smarts1 'c1cccs1' -smarts2 'OC(=O)C' -o output.smi
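
Finding a minimum path between two atoms is a shortest-path search on the molecular graph, which for unweighted bonds can be done with a breadth-first search. The sketch below is a generic illustration on a hypothetical adjacency dictionary keyed by atom name; it does not use OEChem, and `shortest_path` is not a toolkit function.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list of atoms, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while atom is not None:
                path.append(atom)
                atom = previous[atom]
            return path[::-1]
        for nbr in adjacency[atom]:
            if nbr not in previous:
                previous[nbr] = atom
                queue.append(nbr)
    return None


# Hypothetical molecule graph; "O1" is a disconnected atom.
adjacency = {
    "N1": ["C1"],
    "C1": ["N1", "C2", "C3"],
    "C2": ["C1", "S1"],
    "C3": ["C1", "C4"],
    "C4": ["C3", "S1"],
    "S1": ["C2", "C4"],
    "O1": [],
}
path = shortest_path(adjacency, "N1", "S1")
```

With SMARTS input, a search like this would be repeated for each pair of matched atoms, which is how several different paths can arise.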

2D coordinate generation utilities

Extract ring templates

A program that extracts anonymized ring systems that can be potential new ring templates in the OEChem TK 2D coordinate generation system.

Create 2D ring dictionary

A program that creates a dictionary of 2D ring layouts that can be plugged into the 2D coordinate generation system.

Append to 2D ring dictionary

A program that adds new ring layouts to a ring dictionary that can be plugged into the 2D coordinate generation system.

Note

If a ring template already exists in the dictionary, the new template is ignored, i.e. ring templates currently cannot be overwritten in the OE2DRingDictionary object.

Generate 2D coordinates with user-defined ring templates

A program that generates 2D coordinates with user-defined ring layouts.