Enumerating Atom Substitutions

Problem

You want to enumerate all possible carbon to nitrogen atom substitutions in a molecule. See examples in Table 1

Table 1. Example of enumerating atom substitutions

one substitution

two substitutions

three substitutions

four substitutions

../_images/enumsubstitutions-one.png ../_images/enumsubstitutions-two.png ../_images/enumsubstitutions-three.png ../_images/enumsubstitutions-four.png

Ingredients

Difficulty Level

../_images/chilly.png

Solution

The EnumerateSubstitutions function below generates all possible combinations of the given atom indices in the range of [minsubs - maxsubs].

 1def EnumerateSubstitutions(mol, minsubs, maxsubs, atompred=oechem.OEIsCarbon()):
 2
 3    oechem.OESuppressHydrogens(mol)
 4    mol.Sweep()
 5    atomindices = [atom.GetIdx() for atom in mol.GetAtoms(atompred)]
 6
 7    submols = []
 8
 9    for n in range(minsubs, maxsubs + 1):
10        for atomcomb in combinations(atomindices, n):
11            smol = oechem.OEGraphMol(mol)
12            if PerformSubstitution(smol, atomcomb):
13                submols.append(oechem.OEGraphMol(smol))
14
15    return submols

The PerformSubstitution function carries out the carbon atom to nitrogen atom substitutions defined by the given atom combination. After setting the atomic number, the implicit hydrogen count and the charge of the altered atoms are corrected in order to generate chemically correct molecules.

 1def PerformSubstitution(mol, atomcomb):
 2
 3    for aidx in atomcomb:
 4        atom = mol.GetAtom(oechem.OEHasAtomIdx(aidx))
 5        if atom is None:
 6            return False
 7        atom.SetAtomicNum(oechem.OEElemNo_N)
 8
 9        # correct hydrogen count and charge
10        valence = 3
11        hcount, charge = 0, 0
12
13        if valence > atom.GetExplicitValence():
14            hcount = valence - atom.GetExplicitValence()
15            charge = 0
16        elif valence < atom.GetExplicitValence():
17            hcount = 0
18            charge = atom.GetExplicitValence() - valence
19
20        atom.SetImplicitHCount(hcount)
21        atom.SetFormalCharge(charge)
22
23    return True

After enumerating all possible atom substitutions, the molecules are written to the output stream by calling the WriteUniqueMolecules function. In order to avoid outputting duplicate molecules, the OEMolToSmiles function is used to generate canonical isomeric SMILES. The OEMolToSmiles function generates the same SMILES string for isomorphic molecules i.e. molecules that only differ by atom ordering. See example of graph isomorphic molecules in Table 2.

1def WriteUniqueMolecules(mols, ofs):
2    uniqsmiles = set()
3    for mol in mols:
4        smi = oechem.OEMolToSmiles(mol)
5        if smi not in uniqsmiles:
6            uniqsmiles.add(smi)
7            oechem.OEWriteMolecule(ofs, mol)
Table 2. Example of graph isomorphic molecules.
../_images/substitution-isomorph-A.png ../_images/substitution-isomorph-B.png

Download code

enumsubstitutions.py

Usage:

prompt > python3 enumsubstitutions.py -in .ism -out .ism -minsubs 1 -maxsubs 2

C1CC(=O)CC1

Running the above commands will generate the following output:

C1CNCC1=O
C1CC(=O)NC1
C1CC[N+](=O)C1
C1CNNC1=O
C1C[N+](=O)CN1
C1C(=O)NCN1
C1C(=O)CNN1
C1CN[N+](=O)C1
C1CNC(=O)N1

See the first two columns of the Table 1 that depict the one and two atom substitutions of the input molecule C1CC(=O)CC1.

Discussion

The EnumerateSubstitutions function takes an atom predicate that determines which atoms are being substituted. By default, it only allows substitution of carbon atoms. OEChem TK has a diverse selection of built-in atom predicates and also allows the creation of user-defined ones. This provide a flexible and convenient way to determine the set of atoms to be substituted. For example when using the OEIsHeavy predicate, the enumeration of any two substitutions will result in the following extra three molecules for the cyclopentanone input structure:

Table 3. Example of `extra` substitutions when allowing substitution of any heavy atom.
../_images/enumsubstitutions-extra-1.png ../_images/enumsubstitutions-extra-2.png ../_images/enumsubstitutions-extra-3.png

See also in Python documentation

  • itertools.combinations()

See also in OEChem TK manual

Theory

API