Enumerating Atom Substitutions¶

Problem¶

You want to enumerate all possible carbon to nitrogen atom substitutions in a molecule. See examples in Table 1

 one substitution two substitutions three substitutions four substitutions

Ingredients¶

 OEChem TK - cheminformatics toolkit

Solution¶

The EnumerateSubstitutions function below generates all possible combinations of the given atom indices in the range of [minsubs - maxsubs].

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 def EnumerateSubstitutions(mol, minsubs, maxsubs, atompred=oechem.OEIsCarbon()): oechem.OESuppressHydrogens(mol) mol.Sweep() atomindices = [atom.GetIdx() for atom in mol.GetAtoms(atompred)] submols = [] for n in range(minsubs, maxsubs + 1): for atomcomb in combinations(atomindices, n): smol = oechem.OEGraphMol(mol) if PerformSubstitution(smol, atomcomb): submols.append(oechem.OEGraphMol(smol)) return submols 

The PerformSubstitution function carries out the carbon atom to nitrogen atom substitutions defined by the given atom combination. After setting the atomic number, the implicit hydrogen count and the charge of the altered atoms are corrected in order to generate chemically correct molecules.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 def PerformSubstitution(mol, atomcomb): for aidx in atomcomb: atom = mol.GetAtom(oechem.OEHasAtomIdx(aidx)) if atom is None: return False atom.SetAtomicNum(oechem.OEElemNo_N) # correct hydrogen count and charge valence = 3 hcount, charge = 0, 0 if valence > atom.GetExplicitValence(): hcount = valence - atom.GetExplicitValence() charge = 0 elif valence < atom.GetExplicitValence(): hcount = 0 charge = atom.GetExplicitValence() - valence atom.SetImplicitHCount(hcount) atom.SetFormalCharge(charge) return True 

After enumerating all possible atom substitutions, the molecules are written to the output stream by calling the WriteUniqueMolecules function. In order to avoid outputting duplicate molecules, the OEMolToSmiles function is used to generate canonical isomeric SMILES. The OEMolToSmiles function generates the same SMILES string for isomorphic molecules i.e. molecules that only differ by atom ordering. See example of graph isomorphic molecules in Table 2.

 1 2 3 4 5 6 7 def WriteUniqueMolecules(mols, ofs): uniqsmiles = set() for mol in mols: smi = oechem.OEMolToSmiles(mol) if smi not in uniqsmiles: uniqsmiles.add(smi) oechem.OEWriteMolecule(ofs, mol) 

Usage:

prompt > python3 enumsubstitutions.py -in .ism -out .ism -minsubs 1 -maxsubs 2

C1CC(=O)CC1


Running the above commands will generate the following output:

C1CNCC1=O
C1CC(=O)NC1
C1CC[N+](=O)C1
C1CNNC1=O
C1C[N+](=O)CN1
C1C(=O)NCN1
C1C(=O)CNN1
C1CN[N+](=O)C1
C1CNC(=O)N1


See the first two columns of the Table 1 that depict the one and two atom substitutions of the input molecule C1CC(=O)CC1.

Discussion¶

The EnumerateSubstitutions function takes an atom predicate that determines which atoms are being substituted. By default, it only allows substitution of carbon atoms. OEChem TK has a diverse selection of built-in atom predicates and also allows the creation of user-defined ones. This provide a flexible and convenient way to determine the set of atoms to be substituted. For example when using the OEIsHeavy predicate, the enumeration of any two substitutions will result in the following extra three molecules for the cyclopentanone input structure: