Enumerating Fragment Combinations

Problem

You want to enumerate all the adjacent fragment combinations returned by the fragmentation methods of OEMedChem TK.

../_images/enumfrags.png

Figure 1: Example molecule with fragment highlighting

For example in Figure 1 you want to return fragment combinations: A+B, B+C, ..., A+B+C, ..., A+B+C+D, ... etc. But you do not want disconnected fragment combinations such as D.A+B. See examples in Table 1.

Table 1. Example of fragment combinations
valid - all fragments adjacent invalid - disconnected fragments
../_images/enumfrags-valid.png ../_images/enumfrags-invalid.png

Ingredients

Difficulty Level

../_images/chilly.png ../_images/chilly.png

Solution

The OEMedChem TK currently provides three ways to partition a molecule into fragments (see examples in Table 2):

Table 2. Example of the fragmentation methods of OEMedChem TK
OEGetFuncGroupFragments OEGetRingChainFragments OEGetRingLinkerSideChainFragments

The GetFragmentCombinations function takes a molecule and the list of OEAtomBondSet objects. This list is generated by using one of the fragmentation methods shown in Table 2. Each OEAtomBondSet object stores the atoms and the bonds of a fragment generated for a given molecule. These parameters are passed to the GetFragmentAtomBondSetCombinations function that returns the list of adjacent fragment combinations. Each fragment combination is then passed to the OESubsetMol function using an OEIsAtomMember and an OEIsBondMember predicate. The subset molecule generated by the OESubsetMol function is then appended to a returned molecule list.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def GetFragmentCombinations(mol, fraglist):

    fragments = []
    fragcombs = GetFragmentAtomBondSetCombinations(mol, fraglist)

    for f in fragcombs:

        fragatompred = oechem.OEIsAtomMember(f.GetAtoms())
        fragbondpred = oechem.OEIsBondMember(f.GetBonds())

        fragment = oechem.OEGraphMol()
        adjustHCount = True
        oechem.OESubsetMol(fragment, mol, fragatompred, fragbondpred, adjustHCount)
        fragments.append(fragment)

    return fragments

The GetFragmentAtomBondSetCombinations function below generates all fragment combinations in the range of [2 - (nrfrags-1)] using the itertools.combinations() function. For example, if there are five fragments generated for a molecule, then the following number of fragment combinations are enumerated:

\(\dbinom{5}{2} + \dbinom{5}{3} + \dbinom{5}{4} = \frac{5!}{2!3!} + \frac{5!}{3!2!} + \frac{5!}{4!1!} = 10 + 10 + 5 = 25\)

However, only those fragment combinations that are adjacent will be kept i.e. no disconnected molecule fragment is generated (see Table 1). If the fragments in a given combination are all adjacent i.e. the IsAdjacentAtomBondSetCombination function returns true, then the atoms and the bonds of these fragments are combined by calling the CombineAndConnectAtomBondSets function and the combined fragment is added to the list returned by the GetFragmentAtomBondSetCombinations function.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def GetFragmentAtomBondSetCombinations(mol, fraglist):

    fragcombs = []

    nrfrags = len(fraglist)
    for n in range(2, nrfrags):

        for fragcomb in combinations(fraglist, n):

            if IsAdjacentAtomBondSetCombination(fragcomb):

                frag = CombineAndConnectAtomBondSets(fragcomb)
                fragcombs.append(frag)

    return fragcombs

A fragment combination is considered adjacent only if all of the fragments are connected to each other in the combination. In order to determine this the IsAdjacentAtomBondSetCombination function performs a depth-first search algorithm using the TraverseFragments function.

The nrparts variable keeps track of the number of connected components found in the fragment list. If the fragments are all connected than nrparts will be one.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def IsAdjacentAtomBondSetCombination(fraglist):

    parts = [0] * len(fraglist)
    nrparts = 0

    for idx, frag in enumerate(fraglist):
        if parts[idx] != 0:
            continue

        nrparts += 1
        parts[idx] = nrparts
        TraverseFragments(frag, fraglist, parts, nrparts)

    return (nrparts == 1)

The TraverseFragments is a recursive function that visits fragments that are connected to each other. The parts list is used not only to keep track of whether a fragment has been visited before, but also to which connected fragment component it belongs.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def TraverseFragments(actfrag, fraglist, parts, nrparts):

    for idx, frag in enumerate(fraglist):
        if parts[idx] != 0:
            continue

        if not IsAdjacentAtomBondSets(actfrag, frag):
            continue

        parts[idx] = nrparts
        TraverseFragments(frag, fraglist, parts, nrparts)

The IsAdjacentAtomBondSets function is used to determine whether two fragments are connected. Two fragments are considered adjacent if an atom from each is connected by a bond.

1
2
3
4
5
6
7
def IsAdjacentAtomBondSets(fragA, fragB):

    for atomA in fragA.GetAtoms():
        for atomB in fragB.GetAtoms():
            if atomA.GetBond(atomB) is not None:
                return True
    return False

If the IsAdjacentAtomBondSetCombination function returns true i.e the fragments are all connected, then the CombineAndConnectAtomBondSets function can be used to combine the adjacent fragments together by merging their atoms and bonds and adding those bonds that connect them.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def CombineAndConnectAtomBondSets(fraglist):

    # combine atom and bond sets

    combined = oechem.OEAtomBondSet()
    for frag in fraglist:
        for atom in frag.GetAtoms():
            combined.AddAtom(atom)
        for bond in frag.GetBonds():
            combined.AddBond(bond)

    # add connecting bonds

    for atomA in combined.GetAtoms():
        for atomB in combined.GetAtoms():
            if atomA.GetIdx() < atomB.GetIdx():
                continue

            bond = atomA.GetBond(atomB)
            if bond is None:
                continue
            if combined.HasBond(bond):
                continue

            combined.AddBond(bond)

    return combined

Download code

enumfrags.py

The following example shows the output fragment combinations of the molecule depicted in Figure 1.

Usage:

prompt > python3 enumfrags.py -in .ism -out .ism -fragtype funcgroup

CC(C)NCC(COc1ccc(c(c1)Cc2ccccc2)CC(=O)N)O

5 number of fragments generated
11 number of fragment combinations generated
c1ccc(cc1)Cc2cccc(c2)O
c1cc(ccc1CC(=O)N)O
c1ccc(cc1)OCCO
CC(C)NCC(C)O
c1ccc(cc1)Cc2cc(ccc2CC(=O)N)O
c1ccc(cc1)Cc2cccc(c2)OCCO
c1cc(ccc1CC(=O)N)OCCO
CC(C)NCC(COc1ccccc1)O
c1ccc(cc1)Cc2cc(ccc2CC(=O)N)OCCO
CC(C)NCC(COc1cccc(c1)Cc2ccccc2)O
CC(C)NCC(COc1ccc(cc1)CC(=O)N)O

See also in Python documentation

See also in OEChem TK manual

API

See also in OEMedChem TK manual

Theory

API

See also