# Enumerating Fragment Combinations¶

## Problem¶

You want to enumerate all the adjacent fragment combinations returned by the fragmentation methods of OEMedChem TK.

For example in Figure 1 you want to return fragment combinations: A+B, B+C, …, A+B+C, …, A+B+C+D, … etc. But you do not want disconnected fragment combinations such as D.A+B. See examples in Table 1.

 valid - all fragments adjacent invalid - disconnected fragments

## Ingredients¶

 OEChem TK - cheminformatics toolkit OEMedChem TK - cheminformatics toolkit

## Solution¶

The OEMedChem TK currently provides three ways to partition a molecule into fragments (see examples in Table 2):

The GetFragmentCombinations function takes a molecule and the list of OEAtomBondSet objects. This list is generated by using one of the fragmentation methods shown in Table 2. Each OEAtomBondSet object stores the atoms and the bonds of a fragment generated for a given molecule. These parameters are passed to the GetFragmentAtomBondSetCombinations function that returns the list of adjacent fragment combinations. Each fragment combination is then passed to the OESubsetMol function using an OEIsAtomMember and an OEIsBondMember predicate. The subset molecule generated by the OESubsetMol function is then appended to a returned molecule list.

 1def GetFragmentCombinations(mol, fraglist):
2
3    fragments = []
4    fragcombs = GetFragmentAtomBondSetCombinations(mol, fraglist)
5
6    for f in fragcombs:
7
8        fragatompred = oechem.OEIsAtomMember(f.GetAtoms())
9        fragbondpred = oechem.OEIsBondMember(f.GetBonds())
10
11        fragment = oechem.OEGraphMol()
13        oechem.OESubsetMol(fragment, mol, fragatompred, fragbondpred, adjustHCount)
14        fragments.append(fragment)
15
16    return fragments


The GetFragmentAtomBondSetCombinations function below generates all fragment combinations in the range of [2 - (nrfrags-1)] using the itertools.combinations() function. For example, if there are five fragments generated for a molecule, then the following number of fragment combinations are enumerated:

$$\dbinom{5}{2} + \dbinom{5}{3} + \dbinom{5}{4} = \frac{5!}{2!3!} + \frac{5!}{3!2!} + \frac{5!}{4!1!} = 10 + 10 + 5 = 25$$

However, only those fragment combinations that are adjacent will be kept i.e. no disconnected molecule fragment is generated (see Table 1). If the fragments in a given combination are all adjacent i.e. the IsAdjacentAtomBondSetCombination function returns true, then the atoms and the bonds of these fragments are combined by calling the CombineAndConnectAtomBondSets function and the combined fragment is added to the list returned by the GetFragmentAtomBondSetCombinations function.

 1def GetFragmentAtomBondSetCombinations(mol, fraglist):
2
3    fragcombs = []
4
5    nrfrags = len(fraglist)
6    for n in range(2, nrfrags):
7
8        for fragcomb in combinations(fraglist, n):
9
11
12                frag = CombineAndConnectAtomBondSets(fragcomb)
13                fragcombs.append(frag)
14
15    return fragcombs


A fragment combination is considered adjacent only if all of the fragments are connected to each other in the combination. In order to determine this the IsAdjacentAtomBondSetCombination function performs a depth-first search algorithm using the TraverseFragments function.

The nrparts variable keeps track of the number of connected components found in the fragment list. If the fragments are all connected than nrparts will be one.

 1def IsAdjacentAtomBondSetCombination(fraglist):
2
3    parts = [0] * len(fraglist)
4    nrparts = 0
5
6    for idx, frag in enumerate(fraglist):
7        if parts[idx] != 0:
8            continue
9
10        nrparts += 1
11        parts[idx] = nrparts
12        TraverseFragments(frag, fraglist, parts, nrparts)
13
14    return (nrparts == 1)


The TraverseFragments is a recursive function that visits fragments that are connected to each other. The parts list is used not only to keep track of whether a fragment has been visited before, but also to which connected fragment component it belongs.

 1def TraverseFragments(actfrag, fraglist, parts, nrparts):
2
3    for idx, frag in enumerate(fraglist):
4        if parts[idx] != 0:
5            continue
6
8            continue
9
10        parts[idx] = nrparts
11        TraverseFragments(frag, fraglist, parts, nrparts)


The IsAdjacentAtomBondSets function is used to determine whether two fragments are connected. Two fragments are considered adjacent if an atom from each is connected by a bond.

1def IsAdjacentAtomBondSets(fragA, fragB):
2
3    for atomA in fragA.GetAtoms():
4        for atomB in fragB.GetAtoms():
5            if atomA.GetBond(atomB) is not None:
6                return True
7    return False


If the IsAdjacentAtomBondSetCombination function returns true i.e the fragments are all connected, then the CombineAndConnectAtomBondSets function can be used to combine the adjacent fragments together by merging their atoms and bonds and adding those bonds that connect them.

 1def CombineAndConnectAtomBondSets(fraglist):
2
3    # combine atom and bond sets
4
5    combined = oechem.OEAtomBondSet()
6    for frag in fraglist:
7        for atom in frag.GetAtoms():
9        for bond in frag.GetBonds():
11
13
14    for atomA in combined.GetAtoms():
15        for atomB in combined.GetAtoms():
16            if atomA.GetIdx() < atomB.GetIdx():
17                continue
18
19            bond = atomA.GetBond(atomB)
20            if bond is None:
21                continue
22            if combined.HasBond(bond):
23                continue
24
26
27    return combined


The following example shows the output fragment combinations of the molecule depicted in Figure 1.

Usage:

prompt > python3 enumfrags.py -in .ism -out .ism -fragtype funcgroup

CC(C)NCC(COc1ccc(c(c1)Cc2ccccc2)CC(=O)N)O

5 number of fragments generated
11 number of fragment combinations generated
c1ccc(cc1)Cc2cccc(c2)O
c1cc(ccc1CC(=O)N)O
c1ccc(cc1)OCCO
CC(C)NCC(C)O
c1ccc(cc1)Cc2cc(ccc2CC(=O)N)O
c1ccc(cc1)Cc2cccc(c2)OCCO
c1cc(ccc1CC(=O)N)OCCO
CC(C)NCC(COc1ccccc1)O
c1ccc(cc1)Cc2cc(ccc2CC(=O)N)OCCO
CC(C)NCC(COc1cccc(c1)Cc2ccccc2)O
CC(C)NCC(COc1ccc(cc1)CC(=O)N)O