Fingerprint Patterns

Fingerprints are usually generated by enumerating fragments (patterns) of a molecule and then hashing them into a fixed-length bitvector. The OEGetFPPatterns function provides access to these patterns by returning an iterator over OEFPPattern objects. Each object holds the atoms and bonds of a specific fragment along with the following information:

  • the bit that is set when the pattern is hashed into a fixed-length bitvector

  • the unhashed unique integer encoding of the pattern

  • the SMARTS representation of the pattern that encodes the atom and bond properties that are specified in the given fingerprint type (OEFPTypeBase)
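
The relationship between a pattern, its unhashed integer encoding, and the bit it sets can be illustrated with a toy model. The sketch below is not the hashing scheme used by the toolkit: the function names toy_unhashed_code and toy_fold are hypothetical, SHA-256 is an arbitrary stand-in for the real encoding, and the pattern strings are only example inputs. It shows the general idea of mapping each fragment to a large integer and then folding that integer into a fixed number of bits.

```python
import hashlib


def toy_unhashed_code(pattern: str) -> int:
    """Hypothetical stand-in for a pattern's unhashed integer encoding.

    Assumption: SHA-256 is used only because it is deterministic;
    it is NOT the encoding used by the toolkit.
    """
    digest = hashlib.sha256(pattern.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def toy_fold(patterns, size=4096):
    """Fold each pattern's integer code into a fixed-length bit list."""
    bits = [0] * size
    for pat in patterns:
        # The modulo step maps an arbitrarily large code to a bit position
        bits[toy_unhashed_code(pat) % size] = 1
    return bits


# Example fragment strings (illustrative inputs only)
patterns = ["[N;D1]", "[C;D2]", "[N;D1]-[C;D2]"]
fp = toy_fold(patterns)
print(sum(fp), "of", len(fp), "bits set")
```

Because many distinct codes can fold onto the same position, two different patterns may set the same bit. This is why having both the hashed bit and the unhashed encoding of a pattern is useful when interpreting a fingerprint.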

The following example shows how to retrieve the patterns that are enumerated when generating a tree fingerprint.

Listing 19: Example of accessing patterns encoded into a fingerprint

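from openeye import oechem
from openeye import oegraphsim

# Build the example molecule from its SMILES string
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "NCC(=O)[O-]")

# Tree fingerprint type: fragments of 0 to 4 bonds, hashed into 4096 bits
fptype = oegraphsim.OEGetFPType("Tree,ver=2.0.0,size=4096,bonds=0-4,"
                                "atype=AtmNum|Arom|FCharge|HvyDeg,btype=Order")

# Print the index, hashed bit, SMARTS, and atoms of each enumerated pattern
for idx, pattern in enumerate(oegraphsim.OEGetFPPatterns(mol, fptype)):
    atomstr = " ".join([str(a) for a in pattern.GetAtoms()])
    print("%2d %5d %50s %s" % ((idx + 1), pattern.GetBit(), pattern.GetSmarts(), atomstr))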

The output of Listing 19 is the following:

 1   849                                             [N;D1]  0 N
 2  1611                                             [C;D2]  1 C
 3   740                                             [C;D3]  2 C
 4  2841                                             [O;D1]  3 O
 5  1589                                            [O-;D1]  4 O
 6  2326                                      [N;D1]-[C;D2]  0 N  1 C
 7  3764                             [C;D2](-[C;D3])-[N;D1]  0 N  1 C  2 C
 8  2727                      [C;D2](-[C;D3]=[O;D1])-[N;D1]  0 N  1 C  2 C  3 O
 9   859            [C;D2](-[C;D3](-[O-;D1])=[O;D1])-[N;D1]  0 N  1 C  2 C  3 O  4 O
10  3537                     [C;D2](-[C;D3]-[O-;D1])-[N;D1]  0 N  1 C  2 C  4 O
11  3083                                      [C;D3]-[C;D2]  1 C  2 C
12   793                               [C;D2]-[C;D3]=[O;D1]  1 C  2 C  3 O
13  1574                     [C;D2]-[C;D3](-[O-;D1])=[O;D1]  1 C  2 C  3 O  4 O
14  3538                              [C;D2]-[C;D3]-[O-;D1]  1 C  2 C  4 O
15  1707                                      [O;D1]=[C;D3]  2 C  3 O
16  1472                            [C;D3](-[O-;D1])=[O;D1]  2 C  3 O  4 O
17  1397                                     [O-;D1]-[C;D3]  2 C  4 O
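
One common use of this output is to work backwards from a bit to the fragments that set it. The following sketch uses only the standard library and the (bit, SMARTS) pairs copied from the output above; the variable names are illustrative. It builds a bit-to-pattern lookup and reports any bit that is shared by more than one pattern.

```python
from collections import defaultdict

# (bit, SMARTS) pairs copied from the output of the listing above
rows = [
    (849, "[N;D1]"),
    (1611, "[C;D2]"),
    (740, "[C;D3]"),
    (2841, "[O;D1]"),
    (1589, "[O-;D1]"),
    (2326, "[N;D1]-[C;D2]"),
    (3764, "[C;D2](-[C;D3])-[N;D1]"),
    (2727, "[C;D2](-[C;D3]=[O;D1])-[N;D1]"),
    (859, "[C;D2](-[C;D3](-[O-;D1])=[O;D1])-[N;D1]"),
    (3537, "[C;D2](-[C;D3]-[O-;D1])-[N;D1]"),
    (3083, "[C;D3]-[C;D2]"),
    (793, "[C;D2]-[C;D3]=[O;D1]"),
    (1574, "[C;D2]-[C;D3](-[O-;D1])=[O;D1]"),
    (3538, "[C;D2]-[C;D3]-[O-;D1]"),
    (1707, "[O;D1]=[C;D3]"),
    (1472, "[C;D3](-[O-;D1])=[O;D1]"),
    (1397, "[O-;D1]-[C;D3]"),
]

# Group the patterns by the bit they set
bit_to_patterns = defaultdict(list)
for bit, smarts in rows:
    bit_to_patterns[bit].append(smarts)

# A collision is a bit that is set by more than one pattern
collisions = {b: p for b, p in bit_to_patterns.items() if len(p) > 1}

print("patterns:", len(rows))
print("distinct bits:", len(bit_to_patterns))
print("collisions:", collisions)
```

For this small molecule, each of the 17 patterns sets its own bit. Larger molecules enumerate many more patterns, so shared bits become more likely in a 4096-bit fingerprint.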

See also