Fingerprint Patterns
Fingerprints are usually generated by enumerating various fragments (patterns)
of a molecule and then hashing them into a fixed-length bitvector.
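As a rough illustration of the hashing step, the sketch below folds fragment strings into a fixed-length bitvector held in a Python int. It is not the GraphSim TK algorithm: using SHA-1 as the hash and a plain modulo to fold it into size bits are assumptions chosen only because they are stable and easy to follow, and the helper names hash_fragments and on_bits are hypothetical.

```python
import hashlib

def hash_fragments(fragments, size=4096):
    """Fold fragment strings into a fixed-length bitvector stored as a Python int."""
    fp = 0
    for frag in fragments:
        # Stand-in hash (assumption): first 8 bytes of SHA-1 read as an unsigned integer
        code = int.from_bytes(hashlib.sha1(frag.encode("utf-8")).digest()[:8], "big")
        # Fold the unhashed code into the range [0, size) and set that bit
        fp |= 1 << (code % size)
    return fp

def on_bits(fp):
    """Return the indices of the set bits in ascending order."""
    return [i for i in range(fp.bit_length()) if (fp >> i) & 1]

# Three fragment SMARTS taken from the example output in this section
smarts_list = ["[N;D1]", "[C;D2]", "[N;D1]-[C;D2]"]
fp = hash_fragments(smarts_list, size=4096)
print(on_bits(fp))
```

The bit positions printed here will not match the ones reported by OEGetFPPatterns, because the hash function differs. The sketch does show why the bitvector alone is lossy: the result is independent of fragment order and multiplicity, and two different fragments can land on the same bit.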
The OEGetFPPatterns function provides access to these patterns by returning an iterator over OEFPPattern objects. Each object stores the atoms and bonds of one fragment along with the following information:
the bit that is set when the pattern is hashed into a fixed-length bitvector
the unhashed unique integer encoding of the pattern
the SMARTS representation of the pattern, encoding the atom and bond properties specified by the given fingerprint type (OEFPTypeBase)
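Because many distinct fragments are folded into a few thousand bits, two different patterns can set the same bit, and the unhashed integer encoding is what tells them apart. The sketch below groups patterns by bit and reports the bits shared by more than one distinct code. The Pattern tuple, its code values, and the modulo folding in fold are hypothetical stand-ins for what an OEFPPattern provides, not the toolkit's actual hashing scheme.

```python
from collections import defaultdict
from typing import NamedTuple

class Pattern(NamedTuple):
    code: int    # unhashed integer encoding of a fragment (hypothetical values)
    smarts: str  # SMARTS of the fragment

def fold(code, size):
    # Assumption: the bit is the unhashed code folded modulo the fingerprint size
    return code % size

def collisions(patterns, size):
    """Return {bit: patterns} for bits that more than one distinct code folds onto."""
    by_bit = defaultdict(set)
    for p in patterns:
        by_bit[fold(p.code, size)].add(p)
    return {bit: sorted(ps) for bit, ps in by_bit.items()
            if len({p.code for p in ps}) > 1}

patterns = [Pattern(10, "[N;D1]"), Pattern(26, "[C;D2]"), Pattern(11, "[O;D1]")]
print(collisions(patterns, 16))
print(collisions(patterns, 64))  # {}
```

With size=16, codes 10 and 26 both fold onto bit 10 and are reported together; with size=64 the same patterns no longer collide and the result is empty.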
The following example shows how to retrieve the patterns that are enumerated when generating a tree fingerprint.
Listing 19: Example of accessing patterns encoded into a fingerprint
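from openeye import oechem
from openeye import oegraphsim

# Glycinate: the molecule whose fragments are enumerated
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "NCC(=O)[O-]")

# Tree fingerprint that enumerates fragments with 0 to 4 bonds
fptype = oegraphsim.OEGetFPType("Tree,ver=2.0.0,size=4096,bonds=0-4,"
                                "atype=AtmNum|Arom|FCharge|HvyDeg,btype=Order")

for idx, pattern in enumerate(oegraphsim.OEGetFPPatterns(mol, fptype)):
    # Each atom prints as its index and element, e.g. "0 N"
    atomstr = " ".join([str(a) for a in pattern.GetAtoms()])
    print("%2d %5d %50s %s" % ((idx + 1), pattern.GetBit(), pattern.GetSmarts(), atomstr))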
The output of Listing 19 is the following:
1 849 [N;D1] 0 N
2 1611 [C;D2] 1 C
3 740 [C;D3] 2 C
4 2841 [O;D1] 3 O
5 1589 [O-;D1] 4 O
6 2326 [N;D1]-[C;D2] 0 N 1 C
7 3764 [C;D2](-[C;D3])-[N;D1] 0 N 1 C 2 C
8 2727 [C;D2](-[C;D3]=[O;D1])-[N;D1] 0 N 1 C 2 C 3 O
9 859 [C;D2](-[C;D3](-[O-;D1])=[O;D1])-[N;D1] 0 N 1 C 2 C 3 O 4 O
10 3537 [C;D2](-[C;D3]-[O-;D1])-[N;D1] 0 N 1 C 2 C 4 O
11 3083 [C;D3]-[C;D2] 1 C 2 C
12 793 [C;D2]-[C;D3]=[O;D1] 1 C 2 C 3 O
13 1574 [C;D2]-[C;D3](-[O-;D1])=[O;D1] 1 C 2 C 3 O 4 O
14 3538 [C;D2]-[C;D3]-[O-;D1] 1 C 2 C 4 O
15 1707 [O;D1]=[C;D3] 2 C 3 O
16 1472 [C;D3](-[O-;D1])=[O;D1] 2 C 3 O 4 O
17 1397 [O-;D1]-[C;D3] 2 C 4 O
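The 17 atom sets in this output are exactly the tree-shaped subgraphs of glycinate that contain 0 to 4 bonds, matching bonds=0-4 in the fingerprint type. The pure-Python sketch below reproduces that enumeration on a bare atom/bond graph, assuming atoms are numbered as in the SMILES (0 N, 1 C, 2 C, 3 O, 4 O) and bonds are given as atom-index pairs. It grows each fragment one bond at a time and only accepts a bond with exactly one end already in the fragment, so ring closures are never added. It ignores atom and bond typing and hashing, and tree_fragments is a hypothetical helper, not part of GraphSim TK.

```python
from collections import Counter

def tree_fragments(num_atoms, bonds, max_bonds=4):
    """Enumerate tree-shaped fragments with 0..max_bonds bonds.

    bonds is a list of (atom_i, atom_j) pairs; a bond's index is its list position.
    Returns (atom_indices, bond_indices) tuples, both sorted.
    """
    def atoms_of(bond_set):
        return {atom for b in bond_set for atom in bonds[b]}

    # 0-bond fragments: every atom on its own
    fragments = [((a,), ()) for a in range(num_atoms)]
    # 1-bond fragments seed the growth
    level = {frozenset([b]) for b in range(len(bonds))}
    for _ in range(max_bonds):
        if not level:
            break
        for bond_set in sorted(level, key=sorted):
            fragments.append((tuple(sorted(atoms_of(bond_set))), tuple(sorted(bond_set))))
        grown = set()
        for bond_set in level:
            atoms = atoms_of(bond_set)
            for b, (i, j) in enumerate(bonds):
                # Exactly one end inside the fragment: adds a new atom, never closes a ring
                if (i in atoms) != (j in atoms):
                    grown.add(bond_set | {b})
        level = grown
    return fragments

# Glycinate, NCC(=O)[O-]: 0 N, 1 C, 2 C, 3 O, 4 O
glycinate_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
frags = tree_fragments(5, glycinate_bonds, max_bonds=4)
print(len(frags))  # 17
print(sorted(Counter(len(bset) for _, bset in frags).items()))
# [(0, 5), (1, 4), (2, 4), (3, 3), (4, 1)]
```

The breakdown (5 single atoms, 4 one-bond, 4 two-bond and 3 three-bond fragments, plus the whole molecule) is the same as in the listing above.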
See also
OEFPPattern class
OEGetFPPatterns function
OEGetFPType function
Fingerprint Coverage chapter
Fingerprint Overlap chapter