Depicting Molecule Similarity Based on Fingerprints
Problem
You want to depict the 2D similarity of two molecules based on their fingerprints. See example in Figure 1.
Ingredients
- OpenEye toolkits used by the code in this recipe: OEChem TK, GraphSim TK, OEDepict TK, and Grapheme TK
Solution
The GraphSim TK not only encodes 2D molecular graph information into fingerprints, it also gives access to the fragments enumerated during fingerprint generation. The OEGetFPOverlap function, used in this example, returns all common fragments found between two molecules for a given fingerprint type. The SetFingerPrintSimilarity function uses these fragments to identify the similar parts of the two molecules. It iterates over the bonds of the common fragments and counts how often each bond occurs; this count serves as the bond's overlap score (lines 6-10). The scores are then attached to the corresponding bonds as generic data (lines 15-18). The function also calculates and returns the maximum overlap score.
```python
def SetFingerPrintSimilarity(qmol, tmol, fptype, tag, maxvalue=0):

    qbonds = oechem.OEUIntArray(qmol.GetMaxBondIdx())
    tbonds = oechem.OEUIntArray(tmol.GetMaxBondIdx())

    for match in oegraphsim.OEGetFPOverlap(qmol, tmol, fptype):
        for bond in match.GetPatternBonds():
            qbonds[bond.GetIdx()] += 1
        for bond in match.GetTargetBonds():
            tbonds[bond.GetIdx()] += 1

    maxvalue = max(maxvalue, max(qbonds))
    maxvalue = max(maxvalue, max(tbonds))

    for bond in qmol.GetBonds():
        bond.SetData(tag, qbonds[bond.GetIdx()])
    for bond in tmol.GetBonds():
        bond.SetData(tag, tbonds[bond.GetIdx()])

    return maxvalue
```
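The counting step can be illustrated without the toolkits. The following is a minimal sketch, assuming each common fragment is represented simply as a list of bond indices; the `count_bond_overlaps` helper and the fragment data are hypothetical and only mimic what SetFingerPrintSimilarity does with the matches returned by OEGetFPOverlap.

```python
from collections import Counter


def count_bond_overlaps(fragments):
    """Count how often each bond index occurs across the common fragments.

    `fragments` is a hypothetical stand-in for the OEGetFPOverlap matches:
    each fragment is just a list of bond indices.
    """
    counts = Counter()
    for fragment in fragments:
        for bond_idx in fragment:
            counts[bond_idx] += 1
    return counts


# Three made-up common fragments on a molecule with four bonds (0..3).
fragments = [[0, 1], [1, 2], [0, 1, 2]]
scores = count_bond_overlaps(fragments)

# Bond 3 occurs in no fragment, so its overlap score stays 0.
per_bond = [scores[i] for i in range(4)]
maxvalue = max(per_bond)
print(per_bond, maxvalue)
```

A bond that appears in many common fragments gets a high score, while a bond that appears in none keeps a score of zero and is later drawn as dissimilar.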
These bond overlap scores can be used to highlight the similar and dissimilar parts of the molecules. The ColorBondByOverlapScore bond annotation class takes a linear color gradient and draws a “stick” underneath each bond (lines 20-24). The color of the “stick” is determined by the overlap score of the bond (lines 17-18).
```python
class ColorBondByOverlapScore(oegrapheme.OEBondGlyphBase):
    def __init__(self, cg, tag):
        oegrapheme.OEBondGlyphBase.__init__(self)
        self.colorg = cg
        self.tag = tag

    def RenderGlyph(self, disp, bond):

        bdisp = disp.GetBondDisplay(bond)
        if bdisp is None or not bdisp.IsVisible():
            return False

        if not bond.HasData(self.tag):
            return False

        linewidth = disp.GetScale() / 3.0
        color = self.colorg.GetColorAt(bond.GetData(self.tag))
        pen = oedepict.OEPen(color, color, oedepict.OEFill_Off, linewidth)

        adispB = disp.GetAtomDisplay(bond.GetBgn())
        adispE = disp.GetAtomDisplay(bond.GetEnd())

        layer = disp.GetLayer(oedepict.OELayerPosition_Below)
        layer.DrawLine(adispB.GetCoords(), adispE.GetCoords(), pen)

        return True

    def CreateCopy(self):
        return ColorBondByOverlapScore(self.colorg, self.tag).__disown__()
```
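To make the mapping from overlap score to color concrete, here is a minimal pure-Python sketch of a piecewise-linear color gradient. It is not the OELinearColorGradient implementation: the `color_at` helper, the clamping behavior outside the stop range, and the RGB values chosen for pink, yellow, and dark green are all assumptions made for illustration.

```python
def color_at(stops, value):
    """Interpolate linearly between (position, (r, g, b)) color stops.

    Values outside the stop range are clamped to the nearest end color
    (an assumption of this sketch).
    """
    stops = sorted(stops)
    if value <= stops[0][0]:
        return stops[0][1]
    if value >= stops[-1][0]:
        return stops[-1][1]
    for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
        if p0 <= value <= p1 and p1 > p0:
            t = (value - p0) / (p1 - p0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


# Illustrative RGB triples, not the actual OEChem color constants.
PINK = (255, 200, 200)
YELLOW = (255, 255, 0)
DARK_GREEN = (0, 100, 0)

maxvalue = 6  # hypothetical maximum overlap score
stops = [(0.0, PINK), (1.0, YELLOW), (float(maxvalue), DARK_GREEN)]

for score in (0, 1, 2, maxvalue):
    print(score, color_at(stops, score))
```

With stops like these, a score of zero maps to pink, a score of one to yellow, and higher scores shade toward dark green as they approach the maximum.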
The DepictMoleculeOverlaps function shows how to depict the 2D similarity of the two molecules:
- Calculate the bond overlap scores for both molecules, then construct an OELinearColorGradient object that the ColorBondByOverlapScore class uses to annotate the bonds based on their overlap scores (lines 6-10).
- Prepare both molecules for depiction. The target molecule is aligned to the query by calling the OEPrepareMultiAlignedDepiction function (lines 12-14). The OEGetFPOverlap function returns all common fragments found between the two molecules for a given fingerprint type. These common fragments identify the similar parts of the two molecules, and the OEPrepareMultiAlignedDepiction function uses them to find the best alignment between the molecules.
- Divide the image into two cells using the OEImageGrid class and render the molecules next to each other (lines 16-30).
- Generate fingerprints and calculate the similarity score by calling the OETanimoto function (lines 32-38).
- Render the score into the image (lines 40-42).
You can see the result in Figure 1. The DepictMoleculeOverlaps function uses a "yellow to dark green" linear color gradient. Where 2D similarity is detected between the two molecules, the bonds are highlighted in green, and the color gets darker with increasing similarity. Pink highlights the parts of the molecules that do not share any common fragments, i.e., the parts that are 2D dissimilar.
```python
def DepictMoleculeOverlaps(image, qmol, tmol, fptype, opts):

    tag = oechem.OEGetTag("fpoverlap")
    maxvalue = SetFingerPrintSimilarity(qmol, tmol, fptype, tag)

    colorg = oechem.OELinearColorGradient()
    colorg.AddStop(oechem.OEColorStop(0.0, oechem.OEPinkTint))
    colorg.AddStop(oechem.OEColorStop(1.0, oechem.OEYellow))
    colorg.AddStop(oechem.OEColorStop(maxvalue, oechem.OEDarkGreen))
    bondglyph = ColorBondByOverlapScore(colorg, tag)

    oedepict.OEPrepareDepiction(qmol)
    overlaps = oegraphsim.OEGetFPOverlap(qmol, tmol, fptype)
    oedepict.OEPrepareMultiAlignedDepiction(tmol, qmol, overlaps)

    grid = oedepict.OEImageGrid(image, 1, 2)
    grid.SetMargin(oedepict.OEMargin_Bottom, 10)
    opts.SetDimensions(grid.GetCellWidth(), grid.GetCellHeight(), oedepict.OEScale_AutoScale)
    opts.SetAtomColorStyle(oedepict.OEAtomColorStyle_WhiteMonochrome)

    molscale = min(oedepict.OEGetMoleculeScale(qmol, opts),
                   oedepict.OEGetMoleculeScale(tmol, opts))
    opts.SetScale(molscale)

    qdisp = oedepict.OE2DMolDisplay(qmol, opts)
    oegrapheme.OEAddGlyph(qdisp, bondglyph, oechem.IsTrueBond())
    oedepict.OERenderMolecule(grid.GetCell(1, 1), qdisp)

    tdisp = oedepict.OE2DMolDisplay(tmol, opts)
    oegrapheme.OEAddGlyph(tdisp, bondglyph, oechem.IsTrueBond())
    oedepict.OERenderMolecule(grid.GetCell(1, 2), tdisp)

    qfp = oegraphsim.OEFingerPrint()
    oegraphsim.OEMakeFP(qfp, qmol, fptype)

    tfp = oegraphsim.OEFingerPrint()
    oegraphsim.OEMakeFP(tfp, tmol, fptype)

    score = oegraphsim.OETanimoto(qfp, tfp)

    font = oedepict.OEFont(oedepict.OEFontFamily_Default, oedepict.OEFontStyle_Default, 16,
                           oedepict.OEAlignment_Center, oechem.OEBlack)
    center = oedepict.OE2DPoint(image.GetWidth() / 2.0, image.GetHeight() - 10)
    image.DrawText(center, "Tanimoto score = %.3f" % score, font)
```
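The score rendered into the image is the Tanimoto coefficient, which in its standard form divides the number of bits set in both fingerprints by the number of bits set in either. The sketch below shows that arithmetic on plain Python sets of "on" bit positions; the `tanimoto` helper, the bit positions, and the convention of returning 0.0 for two empty fingerprints are assumptions of this example, not the OETanimoto implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    common = len(bits_a & bits_b)
    total = len(bits_a) + len(bits_b) - common
    # Returning 0.0 for two empty fingerprints is a choice made in this sketch.
    return common / total if total else 0.0


# Made-up bit positions for a query and a target fingerprint.
query_bits = {1, 4, 7, 9}
target_bits = {4, 7, 8}

score = tanimoto(query_bits, target_bits)
print("Tanimoto score = %.3f" % score)
```

Here two bits are shared out of five distinct bits set overall, so the score is 2/5. A single number like this says nothing about where the molecules agree, which is what the bond highlighting adds.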
Usage:
prompt > python3 simcalc2img.py -query query.mol -target target.mol -out similarity.png
Discussion
Hint
Visualizing similarity of two molecules based on their fingerprints provides insight into molecule similarity beyond a single numerical score and reveals information about the underlying fingerprint methods.
The images in the following tables illustrate how changing a core or a terminal atom in a molecule affects the Tanimoto similarity scores.
[Two image tables, each with the columns Path | Tree | Circular]
The example above shows how to visualize the similarity of two molecules. However, you might want to visualize the similarity between one 'query' molecule and a set of 'target' molecules. The example below reads a pre-generated binary fingerprint file (see more details in Rapid Similarity Searching of Large Molecule Files) and then generates a multi-page document depicting the most similar hits aligned to the 'query' molecule.
Usage:
prompt > python3 simcalc2pdf.py -query query.ism -molfname targets.ism -fpdbfname targets.fpbin -out simcalc.pdf
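Selecting the most similar hits for a query can be sketched in a few lines of plain Python. This is not how simcalc2pdf.py works internally (the script searches a pre-generated binary fingerprint file); the `top_hits` helper, the target names, and the bit sets below are hypothetical and only illustrate ranking targets by Tanimoto score.

```python
import heapq


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    common = len(bits_a & bits_b)
    total = len(bits_a) + len(bits_b) - common
    return common / total if total else 0.0


def top_hits(query_bits, targets, limit=3):
    """Return the `limit` best (score, name) pairs, most similar first."""
    scored = ((tanimoto(query_bits, bits), name) for name, bits in targets.items())
    return heapq.nlargest(limit, scored)


# Made-up fingerprints for one query and four targets.
query_bits = {1, 2, 3, 4}
targets = {
    "t1": {1, 2, 3, 4},
    "t2": {1, 2},
    "t3": {5, 6},
    "t4": {1, 2, 3},
}

hits = top_hits(query_bits, targets)
for score, name in hits:
    print("%s %.2f" % (name, score))
```

Each selected hit could then be passed, together with the query, to a depiction function such as DepictMoleculeOverlaps, one page per hit.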
See also in GraphSim TK manual
Theory
- Fingerprint Generation chapter
- Similarity Measures chapter
- Fingerprint Overlap chapter
API
- OEFingerPrint class
- OEGetFPOverlap function
- OEMakeFP function
- OETanimoto function
See also in OEDepict TK manual
Theory
- Molecule Depiction chapter
- Molecule Alignment Based on Molecular Similarity chapter
API
- OE2DMolDisplay class
- OE2DMolDisplayOptions class
- OEImage class
- OEImageGrid class
- OEPrepareDepiction function
- OEPrepareMultiAlignedDepiction function
- OERenderMolecule function
See also in Grapheme™ TK manual
Theory
- Annotating Atoms and Bonds chapter
API
- OEAddGlyph function
- OEBondGlyphBase abstract base class