Visualizing Shape and Color Overlap
Problem
You want to visualize the output of OpenEye’s ROCS application by depicting the shape and color similarity between a multi-conformer reference molecule and a set of fit molecules. See Table 1.
ROCS is a tool for aligning and scoring a database of molecules against a reference (i.e. query) molecule. The scores are used to rank molecules based on the probability that they share relevant (biological) properties with the reference molecule. ROCS aligns molecules based on shape similarity and on the distribution of their chemical features (also referred to as color atoms). For each database molecule, ROCS outputs the best alignment to the reference molecule along with its scores.
While the Python script of this recipe is designed to visualize the output of OpenEye’s ROCS application, it can easily be modified to depict any set of molecules that are pre-aligned to a reference molecule.
Along with shape and color overlays, the script can also depict the 2D graph similarity between the reference molecule and the set of fit molecules. This makes it easy to identify molecules with high shape and color similarity but a novel 2D molecular graph (i.e. with low 2D similarity scores). See examples in Table 2.
[Report images: page 1 | page 2 | page 3]
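The idea of spotting hits that score well in 3D but look different in 2D can be sketched in plain Python. The snippet is only an illustration: the hit records, the helper name find_scaffold_hops, the "graph_tanimoto" key, and the threshold values are hypothetical and not part of the recipe's script. Only the ROCS_TanimotoCombo tag name is taken from the ROCS output used in this recipe, and the thresholds assume that this score is the sum of the shape and color Tanimoto scores (range 0.0 to 2.0).

```python
def find_scaffold_hops(hits, min_combo=1.2, max_graph_sim=0.3):
    """Return hits with high 3D similarity but low 2D graph similarity.

    Each hit is a plain dict standing in for a scored molecule.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    selected = [
        hit for hit in hits
        if hit["ROCS_TanimotoCombo"] >= min_combo
        and hit["graph_tanimoto"] <= max_graph_sim
    ]
    # Best 3D match first.
    return sorted(selected, key=lambda hit: hit["ROCS_TanimotoCombo"], reverse=True)


# Hypothetical example data, not real ROCS results.
hits = [
    {"title": "hit_a", "ROCS_TanimotoCombo": 1.45, "graph_tanimoto": 0.82},
    {"title": "hit_b", "ROCS_TanimotoCombo": 1.31, "graph_tanimoto": 0.21},
    {"title": "hit_c", "ROCS_TanimotoCombo": 0.95, "graph_tanimoto": 0.10},
    {"title": "hit_d", "ROCS_TanimotoCombo": 1.52, "graph_tanimoto": 0.27},
]

for hit in find_scaffold_hops(hits):
    print(hit["title"], hit["ROCS_TanimotoCombo"], hit["graph_tanimoto"])
```

In this made-up data set, hit_a is a strong 3D match but is too similar in 2D to count as novel, while hit_c is novel in 2D but a weak 3D match.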
Solution
The main function of the script is depict_shape_color_graphsim_overlaps. After setting up the fingerprint generation and the depiction styles, the function loops over all fit molecules and determines which conformation of the reference corresponds to each fit molecule (lines 39-50). With a multi-conformer reference molecule, a single fixed conformation cannot be used as the reference for every fit molecule; otherwise the generated 2D shape and color depictions would not accurately represent the corresponding 3D information.
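The lookup described above can be illustrated without the toolkits. In this sketch plain dictionaries stand in for the molecules and for refmol_displays; the function name pair_hits_with_reference and the sample data are hypothetical. It mirrors the two checks made in the script: a fit molecule is skipped when it carries no ROCS_ShapeQuery entry, or when that entry does not name a known reference conformation.

```python
def pair_hits_with_reference(fitmols, refmol_displays):
    """Pair every fit molecule with the display of its own reference conformer.

    fitmols: list of dicts with a "title" and an "sd" dict of SD data
             (stand-ins for real molecule objects).
    refmol_displays: dict mapping reference conformer title -> display object.
    Returns (pairs, skipped_titles).
    """
    pairs, skipped = [], []
    for fitmol in fitmols:
        reftitle = fitmol["sd"].get("ROCS_ShapeQuery")
        # Skip hits without a usable reference conformer instead of failing.
        if reftitle is None or reftitle not in refmol_displays:
            skipped.append(fitmol["title"])
            continue
        pairs.append((fitmol["title"], refmol_displays[reftitle]))
    return pairs, skipped


# Hypothetical stand-in data.
refmol_displays = {
    "query_1": "display for conformer 1",
    "query_2": "display for conformer 2",
}
fitmols = [
    {"title": "hit_a", "sd": {"ROCS_ShapeQuery": "query_2"}},
    {"title": "hit_b", "sd": {}},
    {"title": "hit_c", "sd": {"ROCS_ShapeQuery": "query_9"}},
]

pairs, skipped = pair_hits_with_reference(fitmols, refmol_displays)
print("paired:", pairs)
print("skipped:", skipped)
```

Keying the displays by conformer title means each reference conformation only needs to be prepared once, no matter how many fit molecules were aligned to it.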
In the multi-page report, the following images are generated for each fit molecule in the dataset:
Score diagram (OEDrawROCSScores) and reference molecule display (OERenderShapeQuery).
Shape overlap between the reference and the fit molecule (OERenderShapeOverlap) with a pre-calculated shape Tanimoto score.
Color overlap between the reference and the fit molecule (OERenderColorOverlap) with a pre-calculated color Tanimoto score.
2D similarity (if requested) between the reference and the fit molecule (lines 80-84), with a Tree fingerprint similarity score calculated on the fly.
 1  def depict_shape_color_graphsim_overlaps(report, refmol, refmol_displays,
 2                                           fitmols, depictsim):
 3      """
 4      Depict shape, color, and 2D similarities.
 5
 6      :type report: oedepict.OEReport
 7      :type refmol: oechem.OEMol
 8      :type refmol_displays: dict[string, oegrapheme.OEShapeQueryDisplay]
 9      :type fitmols: list[oechem.OEMol]
10      :type depictsim: boolean
11      """
12
13      fptag, fptype, refmolfp, bondglyph = None, None, None, None
14      if depictsim:
15          fptag = oechem.OEGetTag("fpoverlap")
16          fptype = oegraphsim.OEGetFPType(oegraphsim.OEFPType_Tree)
17          print("Using fingerprint type %s" % fptype.GetFPTypeString())
18          refmolfp = oegraphsim.OEFingerPrint()
19          oegraphsim.OEMakeFP(refmolfp, refmol, fptype)
20          fpcolorg = get_fingerprint_colorgradient(get_max_bond_self_similarity_score(refmol, fptype))
21          bondglyph = ColorBondByOverlapScore(fpcolorg, fptag)
22
23      sopts = get_shape_overlap_display_options()
24      copts = get_color_overlap_display_options()
25
26      ftableopts = get_fit_table_options(depictsim)
27      rtableopts = get_ref_table_options()
28
29      scorefont = oedepict.OEFont(oedepict.OEFontFamily_Default, oedepict.OEFontStyle_Bold, 9,
30                                  oedepict.OEAlignment_Center, oechem.OEBlack)
31
32      tracer = oechem.OEConsoleProgressTracer()
33      tracer.SetTask("Generating overlays")
34
35      for fitidx, fitmol in enumerate(fitmols):
36
37          tracer.SetProgress(fitidx, len(fitmols))
38
39          if not oechem.OEHasSDData(fitmol, "ROCS_ShapeQuery"):
40              warning = "Shape query reference is not available for molecule '{}'"
41              oechem.OEThrow.Warning(warning.format(fitmol.GetTitle()))
42              continue
43
44          reftitle = oechem.OEGetSDData(fitmol, "ROCS_ShapeQuery")
45
46          if reftitle not in refmol_displays:
47              warning = "Shape query reference '{}' is not valid for molecule '{}'"
48              oechem.OEThrow.Warning(warning.format(reftitle, fitmol.GetTitle()))
49              continue
50          refdisp = refmol_displays[reftitle]
51
52          cell = report.NewCell()
53          fittable = oedepict.OEImageTable(cell, ftableopts)
54
55          # title + score graph + query
56          maintitle = "Hit: {}".format(fitmol.GetTitle())
57          fittable.DrawText(fittable.GetCell(1, 1), maintitle)
58
59          reftable = oedepict.OEImageTable(fittable.GetCell(2, 1), rtableopts)
60
61          reftable.DrawText(reftable.GetCell(1, 1), "Rank: {}".format(fitidx+1))
62          render_score(reftable.GetCell(2, 1), fitmol, "ROCS_TanimotoCombo", "Tanimoto Combo", scorefont)
63
64          simscore = None if not depictsim else calc_fingerprint_similarity(refmol, refmolfp, fitmol, fptype, fptag)
65          render_score_radial(reftable.GetCell(3, 1), fitmol, simscore)
66
67          oegrapheme.OERenderShapeQuery(reftable.GetCell(4, 1), refdisp)
68          reftable.DrawText(reftable.GetCell(5, 1), "query : {}".format(reftitle))
69
70          odisp = oegrapheme.OEShapeOverlapDisplay(refdisp, fitmol, sopts, copts)
71
72          # shape overlap
73          render_score(fittable.GetHeaderCell(1), fitmol, "ROCS_ShapeTanimoto", "Shape Tanimoto", scorefont)
74          oegrapheme.OERenderShapeOverlap(fittable.GetCell(2, 2), odisp)
75
76          # color overlap
77          render_score(fittable.GetHeaderCell(2), fitmol, "ROCS_ColorTanimoto", "Color Tanimoto", scorefont)
78          oegrapheme.OERenderColorOverlap(fittable.GetCell(2, 3), odisp)
79
80          # 2D similarity
81          if depictsim:
82              simtitle = "2D Graph Tanimoto = {:4.3f}".format(simscore)
83              oedepict.OEDrawTextToCenter(fittable.GetHeaderCell(3), simtitle, scorefont)
84              depict_molecule_similarity(fittable.GetCell(2, 4), fitmol, refdisp, bondglyph, fptag)
85
86      tracer.Pop()
Usage
The examples below use the script shapeoverlap2pdf.py and the ROCS result file 4cox_rocs_hits.oeb.gz.
The following command will generate the images shown in Table 1.
prompt > python3 shapeoverlap2pdf.py -in 4cox_rocs_hits.oeb.gz -out test.pdf -maxhits 12
The following command will generate the images shown in Table 2.
prompt > python3 shapeoverlap2pdf.py -in 4cox_rocs_hits.oeb.gz -out test.pdf -maxhits 12 -depictsim
[Report images: page 1 | page 2 | page 3]
Command Line Parameters
/Users/krisztina/cookbook/oepycookbook/recipes/visualization/shapeoverlap2pdf.py : [-in] <input> [-out] <output pdf>
Simple parameter list
    input/output options:
      -in : Input molecule filename
      -out : Output image filename

    general options:
      -depictsim : Calculate and depict 2D molecule similarity
      -maxhits : Maximum number of hits depicted

    report options
      -pagebypage : Write individual numbered separate pages
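For readers who want to prototype a similar command line, the parameters listed above can be approximated with Python's standard argparse module. This is a sketch under assumptions: it makes no claim about how shapeoverlap2pdf.py itself parses its arguments, and the build_parser name, the dest names, and the default of 0 for -maxhits (taken here to mean "no limit") are chosen for illustration only.

```python
import argparse


def build_parser():
    """Build an argparse parser that accepts the flags listed in the recipe."""
    parser = argparse.ArgumentParser(prog="shapeoverlap2pdf.py")

    io_group = parser.add_argument_group("input/output options")
    # "in" is a Python keyword, so explicit dest names are used.
    io_group.add_argument("-in", dest="infile", required=True,
                          help="Input molecule filename")
    io_group.add_argument("-out", dest="outfile", required=True,
                          help="Output image filename")

    general = parser.add_argument_group("general options")
    general.add_argument("-depictsim", action="store_true",
                         help="Calculate and depict 2D molecule similarity")
    general.add_argument("-maxhits", type=int, default=0,
                         help="Maximum number of hits depicted (0: assumed to mean no limit)")

    report = parser.add_argument_group("report options")
    report.add_argument("-pagebypage", action="store_true",
                        help="Write individual numbered separate pages")
    return parser


# Parse the same arguments as the second example command in the Usage section.
args = build_parser().parse_args(
    ["-in", "4cox_rocs_hits.oeb.gz", "-out", "test.pdf", "-maxhits", "12", "-depictsim"]
)
print(args.infile, args.outfile, args.maxhits, args.depictsim, args.pagebypage)
```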
Discussion
The aim of this script is to represent 3D overlays (see the example in Figure 1) in a more comprehensible 2D layout that enables fast comparison of hit molecules. See the corresponding 2D depiction in Figure 3.
For each fit molecule, the corresponding query molecule is also depicted in the first column. The layout and orientation of the query molecule can vary, since the generation of the 2D coordinates is driven by the 3D coordinates of the best matching conformation of the 3D query. See the layout difference of the query molecule in the 2nd and 3rd rows on page 1 in Table 1.
See also
OEPrepareDepictionFrom3D function in the OEDepict TK manual
OEGrapheme TK uses colors to mark different chemical features detected in molecules.
When the reference molecule is depicted, see examples in Table 3, colored circles indicate the detected chemical features on the molecular graph. If two color atoms occupy the same space in 3D, they are represented with adjacent half circles on the molecular graph.
When visualizing chemical feature matches between the reference and the fit molecule, each circle on the fit molecule (see the 3rd image from the left in Figure 3) corresponds to a color atom in the reference molecule. The color of the circle indicates the fitness of the color atom match in 3D. The lighter the color, the smaller the overlap between the reference and fit color atoms in 3D. Unfilled circles represent unmatched reference color atoms. If a good color atom match exists for a reference color atom in 3D, then the circle representing the color atom is positioned at the matching fit color atom in 2D.
The shape overlap between the reference and the fit molecule (see the 2nd image from the left in Figure 3) is visualized by drawing circles underneath the molecule; a darker color indicates better 3D shape overlap between the reference and the fit molecules. Additionally, clashes between the molecular graph of the fit molecule and the 2D molecule surface of the reference structure imply a shape mismatch in 3D.
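The convention that a lighter color means a weaker overlap amounts to a linear color gradient over the score range. The sketch below shows that idea with plain RGB tuples. It is not the Grapheme TK implementation, and the function name overlap_color as well as the base and background colors are hypothetical choices.

```python
def overlap_color(score, base_rgb=(0, 90, 200), background_rgb=(255, 255, 255)):
    """Linearly blend from background_rgb (score 0.0) to base_rgb (score 1.0)."""
    score = max(0.0, min(1.0, score))  # clamp to the valid score range
    return tuple(
        round(bg + (base - bg) * score)
        for base, bg in zip(base_rgb, background_rgb)
    )


# Weak, medium, and strong overlap: the color gets darker as the score grows.
for score in (0.0, 0.5, 1.0):
    print(score, overlap_color(score))
```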
The 4th depiction in Figure 3 shows the 2D graph similarity between the reference and the fit molecule. See the Depicting Molecule Similarity Based on Fingerprints recipe for more details about the calculation and interpretation of these images.
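As a reminder of what the 2D score measures, the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below uses Python sets of bit positions as stand-in fingerprints. The function name and the bit values are hypothetical, and the recipe itself relies on GraphSim TK for the actual calculation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


# Hypothetical bit positions, not real Tree fingerprints.
ref_bits = {1, 4, 7, 9, 15}
fit_bits = {1, 4, 8, 15, 22}

print("2D Graph Tanimoto = {:4.3f}".format(tanimoto(ref_bits, fit_bits)))
```

Here three bits are shared and seven distinct bits are set overall, so the score is 3/7.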
All scores (shape, color, and 2D similarity) are in the range [0.0, 1.0]. These scores are visualized in a radial graph for easy interpretation and comparison. See the example in Figure 4.
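For reference, a shape Tanimoto score is commonly defined from overlap volumes, which explains why it stays within this range. The symbols are introduced here for illustration only: O_AB is the overlap volume between the reference A and the fit molecule B, and O_AA and O_BB are the self-overlap volumes of the two molecules.

```latex
T_{\mathrm{shape}}(A, B) = \frac{O_{AB}}{O_{AA} + O_{BB} - O_{AB}},
\qquad 0 \le T_{\mathrm{shape}}(A, B) \le 1
```

The score reaches 1.0 only when the two shapes overlap perfectly, and it approaches 0.0 as the overlap vanishes.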
See also in OEChem TK manual
Theory
Generic Data chapter
API
OELinearColorGradient class
See also in GraphSim TK manual
Theory
Fingerprint Generation chapter
Similarity Measures chapter
Fingerprint Overlap chapter
API
OEFingerPrint class
OEGetFPOverlap function
OEMakeFP function
OETanimoto function
See also in Shape TK manual
Theory
API
OEColorForceField class
See also in OEDepict TK manual
Theory
Molecule Depiction chapter
API
OE2DMolDisplay class
OE2DMolDisplayOptions class
OEDepictOrientation namespace
OEDrawTextToCenter function
OEImage class
OEImageTable class
OEImageTableOptions class
OEImageTableStyle namespace
OEPrepareDepiction function
OERenderMolecule function
OEReport class
OEReportOptions class
OEWriteReport function
See also in Grapheme™ TK manual
Theory
Annotating Atoms and Bonds chapter
API
OEAddGlyph function
OEBondGlyphBase abstract base class
OEColorForceFieldDisplay class
OEDrawColorForceFieldLegend function
OEDrawROCSScores function
OEGetMoleculeSurfaceScale function
OEPrepareAlignedDepictionFrom3D function
OERenderColorOverlap function
OERenderShapeOverlap function
OERenderShapeQuery function
OEShapeOverlapDisplay class
OEShapeOverlapDisplayStyle namespace
OEShapeQueryDisplay class