Depicting Multiple Matches
Problem
You want to perform multiple substructure searches and highlight the matches on the hit molecules. See example in Table 1.
Table 1. Example report pages: page 1, page 2 (images not reproduced here)
Ingredients
Difficulty Level
Solution
The GetSubstructureSearch function reads an MDL query file with OEReadMDLQueryFile and initializes an OEQMol object by calling the OEBuildMDLQueryExpressions function. This OEQMol object is then used to initialize the OESubSearch object that performs the substructure search. Setting the maximum number of matches to 1 with SetMaxMatches (see line 21) ensures that the search terminates as soon as one match is found. The function returns both the query molecule, which is depicted at the top of each page of the report, and the substructure search object.
def GetSubstructureSearch(queryfname):

    qifs = oechem.oemolistream()
    if not qifs.open(queryfname):
        oechem.OEThrow.Fatal("Cannot open mdl query file!")
    if qifs.GetFormat() != oechem.OEFormat_MDL:
        oechem.OEThrow.Fatal("Query file has to be an MDL file!")

    querymol = oechem.OEGraphMol()
    if not oechem.OEReadMDLQueryFile(qifs, querymol):
        oechem.OEThrow.Fatal("Cannot read query molecule!")
    oedepict.OEPrepareDepiction(querymol)  # prepare the query for depiction

    qmol = oechem.OEQMol()
    queryopts = oechem.OEMDLQueryOpts_Default | oechem.OEMDLQueryOpts_SuppressExplicitH
    oechem.OEBuildMDLQueryExpressions(qmol, querymol, queryopts)

    subsearch = oechem.OESubSearch()
    if not subsearch.Init(qmol):
        oechem.OEThrow.Fatal("Cannot initialize substructure search!")
    subsearch.SetMaxMatches(1)  # stop after the first match

    return (querymol, subsearch)
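The effect of capping the number of matches can be illustrated without the toolkit. The following is only a toy sketch: a plain string search stands in for a substructure search, and the helper names find_matches and limited_matches are hypothetical, not part of OEChem. It shows that a lazy search with a cap of 1 returns after the first hit instead of enumerating all of them.

```python
from itertools import islice


def find_matches(text, pattern):
    """Lazily yield the start index of every occurrence of pattern in text."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def limited_matches(text, pattern, max_matches):
    """Return at most max_matches hits; the generator is not advanced further."""
    return list(islice(find_matches(text, pattern), max_matches))


# Toy stand-in for a target with two occurrences of the same pattern.
target = "c1ccccc1Cc1ccccc1"
first_only = limited_matches(target, "c1ccccc1", 1)
all_hits = limited_matches(target, "c1ccccc1", 10)
```

Here first_only holds a single hit even though the pattern occurs twice, which is all the recipe needs to decide whether a molecule is a hit.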
The GetSubstructureSearches function iterates over a list of query file names and collects the query molecules and the substructure search objects (returned by the GetSubstructureSearch function) in two separate lists.
def GetSubstructureSearches(queryfnames):

    querymols = []
    subsearches = []

    for queryfname in queryfnames:
        querymol, subsearch = GetSubstructureSearch(queryfname)
        querymols.append(oechem.OEGraphMol(querymol))  # store a copy of the query
        subsearches.append(oechem.OESubSearch(subsearch))  # store a copy of the search

    return querymols, subsearches
The DepictMoleculesWithSubstructureMatches function, which depicts the hits of the substructure searches, takes the following parameters:
- report
The OEReport object that allows the generation of multi-page documentation.
- mollist
The list of target molecules.
- subsearches
The list of OESubSearch objects initialized by the GetSubstructureSearches function.
- opts
The OE2DMolDisplayOptions object that defines the style of the molecule depiction.
- colors
The list of colors that is used to highlight the substructure search matches.
DepictMoleculesWithSubstructureMatches iterates over the target molecules and, for each one, calls the GetSubstructureMatches function, which:
- returns an empty list if the target molecule does not contain all substructures, or
- returns the list of substructure matches, one match for each successful substructure search.
In the latter case, the molecule (i.e. the hit) is rendered into the next cell of the report and the matched substructures are highlighted by calling the OEAddHighlightOverlay function. OEAddHighlightOverlay takes all the matches to be highlighted and colors overlapping atoms and bonds with the relevant colors in turn. The colors used for highlighting are set when the OEHighlightOverlayByBallAndStick object is constructed (see line 3).
def DepictMoleculesWithSubstructureMatches(report, mollist, subsearches, opts, colors):

    highlight = oedepict.OEHighlightOverlayByBallAndStick(colors)

    for mol in mollist:

        matches = GetSubstructureMatches(subsearches, mol)
        if len(matches) == 0:  # at least one substructure search fails
            continue

        oedepict.OEPrepareDepiction(mol)
        disp = oedepict.OE2DMolDisplay(mol, opts)
        oedepict.OEAddHighlightOverlay(disp, highlight, matches)

        cell = report.NewCell()  # next free cell of the report
        oedepict.OERenderMolecule(cell, disp)
        oedepict.OEDrawCurvedBorder(cell, oedepict.OELightGreyPen, 20)
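The bookkeeping behind overlapping highlights can be sketched in plain Python. This is an illustration only, not how OEDepict implements OEAddHighlightOverlay: matches are modeled as sets of atom indices, colors as strings, and the function name colors_per_atom is hypothetical. Each match takes the color at the same position in the color list, and an atom that belongs to several matches collects all of their colors in order.

```python
def colors_per_atom(matches, colors):
    """Map each atom index to the colors of all matches that contain it."""
    assigned = {}
    # The i-th match is paired with the i-th color.
    for match, color in zip(matches, colors):
        for atom_idx in sorted(match):
            assigned.setdefault(atom_idx, []).append(color)
    return assigned


# Atom 2 is shared by the first two matches, so it receives two colors.
matches = [{0, 1, 2}, {2, 3}, {5}]
colors = ["red", "blue", "green"]
overlay = colors_per_atom(matches, colors)
```

In this toy model, atoms with more than one color are exactly the overlapped atoms that the overlay style has to draw in several colors.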
The GetSubstructureMatches function iterates over the substructure searches. If a substructure search fails (see lines 7-8), the function returns an empty list. After each successful search, a new match, stored as an OEAtomBondSet object, is appended to the match list.
def GetSubstructureMatches(subsearches, mol):

    unique = True
    matches = []
    for ss in subsearches:
        miter = ss.Match(mol, unique)
        if not miter.IsValid():  # no match for this query
            return []
        else:
            match = miter.Target()  # first (and only) match
            matches.append(oechem.OEAtomBondSet(match.GetTargetAtoms(), match.GetTargetBonds()))

    return matches
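The all-or-nothing behavior described above can be checked in isolation with a small stand-in. In this sketch the searches are plain callables that return a hit or None; make_search and collect_all_or_nothing are hypothetical names, and the string search is only a placeholder for a real substructure search.

```python
def collect_all_or_nothing(searches, target):
    """Return one hit per search, or [] as soon as any search finds nothing."""
    hits = []
    for search in searches:
        hit = search(target)  # assumed contract: a hit object or None
        if hit is None:
            return []
        hits.append(hit)
    return hits


def make_search(pattern):
    """Build a toy search that returns (pattern, index) or None."""
    def search(target):
        idx = target.find(pattern)
        return None if idx == -1 else (pattern, idx)
    return search


searches = [make_search("CC"), make_search("N")]
full_hit = collect_all_or_nothing(searches, "CCN")
partial_hit = collect_all_or_nothing(searches, "CCO")
```

A target that satisfies only some of the searches yields an empty list, which is why the caller can test for a hit with a simple length check.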
After the report with the substructure search matches has been generated, the queries can be depicted on each page. The DepictQueries function iterates over the headers of the OEReport object and depicts the queries in a single row laid out with an OEImageGrid object. A border in the associated color is drawn around each query molecule to help locate the corresponding substructure matches in the hit molecules.
def DepictQueries(report, queries, colors):

    for header in report.GetHeaders():

        grid = oedepict.OEImageGrid(header, 1, len(queries))  # one row, one column per query
        grid.SetCellGap(4)
        cellwidth, cellheight = grid.GetCellWidth(), grid.GetCellHeight()
        opts = oedepict.OE2DMolDisplayOptions(cellwidth, cellheight, oedepict.OEScale_AutoScale)

        colors.ToFirst()  # rewind the color iterator for each header
        for cell, query, color in zip(grid.GetCells(), queries, colors):
            disp = oedepict.OE2DMolDisplay(query, opts)
            oedepict.OERenderMolecule(cell, disp)
            pen = oedepict.OEPen(oechem.OEWhite, color, oedepict.OEFill_Off, 4.0)
            oedepict.OEDrawCurvedBorder(cell, pen, 20)
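Queries and colors are associated purely by position through zip, and zip stops silently at the shortest input. A defensive variant of that pairing could look like the following sketch; pair_queries_with_colors is a hypothetical helper, and plain strings stand in for molecules and colors.

```python
def pair_queries_with_colors(queries, colors):
    """Pair queries and colors by position, refusing to drop any query."""
    colors = list(colors)  # materialize iterators so the length can be checked
    if len(colors) < len(queries):
        raise ValueError(
            f"{len(queries)} queries but only {len(colors)} colors"
        )
    return list(zip(queries, colors))


pairs = pair_queries_with_colors(["query-A", "query-B"], ["red", "blue", "green"])
```

With more colors than queries the extra colors are ignored; with fewer, the helper raises instead of leaving a query without a border color.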
Download code: mdlsearches2pdf.py
Supporting data files: query-A.mol, query-B.mol, query-C.mol, and targets.ism
Usage:
prompt > python3 mdlsearches2pdf.py -queries query-A.mol query-B.mol query-C.mol -target targets.ism -report matches.pdf
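The option handling of the downloadable script is not shown in this recipe. As a rough sketch of how the three options in the usage line could be parsed, the following uses Python's standard argparse module as a stand-in; the actual script may parse its command line differently, and the function name parse_args is hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse the three options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="mdlsearches2pdf.py")
    parser.add_argument("-queries", nargs="+", required=True,
                        help="one or more MDL query files")
    parser.add_argument("-target", required=True,
                        help="file with the target molecules")
    parser.add_argument("-report", required=True,
                        help="output report file")
    return parser.parse_args(argv)


args = parse_args([
    "-queries", "query-A.mol", "query-B.mol", "query-C.mol",
    "-target", "targets.ism",
    "-report", "matches.pdf",
])
```

The list in args.queries could then be passed to GetSubstructureSearches, with args.target and args.report naming the input and output files.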
Discussion
Colors with high contrast are recommended when highlighting overlapping matches with the OEAddHighlightOverlay function. This example uses the colors returned by the OEGetContrastColors function.
Although the OEAddHighlightOverlay function places no limit on the number of overlapping patterns that can be highlighted simultaneously, highlighting too many patterns produces a complex image that is difficult to interpret visually (see example in Figure 2).
See also in OEChem TK manual
Theory
API
- OESubSearch class
- OEGetContrastColors function
See also in OEDepict TK manual
Theory
- Molecule Depiction chapter
- Highlighting Overlapped Patterns section
API
- OE2DMolDisplay class
- OE2DMolDisplayOptions class
- OEAddHighlightOverlay function
- OEImage class
- OEImageGrid class
- OEPrepareDepiction function
- OERenderMolecule function
- OEReport class