Depicting CSV or SDF in HTML
Problem
You want to depict molecules along with their associated data read from a CSV file in an HTML file. See the generated HTML file in drugs.html and its screenshot in Figure 1.
Solution
The CSV file format is a text file format containing comma-separated values. In OEChem TK this file format is implemented to enable data exchange with a wide variety of other software. Each line of a CSV file stores data for a molecule that is represented by a SMILES string.
See also
CSV File Format section of the OEChem TK documentation about the layout of the CSV file format.
When reading a CSV file, the fields of the file are attached to each molecule as SD data. This data can be accessed by the OEGetSDDataIter function that returns an iterator over all the SD data (tag - value) pairs of a molecule. The CollectDataTags function iterates over a list of molecules and returns the unique tags of the data attached to the molecules.
from openeye import oechem


def CollectDataTags(mollist):
    """Return the unique SD data tags of the molecules, in first-seen order."""
    tags = []
    for mol in mollist:
        for dp in oechem.OEGetSDDataIter(mol):
            if dp.GetTag() not in tags:
                tags.append(dp.GetTag())

    return tags
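To see what this kind of tag collection computes without an OpenEye installation, the following sketch mimics the idea with the standard csv module. It is only an illustration under stated assumptions: the CSV content is made up, each row dictionary stands in for the data attached to one molecule, the SMILES column is simply set aside, and read_records and collect_tags are hypothetical helpers that are not part of OEChem TK. How OEChem TK itself interprets particular columns is described in its CSV File Format section.

```python
import csv
import io

# Made-up CSV content used only for this illustration.
CSV_TEXT = """SMILES,MW,LogP
CC(=O)Oc1ccccc1C(=O)O,180.16,1.2
CC(C)Cc1ccc(cc1)C(C)C(=O)O,206.28,
"""


def read_records(text):
    """Turn each CSV row into a {tag: value} dict of its non-empty data fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop("SMILES", None)  # stand-in for "this column is the molecule"
        records.append({tag: value for tag, value in row.items() if value})
    return records


def collect_tags(records):
    """Return the unique tags of all records, in first-seen order."""
    seen = {}  # a dict keeps insertion order and gives fast membership tests
    for record in records:
        for tag in record:
            seen.setdefault(tag, None)
    return list(seen)


records = read_records(CSV_TEXT)
tags = collect_tags(records)
print(tags)
```

Using a dict instead of searching a list avoids rescanning the collected tags for every data field, which only matters when there are many distinct tags.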
The WriteHTMLFile function takes a list of molecules read from a CSV file along with the data tags returned by the CollectDataTags function. It first writes the header of the HTML file, then iterates over the molecules, adding a new table row for each molecule by calling the WriteHTMLTableRow function. Finally, it finishes writing the HTML file by calling the WriteHTMLFooter function.
def WriteHTMLFile(ofp, mollist, iname, tags, opts):
    """Write the header, one table row per molecule, and the footer."""
    WriteHTMLHeader(ofp, iname, tags)

    for mol in mollist:
        WriteHTMLTableRow(ofp, mol, opts, tags)

    WriteHTMLFooter(ofp)
The WriteHTMLHeader function sets the style of the HTML file (lines 5-17) and then opens the body and adds the header of a table into which the molecules and their data will be inserted (lines 21-32).
 1  def WriteHTMLHeader(ofp, filename, tags):
 2
 3      tablewidth = min(1800, (len(tags) + 1) * 200)
 4
 5      ofp.write("<style type='text/css'>\n")
 6      ofp.write("h1 "
 7                "{ text-align:center; border-width:thick; border-style:double;"
 8                "border-color:#BCC; }\n")
 9      ofp.write("table.csv "
10                "{ border-spacing:1; background: #FFF; width:%dpx; }\n" % tablewidth)
11      ofp.write("table.csv td, th "
12                "{ width:100px; text-align:center; padding:3px 3px 3px 3px; }\n")
13      ofp.write("table.csv th { height:50px; color:#FFF; background:#788; }\n")
14      ofp.write("table.csv tr:nth-child(even){ color:#000; background:#FFE; }\n")
15      ofp.write("table.csv tr:nth-child(odd) { color:#000; background:#FEF; }\n")
16      ofp.write("table.csv tr:hover { color:#000; background:#DDD; }\n")
17      ofp.write("</style>\n")
18
19      ofp.write("<html><h1>%s</h1>\n" % filename)
20
21      ofp.write("<body>\n")
22      ofp.write("<table class=csv>\n")
23      ofp.write("<tbody>\n")
24
25      ofp.write("<tr>\n")
26      ofp.write("<th> Molecule</th>")
27
28      # write data tags in table header
29
30      for tag in tags:
31          ofp.write("<th> %s </th>" % tag)
32      ofp.write("\n</tr>\n")
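The table width computed at the top of WriteHTMLHeader grows by 200 pixels per column (one column for the molecule plus one per data tag) and is capped at 1800 pixels. The short sketch below only restates that arithmetic; table_width and its parameter names are hypothetical and do not appear in the original script.

```python
def table_width(num_tags, col_width=200, max_width=1800):
    """Width in pixels: one molecule column plus one column per tag, capped."""
    return min(max_width, (num_tags + 1) * col_width)


# Print the width for a few tag counts, including ones that hit the cap.
for num_tags in (0, 3, 8, 12):
    print(num_tags, table_width(num_tags))
```

With three data tags the table is 800 pixels wide, and from eight tags onward the cap of 1800 pixels applies.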
The WriteHTMLTableRow function inserts the image of the molecule (line 7) along with its corresponding data (lines 11-15) into the next row of the HTML table.
 1  def WriteHTMLTableRow(ofp, mol, opts, tags):
 2
 3      ofp.write("<tr class=row>\n")
 4
 5      # add image
 6
 7      ofp.write("<td> %s \n </td>\n" % GetSVGImage(mol, opts))
 8
 9      # write data
10
11      for tag in tags:
12          value = "N/A"
13          if oechem.OEHasSDData(mol, tag):
14              value = oechem.OEGetSDData(mol, tag)
15          ofp.write("<td> %s </td>" % value)
16
17      ofp.write("</tr>\n")
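WriteHTMLTableRow writes each data value into its table cell as-is. If a data field can contain characters such as < or &, escaping the value first keeps the markup intact. The sketch below shows one possible variation and is not part of the original script: format_cell is a hypothetical helper built on the standard html.escape function.

```python
import html


def format_cell(value=None):
    """Return a table cell, using "N/A" for missing data and escaping the rest."""
    text = "N/A" if value is None else html.escape(str(value))
    return "<td> %s </td>" % text


# A value containing markup characters, and a missing value.
print(format_cell("IC50 < 10 nM & selective"))
print(format_cell())
```

The SVG string returned by GetSVGImage must not be escaped this way, since it is meant to be interpreted as markup.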
The GetSVGImage function generates a molecule display and returns its image as a string in bare SVG format (with no header). This "image" string can be inserted directly into an HTML file.
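from openeye import oedepict


def GetSVGImage(mol, opts):
    """Depict the molecule and return it as a bare SVG string."""
    oedepict.OEPrepareDepiction(mol)
    disp = oedepict.OE2DMolDisplay(mol, opts)
    imagestr = oedepict.OERenderMoleculeToString("bsvg", disp, False)
    # decode the rendered image so it can be written as text
    return imagestr.decode("utf-8")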
The WriteHTMLFooter function simply closes the table and the body of the HTML file.
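The footer function is not listed here. A minimal sketch consistent with that description, assuming it only needs to close the tags opened in WriteHTMLHeader, could look like the following; the exact strings written by csv2html.py may differ. An io.StringIO buffer stands in for the output file.

```python
import io


def WriteHTMLFooter(ofp):
    """Close the table, body and html elements opened by the header (assumed)."""
    ofp.write("</tbody>\n")
    ofp.write("</table>\n")
    ofp.write("</body></html>\n")


# Demonstrate on an in-memory buffer instead of a real file.
buffer = io.StringIO()
buffer.write("<html><body>\n<table class=csv>\n<tbody>\n")
WriteHTMLFooter(buffer)
footer_output = buffer.getvalue()
print(footer_output)
```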
Download code
csv2html.py and the drugs.csv supporting data file
Usage
Running the following command will generate the drugs.html file.
prompt > python3 csv2html.py drugs.csv drugs.html
Discussion
Because the columns of a CSV file are read into SD data fields, OEChem TK effectively provides a metadata interchange between SDF files and CSV files. Consequently, the same Python script can be used to generate an HTML file from an SDF file.
Usage
After downloading the drugs.sdf supporting data file, the following command will generate the same drugs.html file (apart from the input filename at the top).
prompt > python3 csv2html.py drugs.sdf drugs.html
See also in OEChem TK manual
Theory
SD Tagged Data Manipulation section
CSV File Format section
API
OEGetSDDataIter function
See also in OEDepict TK manual
Theory
Molecule Depiction chapter
API
OE2DMolDisplay class
OE2DMolDisplayOptions class
OEPrepareDepiction function
OERenderMoleculeToString function