Depicting CSV or SDF in PDF

Problem

You want to depict molecules, along with their associated data read from a CSV file, in a multi-page PDF file. See the example in drugs.pdf and in Table 1.

Table 1. Example of depiction of CSV in PDF (The pages are reduced here for visualization convenience)
page 1 page 2
../_images/csv2pdf-page-1.png ../_images/csv2pdf-page-2.png

Ingredients

Note

Requires OpenEye toolkits version 2014.Feb or later.

Difficulty Level

../_images/chilly1.png ../_images/chilly1.png

Solution

The CSV file format is a text file format containing comma-separated values. In OEChem TK this file format is implemented to enable data exchange with a wide variety of other software. Each line of a CSV file stores data for a molecule that is represented by a SMILES string.
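To make the one-line-per-molecule layout concrete, here is a minimal sketch that parses a tiny CSV sample with Python's standard csv module rather than OEChem TK. The column names (SMILES, TITLE, MW) and the two sample rows are assumptions made for illustration and are not taken from drugs.csv.

```python
import csv
import io

# Hypothetical sample; the real drugs.csv may use different columns.
SAMPLE = """SMILES,TITLE,MW
CC(=O)Oc1ccccc1C(=O)O,aspirin,180.16
CC(=O)Nc1ccc(O)cc1,acetaminophen,151.16
"""


def read_rows(text, smiles_column="SMILES"):
    """Return a list of (smiles, data) tuples, one per CSV line."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        smiles = record.pop(smiles_column)  # the structure column
        rows.append((smiles, dict(record)))  # remaining columns are data
    return rows


rows = read_rows(SAMPLE)
for smiles, data in rows:
    print(smiles, data)
```

Each tuple pairs one SMILES string with the remaining columns of its line, which is the information the toolkit attaches to a molecule when it reads the file.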

When a CSV file is read, its fields are attached to each molecule as SD data. This data can be accessed with the OEGetSDDataIter function, which returns an iterator over all the SD data (tag - value) pairs of a molecule. The CollectDataTags function iterates over a list of molecules and returns the unique tags of the data attached to them, in the order in which the tags are first seen.

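# The snippets on this page use OEChem TK and OEDepict TK names unqualified
from openeye.oechem import *
from openeye.oedepict import *


def CollectDataTags(mollist):
    """Return the unique SD data tags of the molecules, in first-seen order."""
    tags = []
    for mol in mollist:
        for dp in OEGetSDDataIter(mol):
            if dp.GetTag() not in tags:
                tags.append(dp.GetTag())

    return tags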
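
The same first-seen ordering can be obtained without repeated list membership tests. The sketch below is plain Python with no toolkit calls: each molecule is stood in for by a list of (tag, value) pairs, which is an assumption made only so that the example runs on its own. The tags and values are placeholders.

```python
def collect_unique_tags(records):
    """Return tags in first-seen order; records is a list of (tag, value) lists."""
    seen = {}
    for pairs in records:
        for tag, _value in pairs:
            # dicts keep insertion order (Python 3.7+), so repeats are ignored
            seen.setdefault(tag, None)
    return list(seen)


# Placeholder records standing in for molecules with SD data attached.
records = [
    [("TITLE", "first"), ("MW", "value-a")],
    [("TITLE", "second"), ("LogP", "value-b")],
]
tags = collect_unique_tags(records)
print(tags)
```

For the handful of columns in a typical CSV file the list-based version above is perfectly adequate; the dictionary only matters when there are many distinct tags.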

The DepictMoleculesWithData function takes a list of molecules read from a CSV file along with the data tags returned by the CollectDataTags function. Each molecule and its corresponding data are rendered into adjacent cells of an OEReport object (the loop over mollist). The OEReport class is a layout manager that makes it convenient to generate multi-page images. After the molecules are rendered, the input filename is drawn into each page header (the loop over report.GetHeaders()) and the page number is drawn into each page footer (the loop over report.GetFooters()).

def DepictMoleculesWithData(report, mollist, iname, tags, opts):

    for mol in mollist:

        # render molecule

        cell = report.NewCell()
        OEPrepareDepiction(mol)
        disp = OE2DMolDisplay(mol, opts)
        OERenderMolecule(cell, disp)
        OEDrawBorder(cell, OEPen(OELightGrey, OELightGrey, OEFill_Off, 1.0))

        # render corresponding data

        cell = report.NewCell()
        RenderData(cell, mol, tags)

    # add input filename to headers

    headerfont = OEFont(OEFontFamily_Default, OEFontStyle_Default,
                        12, OEAlignment_Center, OEBlack)
    headerpos = OE2DPoint(report.GetHeaderWidth() / 2.0, report.GetHeaderHeight() / 2.0)

    for header in report.GetHeaders():
        header.DrawText(headerpos, iname, headerfont)

    # add page number to footers

    footerfont = OEFont(OEFontFamily_Default, OEFontStyle_Default,
                        12, OEAlignment_Center, OEBlack)
    footerpos = OE2DPoint(report.GetFooterWidth() / 2.0, report.GetFooterHeight() / 2.0)

    for pageidx, footer in enumerate(report.GetFooters()):
        footer.DrawText(footerpos, "- %d -" % (pageidx + 1), footerfont)
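
Because every molecule consumes two cells, the number of pages follows from the grid size of the report. The sketch below is plain Python, not an OEDepict call: it assumes that the report starts a new page whenever the current grid is full, and the 4 x 2 grid is a hypothetical layout that this section does not specify.

```python
import math

CELLS_PER_MOLECULE = 2  # one cell for the depiction, one for its data


def pages_needed(num_molecules, rows, cols):
    """Estimate the page count, assuming a new page starts when the grid is full."""
    cells = CELLS_PER_MOLECULE * num_molecules
    return math.ceil(cells / (rows * cols))


def footer_labels(num_pages):
    """Build the same "- N -" footer text used in the footer loop above."""
    return ["- %d -" % (pageidx + 1) for pageidx in range(num_pages)]


# Hypothetical layout: 4 rows x 2 columns, so 4 molecules fit on a page.
pages = pages_needed(10, rows=4, cols=2)
labels = footer_labels(pages)
print(pages, labels)
```

If cells are filled row by row, an even number of columns keeps each depiction and its data on the same row.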

The RenderData function shows how to render (tag - value) tuples into two adjacent OEImageGrid objects.

def RenderData(image, mol, tags):

    data = []
    for tag in tags:
        value = "N/A"
        if OEHasSDData(mol, tag):
            value = OEGetSDData(mol, tag)
        data.append((tag, value))

    nrdata = len(data)
    imagew, imageh = image.GetWidth(), image.GetHeight()

    # generating grid for tags

    tframe = OEImageFrame(image, imagew * 0.30, imageh, OE2DPoint(0.0, 0.0))
    tgrid = OEImageGrid(tframe, max(nrdata, 12), 1)
    tfont = OEFont(OEFontFamily_Default, OEFontStyle_Bold,
                   8, OEAlignment_Left, OEBlack)
    tpos = OE2DPoint(5.0, tgrid.GetCellHeight() / 2.0)

    # generating grid for values

    vframe = OEImageFrame(image, imagew * 0.70, imageh, OE2DPoint(imagew * 0.30, 0.0))
    vgrid = OEImageGrid(vframe, max(nrdata, 12), 1)
    vfont = OEFont(OEFontFamily_Default, OEFontStyle_Default,
                   8, OEAlignment_Left, OEBlack)
    vpos = OE2DPoint(5.0, vgrid.GetCellHeight() / 2.0)

    # rendering (tag - value) data

    for idx, (tag, value) in enumerate(data):
        cell = tgrid.GetCell(idx + 1, 1)
        cell.DrawText(tpos, tag + ":", tfont, cell.GetWidth())
        cell = vgrid.GetCell(idx + 1, 1)
        cell.DrawText(vpos, value, vfont, cell.GetWidth())
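
The geometry behind the two grids can be worked out by hand. The sketch below is plain Python and assumes that an image grid divides its frame evenly with no margins, which may not match the toolkit's defaults. The 300 x 180 cell size is a hypothetical value.

```python
def data_grid_geometry(image_width, image_height, num_pairs, min_rows=12):
    """Compute the tag/value column widths and the height of one text row."""
    # Never fewer than min_rows, so a few fields do not produce very tall rows.
    rows = max(num_pairs, min_rows)
    return {
        "rows": rows,
        "tag_width": image_width * 0.30,    # left frame holds the tags
        "value_width": image_width * 0.70,  # right frame holds the values
        "value_x": image_width * 0.30,      # right frame starts where the left ends
        "row_height": image_height / rows,
    }


# Hypothetical cell of 300 x 180 holding five (tag - value) pairs.
geom = data_grid_geometry(300.0, 180.0, 5)
print(geom)
```

With five pairs the grid still has twelve rows, so the text keeps the same row height as in a cell that lists twelve fields. Beyond twelve pairs the rows shrink to fit.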

Download code

csv2pdf.py and drugs.csv supporting data file

Running the command below generates the multi-page PDF file drugs.pdf.

Usage:

prompt > python3 csv2pdf.py drugs.csv drugs.pdf

Discussion

Note

Because the columns of a CSV file are read into SD data fields, OEChem TK provides a metadata interchange between SDF files and CSV files. Consequently, the same Python script can generate a PDF file from an SDF file.

Download

drugs.sdf supporting data file

Running the command below generates the same multi-page PDF file drugs.pdf (apart from the input filename in each page header).

prompt > python3 csv2pdf.py drugs.sdf drugs.pdf
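
The interchange described in the note above works because both formats carry the same (tag - value) pairs. As a rough illustration, the sketch below writes such pairs as SD data items: a header line with the tag in angle brackets, the value, then a blank line. It is a simplified stand-in written in plain Python, not the way OEChem TK writes SDF files, and the sample pairs are hypothetical.

```python
def sd_data_block(pairs):
    """Format (tag, value) pairs as SD data items."""
    lines = []
    for tag, value in pairs:
        lines.append(">  <%s>" % tag)  # data header line
        lines.append(str(value))       # data line
        lines.append("")               # blank line terminates the item
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical fields, as they might come from one CSV line.
block = sd_data_block([("TITLE", "aspirin"), ("MW", "180.16")])
print(block)
```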
