Depicting CSV or SDF in PPTX (PowerPoint)

Problem

You want to depict molecules along with their associated data, read from a CSV file, in a PowerPoint (pptx) file. See the example in drugs.pptx and in Table 1.

Table 1. Example of depiction of CSV in PPTX (the slides are reduced here for visualization convenience)
[Slide images not reproduced: slide 1, slide 2, slide 3, slide 4]


Solution

The CSV file format is a text format of comma-separated values. OEChem TK supports this format to enable data exchange with a wide variety of other software. Each line of a CSV file stores the data of one molecule, which is represented by a SMILES string.
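To make the row-to-molecule mapping concrete, here is a stdlib-only sketch (not OEChem code) that turns CSV text into one record per molecule: the SMILES string plus a dictionary of tag/value pairs. It assumes, for illustration only, that the first row is a header and the first column holds the SMILES; the column names and the helper `read_csv_records` are hypothetical and are not taken from drugs.csv.

```python
import csv
import io


def read_csv_records(text):
    """Hypothetical helper: return (smiles, {tag: value}) for each data row.

    Assumes the first row is a header and the first column holds SMILES;
    every remaining column becomes a tag/value pair, mirroring how the CSV
    fields end up as SD data attached to each molecule.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        smiles, values = row[0], row[1:]
        records.append((smiles, dict(zip(header[1:], values))))
    return records


sample = "SMILES,Name,MW\nCCO,ethanol,46.07\nc1ccccc1,benzene,78.11\n"
records = read_csv_records(sample)
for smiles, data in records:
    print(smiles, data)
```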


When a CSV file is read, its fields are attached to each molecule as SD data. This data can be accessed with the OEGetSDDataIter function, which returns an iterator over all SD data (tag, value) pairs of a molecule. The CollectDataTags function iterates over a list of molecules and returns the unique tags of the data attached to them.

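from openeye import oechem


def CollectDataTags(mollist):
    """Return the unique SD data tags of the molecules, in first-seen order."""

    tags = []
    for mol in mollist:
        # iterate over all (tag, value) SD data pairs attached to the molecule
        for dp in oechem.OEGetSDDataIter(mol):
            if dp.GetTag() not in tags:
                tags.append(dp.GetTag())

    return tags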
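Checking membership in a list makes CollectDataTags quadratic in the number of distinct tags. That is harmless for a few columns, but an order-preserving dictionary avoids it. The sketch below is a hypothetical alternative that uses plain `{tag: value}` dicts as stand-ins for the molecules' SD data, so it runs without OEChem; with OEChem you would loop over `oechem.OEGetSDDataIter(mol)` and use `dp.GetTag()` as in the original.

```python
def collect_data_tags(records):
    """Return the unique tags over all records, in first-seen order.

    Each record is a plain {tag: value} dict standing in for a molecule's
    SD data (hypothetical stand-in, not the OEChem API).
    """
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for data in records:
        for tag in data:
            seen.setdefault(tag, None)
    return list(seen)


mols = [{"Name": "mol1", "MW": "100.0"},
        {"Name": "mol2", "LogP": "1.5"}]
print(collect_data_tags(mols))  # ['Name', 'MW', 'LogP']
```

Keeping the first-seen order matters here because the tag order determines the row order of every data table in the presentation.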

The WritePPTXFile function takes a list of molecules read from a CSV file along with the data tags returned by the CollectDataTags function.

First, a new presentation is created whose first slide shows the name of the input file. Then, iterating over the molecules, each molecule is depicted on a new slide, and its data is added by calling the RenderData function. To add the images, a temporary image file has to be generated for each molecule; these files are removed after the presentation is saved.

import os

import pptx
from pptx.util import Inches


def WritePPTXFile(oname, mollist, iname, tags, opts):

    # create new PowerPoint presentation

    pres = pptx.Presentation()

    # add first slide showing the name of the input file

    title_slide_layout = pres.slide_layouts[0]
    title_slide = pres.slides.add_slide(title_slide_layout)
    title = title_slide.shapes.title
    title.text = os.path.basename(iname)

    tmpfnames = []

    # create a new slide for each molecule

    for idx, mol in enumerate(mollist):
        # layout 5 of the default template is "Title Only"
        slide_layout = pres.slide_layouts[5]
        slide = pres.slides.add_slide(slide_layout)

        if mol.GetTitle():
            title = slide.shapes.title
            title.text = mol.GetTitle()

        # depict the molecule into a temporary PNG and place it on the left
        fname = "tmp%d.png" % idx
        WriteImageToFile(fname, mol, opts)
        slide.shapes.add_picture(fname, left=Inches(1.0), top=Inches(2.0), width=Inches(2.5))
        tmpfnames.append(fname)

        # add the data table on the right
        RenderData(slide, mol, tags)

    pres.save(oname)

    # remove temporary image files

    for fname in tmpfnames:
        os.remove(fname)
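WritePPTXFile writes `tmp0.png`, `tmp1.png`, ... into the current directory and deletes them only if everything succeeds. One possible alternative, sketched below under stated assumptions, is to keep the images in a `tempfile.TemporaryDirectory`, which is removed even when an exception is raised. The function `render_with_temp_images` and its three callbacks are hypothetical stand-ins for `WriteImageToFile`, `slide.shapes.add_picture` and `pres.save(oname)`; the save is kept inside the `with` block so the images still exist when the presentation is written, as in the original.

```python
import os
import tempfile


def render_with_temp_images(items, write_image, add_picture, save):
    """Hypothetical restructuring of the temporary-file handling.

    write_image(path, item) stands in for WriteImageToFile, add_picture(path)
    for slide.shapes.add_picture and save() for pres.save(oname). The
    temporary directory and every image in it are deleted when the with
    block exits, whether or not an exception was raised.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        for idx, item in enumerate(items):
            fname = os.path.join(tmpdir, "tmp%d.png" % idx)
            write_image(fname, item)
            add_picture(fname)
        save()
    return tmpdir  # returned only so callers can check that it is gone


def fake_write(path, item):
    # placeholder for a real depiction; just writes a few bytes
    with open(path, "wb") as f:
        f.write(b"placeholder")


added = []
tmp = render_with_temp_images(["mol1", "mol2"], fake_write,
                              lambda p: added.append(os.path.basename(p)),
                              lambda: None)
print(added)                # ['tmp0.png', 'tmp1.png']
print(os.path.exists(tmp))  # False
```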

The WriteImageToFile function generates a molecule depiction and writes it into an image file.

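from openeye import oedepict


def WriteImageToFile(fname, mol, opts):

    # create an image with the size given in the display options
    image = oedepict.OEImage(opts.GetWidth(), opts.GetHeight())
    # prepare the molecule for depiction (e.g. generate 2D coordinates)
    oedepict.OEPrepareDepiction(mol)
    disp = oedepict.OE2DMolDisplay(mol, opts)
    oedepict.OERenderMolecule(image, disp, False)
    # draw a light grey border with rounded corners
    oedepict.OEDrawCurvedBorder(image, oedepict.OELightGreyPen, 10.0)
    # the image format is deduced from the file extension
    oedepict.OEWriteImage(fname, image)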

The RenderData function creates a new table and adds each (tag, value) pair to a separate row.

from openeye import oechem
from pptx.util import Inches


def RenderData(slide, mol, tags):

    # collect one (tag, value) pair per tag; missing data is shown as "N/A"
    data = []
    for tag in tags:
        value = "N/A"
        if oechem.OEHasSDData(mol, tag):
            value = oechem.OEGetSDData(mol, tag)
        data.append((tag, value))

    # two-column table: tag names on the left, values on the right
    rows, cols = len(data), 2
    table = slide.shapes.add_table(rows, cols, left=Inches(4.0), top=Inches(2.0),
                                   width=Inches(5.5), height=Inches(0.8)).table

    table.columns[0].width = Inches(2.0)
    table.columns[1].width = Inches(3.5)
    # do not style the first row as a header row
    table.first_row = False

    for row, (tag, value) in enumerate(data):
        table.cell(row, 0).text = tag + ':'
        table.cell(row, 1).text = value
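Because RenderData loops over all collected tags rather than over the molecule's own data, every slide gets a table with the same rows in the same order, and gaps are filled with "N/A". The row-building step can be isolated from python-pptx and tested on its own; the sketch below is a hypothetical helper `build_rows` that uses a plain `{tag: value}` dict in place of `OEHasSDData`/`OEGetSDData`.

```python
def build_rows(data, tags, missing="N/A"):
    """Return (label, value) table rows, one per tag, in tag order.

    `data` is a plain {tag: value} dict standing in for a molecule's SD
    data; tags absent from this molecule get the `missing` placeholder,
    as RenderData does with "N/A".
    """
    return [(tag + ":", data.get(tag, missing)) for tag in tags]


tags = ["Name", "MW", "LogP"]
rows = build_rows({"Name": "mol1", "MW": "100.0"}, tags)
print(rows)  # [('Name:', 'mol1'), ('MW:', '100.0'), ('LogP:', 'N/A')]
```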

Download code

csv2pptx.py and drugs.csv supporting data

Usage

Running the following command generates the drugs.pptx file:

prompt > python3 csv2pptx.py drugs.csv drugs.pptx
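The command-line handling of csv2pptx.py is not shown in this recipe. As a rough, hypothetical sketch only (the real script may parse its arguments differently), the two positional arguments could be read with `argparse`, rejecting an output name without a `.pptx` extension:

```python
import argparse
import os


def parse_args(argv):
    """Hypothetical argument parsing mirroring:
    python3 csv2pptx.py drugs.csv drugs.pptx
    """
    parser = argparse.ArgumentParser(
        description="Depict molecules and their data in a PPTX file")
    parser.add_argument("infile", help="input molecule file, e.g. drugs.csv or drugs.sdf")
    parser.add_argument("outfile", help="output PowerPoint file, e.g. drugs.pptx")
    args = parser.parse_args(argv)
    if os.path.splitext(args.outfile)[1].lower() != ".pptx":
        parser.error("output file must have a .pptx extension")  # exits
    return args


args = parse_args(["drugs.csv", "drugs.pptx"])
print(args.infile, args.outfile)  # drugs.csv drugs.pptx
```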

Discussion

Because OEChem TK reads the columns of a CSV file into SD data fields, it provides a metadata interchange between SDF and CSV files. Consequently, the same Python script can generate a pptx file from an SDF file.
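To illustrate why the script does not care about the input format: an SDF record stores its data items as `> <TAG>` header lines followed by value lines and a blank line, which map onto the same tag/value pairs as a CSV row. The parser below is a simplified, hypothetical stdlib sketch of that data section (OEChem handles this internally); it ignores the connection table and assumes each value ends at a blank line.

```python
def parse_sd_data(block):
    """Extract {tag: value} pairs from the data section of one SDF record.

    Simplified, hypothetical parser: a data header is a line starting with
    '>' containing '<TAG>'; the value is the following non-blank lines,
    joined with newlines, up to the next blank line.
    """
    data = {}
    lines = block.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        start = line.find("<")
        end = line.find(">", start + 1)
        if line.startswith(">") and start != -1 and end != -1:
            tag = line[start + 1:end]
            i += 1
            value_lines = []
            while i < len(lines) and lines[i].strip():
                value_lines.append(lines[i])
                i += 1
            data[tag] = "\n".join(value_lines)
        i += 1
    return data


sdf_tail = "> <Name>\nmol1\n\n> <MW>\n100.0\n\n$$$$\n"
print(parse_sd_data(sdf_tail))  # {'Name': 'mol1', 'MW': '100.0'}
```

The resulting dictionary has the same shape as the one obtained from a CSV row, which is why CollectDataTags and RenderData work unchanged for both inputs.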

Usage

After downloading the drugs.sdf supporting data file, the following command generates the same drugs.pptx file (apart from the input filename on the first slide):

prompt > python3 csv2pptx.py drugs.sdf drugs.pptx
