You want to draw a ROC curve to visualize the performance of a binary classification method (see Figure 1).
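Binary classification is the task of sorting the members of a given set of objects into two groups according to whether or not they have some property. A binary classifier can produce four possible outcomes (see Figure 2): a true positive (TP), a positive entity correctly classified as positive; a false positive (FP), a negative entity incorrectly classified as positive; a true negative (TN), a negative entity correctly classified as negative; and a false negative (FN), a positive entity incorrectly classified as negative.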
In molecular modeling, the positive entities are commonly called actives, while the negative ones are called decoys.
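From the counts of these outcomes, two rates can be calculated: the true positive rate, TPR = TP / (TP + FN), which is the fraction of positives that are correctly identified, and the false positive rate, FPR = FP / (FP + TN), which is the fraction of negatives that are incorrectly identified as positive. Here TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives.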
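The true and false positive rates that a ROC curve is built from can be sketched in a few lines. The helper `rates_from_counts` below is hypothetical (it is not part of the script described here), and the example counts are invented purely for illustration.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Return (TPR, FPR) from the four outcome counts of a binary classifier."""
    tpr = tp / float(tp + fn)  # fraction of positives (actives) recovered
    fpr = fp / float(fp + tn)  # fraction of negatives (decoys) misclassified
    return tpr, fpr


# Hypothetical counts: 5 of 7 actives found, 2 of 11 decoys misclassified.
tpr, fpr = rates_from_counts(tp=5, fp=2, tn=9, fn=2)
print("TPR = %.3f, FPR = %.3f" % (tpr, fpr))
```

Both rates fall between 0.0 and 1.0, which is why the axes of a ROC plot are limited to that range.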
The receiver operating characteristic (ROC) curve is a two-dimensional graph in which the false positive rate is plotted on the X axis and the true positive rate is plotted on the Y axis. ROC curves are useful for visualizing and comparing the performance of classification methods (see Figure 1).
Figure 3 illustrates the ROC curve of an example test set of 18 entities (7 actives, 11 decoys), which are listed in Table 1 in ascending order of their scores. For a small test set, the ROC curve is actually a step function: each active entity in Table 1 moves the line upward, while each decoy moves it to the right.
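The stepping behavior can be traced by hand on a tiny ranked list. The sketch below uses a hypothetical five-entity ranking (not the data of Table 1) and records one (FPR, TPR) point per entity; all variable names are illustrative.

```python
# Hypothetical ranked list, best score first: True marks an active, False a decoy.
ranked_is_active = [True, False, True, True, False]

nr_actives = sum(ranked_is_active)
nr_decoys = len(ranked_is_active) - nr_actives

points = [(0.0, 0.0)]  # (FPR, TPR) pairs, starting at the origin
found_actives = 0
found_decoys = 0
for is_active in ranked_is_active:
    if is_active:
        found_actives += 1  # an active is a step up
    else:
        found_decoys += 1  # a decoy is a step to the right
    points.append((found_decoys / float(nr_decoys),
                   found_actives / float(nr_actives)))

for fpr, tpr in points:
    print("FPR = %.2f  TPR = %.2f" % (fpr, tpr))
```

Each entity changes exactly one of the two coordinates, so consecutive points are joined by purely vertical or purely horizontal segments, and the curve always ends at (1.0, 1.0).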
In this simple example, the scores are in the range [0.0, 1.0], and a lower score is better. For a different score range, the functions have to be modified accordingly.
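One way to adapt to a different scoring convention is to change only the sort order before the rates are computed. The sketch below assumes the scores are held as (id, score) pairs, as in the rate function of this recipe; the helper name `SortScores` and its `higher_is_better` flag are hypothetical and not part of the original script.

```python
from operator import itemgetter


def SortScores(scores, higher_is_better=False):
    """Return (id, score) pairs ordered from best to worst."""
    # Ascending order suits distance-like scores (lower is better);
    # descending order suits similarity-like scores (higher is better).
    return sorted(scores, key=itemgetter(1), reverse=higher_is_better)


# Hypothetical similarity scores, where a higher value means a better match.
scores = [("mol-a", 0.35), ("mol-b", 0.90), ("mol-c", 0.62)]
print(SortScores(scores, higher_is_better=True))
```

Keeping the ordering decision in one place means the rate calculation itself can stay unchanged, since it only depends on the order in which the entities are visited.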
    def GetRates(actives, scores):
        """Return the (tpr, fpr) point lists of the ROC curve.

        scores is a list of (id, score) pairs, expected in ranked
        order (best score first).
        """
        tpr = [0.0]  # true positive rate
        fpr = [0.0]  # false positive rate
        nractives = len(actives)
        nrdecoys = len(scores) - len(actives)

        foundactives = 0.0
        founddecoys = 0.0
        for ident, _score in scores:
            if ident in actives:
                foundactives += 1.0  # an active moves the curve up
            else:
                founddecoys += 1.0  # a decoy moves the curve to the right

            tpr.append(foundactives / float(nractives))
            fpr.append(founddecoys / float(nrdecoys))

        return tpr, fpr
    import matplotlib.pyplot as plt  # plotting module used by the functions below


    def DepictROCCurve(actives, scores, label, color, fname, randomline=True):

        plt.figure(figsize=(4, 4), dpi=80)

        SetupROCCurvePlot(plt)
        AddROCCurve(plt, actives, scores, color, label)
        SaveROCCurvePlot(plt, fname, randomline)
    def SetupROCCurvePlot(plt):

        plt.xlabel("FPR", fontsize=14)
        plt.ylabel("TPR", fontsize=14)
        plt.title("ROC Curve", fontsize=14)
    def AddROCCurve(plt, actives, scores, color, label):

        tpr, fpr = GetRates(actives, scores)

        plt.plot(fpr, tpr, color=color, linewidth=2, label=label)
    def SaveROCCurvePlot(plt, fname, randomline=True):

        if randomline:
            x = [0.0, 1.0]
            plt.plot(x, x, linestyle='dashed', color='red', linewidth=2, label='random')

        plt.xlim(0.0, 1.0)
        plt.ylim(0.0, 1.0)
        plt.legend(fontsize=10, loc='best')
        plt.tight_layout()
        plt.savefig(fname)
prompt > python3 roc2img.py actives.txt scores.txt roc.png
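The formats of the two input files are not described here. As a rough sketch only, the code below assumes that the actives file holds one identifier per line and that the scores file holds whitespace-separated "identifier score" pairs; these formats and the helper names `ParseActives` and `ParseScores` are assumptions, not the script's documented behavior.

```python
def ParseActives(lines):
    """Collect one identifier per non-empty line (assumed format)."""
    return [line.strip() for line in lines if line.strip()]


def ParseScores(lines):
    """Parse 'identifier score' pairs (assumed format), sorted by ascending score."""
    scores = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        scores.append((fields[0], float(fields[1])))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical file contents, given inline so the sketch runs without files.
actives = ParseActives(["mol-b\n", "mol-c\n"])
scores = ParseScores(["mol-a 0.80\n", "mol-b 0.10\n", "mol-c 0.45\n"])
print(actives)
print(scores)
```

With real files, the same helpers could be applied to open file handles, and the resulting lists passed on to the depiction function together with a label, a color, and the output file name.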
Depicting ROC curves is a good way to visualize and compare the performance of various fingerprint types. The molecule depicted on the left in Table 2 is a random molecule from the TXA2 set (49 structures) of the Briem-Lessel dataset. The graph on the right was generated by performing 2D molecule similarity searches using four of the fingerprint types of GraphSim TK (path, circular, tree, and MACCS key). The decoy set consists of the four other activity classes in the dataset (5HT3, ACE, PAF, and HMG-CoA), along with an inactive set of randomly selected compounds from the MDDR that are not known to belong to any of the five activity classes.