Fingerprint Database

The following four examples perform the same task, detailed below:

  1. reading a query structure

  2. printing out the similarity score between the fingerprint of this query and the fingerprint generated for each molecule read from a database file.

In Listing 9, after reading the query structure and generating its path fingerprint, the program loops over the database file, creating a path fingerprint for each structure. It then calculates the Tanimoto similarity between the query fingerprint and the fingerprint of each database entry by calling the OETanimoto function.

Listing 9: Similarity calculation from file

import openeye.oechem.*;
import openeye.oegraphsim.*;

public class SimCalcFromFile {

    public static void main(String[] argv) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("SimCalcFromFile <queryfile> <targetfile>");

        // read the query molecule from the first file
        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0] + " for reading");

        OEGraphMol qmol = new OEGraphMol();
        if (!oechem.OEReadMolecule(ifs, qmol))
            oechem.OEThrow.Fatal("Unable to read query molecule");

        // generate the path fingerprint of the query
        OEFingerPrint qfp = new OEFingerPrint();
        oegraphsim.OEMakeFP(qfp, qmol, OEFPType.Path);

        // reuse the same stream to read the database file
        if (!ifs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to open " + argv[1] + " for reading");

        // fingerprint each database molecule and print its Tanimoto score against the query
        OEFingerPrint tfp = new OEFingerPrint();
        OEGraphMol tmol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, tmol)) {
            oegraphsim.OEMakeFP(tfp, tmol, OEFPType.Path);
            System.out.format("%.3f\n", oegraphsim.OETanimoto(qfp, tfp));
        }
        ifs.close();
    }
}
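Listing 9 delegates the scoring to OETanimoto. For readers unfamiliar with the measure, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The minimal sketch below computes it over `java.util.BitSet` values; it is a plain-Java illustration with a hypothetical class name (`TanimotoSketch`), not how OETanimoto is implemented, and returning 0.0 for two empty fingerprints is an assumption of the sketch.

```java
import java.util.BitSet;

/** Hypothetical plain-Java illustration of the Tanimoto coefficient (not the OETanimoto implementation). */
class TanimotoSketch {

    /** Returns |A AND B| / |A OR B|, or 0.0 (assumption) when neither fingerprint has any bit set. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);                       // bits set in both A and B
        BitSet either = (BitSet) a.clone();
        either.or(b);                      // bits set in A or B
        int union = either.cardinality();
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet query = new BitSet();
        query.set(1); query.set(3); query.set(5); query.set(7);
        BitSet target = new BitSet();
        target.set(3); target.set(5); target.set(7); target.set(9);
        // 3 shared bits out of 5 distinct bits set overall
        System.out.format("%.3f%n", tanimoto(query, target));
    }
}
```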

In Listing 10 only the code block that is different from Listing 9 is shown.

In this example, it is assumed that the fingerprints are pre-calculated and stored in an OEB binary file as generic data attached to the corresponding molecules. The program loops over the file and accesses the pre-generated fingerprints or calculates them if they are not available.

The obvious advantage of this approach is that the fingerprints only have to be generated once, when the binary file is created. This can be significantly faster than generating the fingerprints on the fly every time the program is executed.

See also

The Storage and Retrieval section shows an example of how to generate an OEB binary file that stores molecules along with their corresponding fingerprints.

Listing 10: Similarity calculation from OEB file

        OEFingerPrint tfp = new OEFingerPrint();
        OEGraphMol tmol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, tmol)) {
            if (tmol.HasData("PATH_FP")) {
                tfp = oegraphsim.OEGetFP(tmol, "PATH_FP");
            } else {
                oechem.OEThrow.Warning("Unable to access fingerprint for " + tmol.GetTitle());
                oegraphsim.OEMakeFP(tfp, tmol, OEFPType.Path);
            }
            System.out.format("%.3f\n", oegraphsim.OETanimoto(qfp, tfp));
        }
        ifs.close();
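The look-up-or-compute logic of Listing 10 can be modelled without the toolkit. In the plain-Java sketch below, which rests entirely on hypothetical stand-ins, a `Record` plays the part of a molecule read from the OEB file (title, SMILES string and an optional stored fingerprint), and `toyFingerprint` is a made-up hash of adjacent character pairs standing in for a real path fingerprint; it is not OEMakeFP. The generation step runs only for records that carry no stored fingerprint.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: reuse a stored fingerprint, otherwise generate one on the fly. */
class StoredOrComputed {

    /** Hypothetical stand-in for a molecule read from an OEB file. */
    static final class Record {
        final String title;
        final String smiles;
        final BitSet storedFP; // null when no fingerprint was stored with the molecule

        Record(String title, String smiles, BitSet storedFP) {
            this.title = title;
            this.smiles = smiles;
            this.storedFP = storedFP;
        }
    }

    /** Toy fingerprint (NOT a path fingerprint): hashes each pair of adjacent characters into 64 bits. */
    static BitSet toyFingerprint(String smiles) {
        BitSet fp = new BitSet(64);
        for (int i = 0; i + 1 < smiles.length(); i++) {
            fp.set(Math.floorMod(smiles.substring(i, i + 2).hashCode(), 64));
        }
        return fp;
    }

    /** Returns the stored fingerprint when present, otherwise warns and generates one. */
    static BitSet fingerprintOf(Record rec) {
        if (rec.storedFP != null) {
            return rec.storedFP;
        }
        System.err.println("Unable to access fingerprint for " + rec.title);
        return toyFingerprint(rec.smiles);
    }

    public static void main(String[] args) {
        List<Record> records = List.of(
            new Record("ethanol", "CCO", toyFingerprint("CCO")), // stored when the file was written
            new Record("propanol", "CCCO", null));               // nothing stored: generated now
        for (Record r : records) {
            System.out.println(r.title + ": " + fingerprintOf(r).cardinality() + " bits set");
        }
    }
}
```

The sketch only shows the control flow; how much time the stored fingerprints save depends on how expensive the real generation step is compared with reading the stored data.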

Listing 11 differs from Listing 9 in that it uses an OEFPDatabase object to store the generated fingerprints. The OEFPDatabase class is designed to perform in-memory fingerprint searches.

Listing 11: Similarity calculation with fingerprint database from file

        OEFPDatabase fpdb = new OEFPDatabase(qfp.GetFPTypeBase());
        OEGraphMol tmol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, tmol))
            fpdb.AddFP(tmol);
        ifs.close();

        for (OESimScore score : fpdb.GetScores(qfp))
            System.out.format("%.3f\n", score.GetScore());

After the fingerprint database is built, the scores can be accessed with the OEFPDatabase.GetScores method, which returns an iterator over the calculated similarity scores.
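To make the add-then-score workflow of Listing 11 concrete, here is a stripped-down in-memory analogue in plain Java. `TinyFPDatabase`, `addFP` and `getScores` are hypothetical names that only mirror the shape of the calls above; the sketch always scores with Tanimoto, yields scores lazily in insertion order, and says nothing about how OEFPDatabase is implemented internally.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal in-memory fingerprint store; not the OEFPDatabase API. */
class TinyFPDatabase {

    private final List<BitSet> fps = new ArrayList<>();

    /** Stores a copy of the fingerprint and returns its index. */
    int addFP(BitSet fp) {
        fps.add((BitSet) fp.clone());
        return fps.size() - 1;
    }

    /** Iterator yielding one Tanimoto score per stored fingerprint, computed lazily in insertion order. */
    Iterator<Double> getScores(BitSet query) {
        Iterator<BitSet> it = fps.iterator();
        return new Iterator<>() {
            public boolean hasNext() { return it.hasNext(); }
            public Double next() { return tanimoto(query, it.next()); }
        };
    }

    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        return either.isEmpty() ? 0.0 : (double) both.cardinality() / either.cardinality();
    }

    public static void main(String[] args) {
        TinyFPDatabase db = new TinyFPDatabase();
        db.addFP(BitSet.valueOf(new long[] {0b1111L}));  // bits 0-3
        db.addFP(BitSet.valueOf(new long[] {0b0011L}));  // bits 0-1
        BitSet query = BitSet.valueOf(new long[] {0b0111L});
        for (Iterator<Double> s = db.getScores(query); s.hasNext(); ) {
            System.out.format("%.3f%n", s.next());
        }
    }
}
```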

Note

The OEFPDatabase only stores fingerprints and not the molecules from which they are generated. A correspondence between a molecule and its fingerprint stored in the database can be established by using the index returned by the OEFPDatabase.AddFP method.
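The bookkeeping described in the note can be sketched in plain Java by keeping a second list whose positions line up with the indices returned when fingerprints are added. Everything here is hypothetical (`IndexedHits`, `Hit`, and molecule titles standing in for the molecules themselves); each hit carries the index of its fingerprint, so the molecule can be recovered even after the hits are sorted by score.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: map fingerprint indices back to molecule titles. */
class IndexedHits {

    /** A score paired with the index of the fingerprint it was computed for. */
    static final class Hit {
        final int idx;
        final double score;
        Hit(int idx, double score) { this.idx = idx; this.score = score; }
    }

    private final List<BitSet> fps = new ArrayList<>();
    private final List<String> titles = new ArrayList<>(); // titles.get(i) belongs to fps.get(i)

    /** Adds a fingerprint, records its molecule title, and returns the shared index. */
    int add(String title, BitSet fp) {
        fps.add((BitSet) fp.clone());
        titles.add(title);
        return fps.size() - 1;
    }

    String titleOf(int idx) { return titles.get(idx); }

    /** Tanimoto hits sorted by decreasing score; each hit keeps its fingerprint index. */
    List<Hit> sortedHits(BitSet query) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < fps.size(); i++) {
            BitSet both = (BitSet) query.clone();
            both.and(fps.get(i));
            BitSet either = (BitSet) query.clone();
            either.or(fps.get(i));
            double s = either.isEmpty() ? 0.0 : (double) both.cardinality() / either.cardinality();
            hits.add(new Hit(i, s));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        IndexedHits db = new IndexedHits();
        db.add("mol_A", BitSet.valueOf(new long[] {0b0011L}));
        db.add("mol_B", BitSet.valueOf(new long[] {0b1111L}));
        for (Hit h : db.sortedHits(BitSet.valueOf(new long[] {0b0111L}))) {
            System.out.format("%s %.3f%n", db.titleOf(h.idx), h.score);
        }
    }
}
```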

See also

Listing 13 shows how to keep track of the correspondence between a fingerprint added to an OEFPDatabase object and the molecule from which it is calculated.

In the last example (Listing 12), OEFPDatabase is again used to store the fingerprints. If a fingerprint can be read from the OEB input file, it is added to the database directly; otherwise, the fingerprint is generated on the fly by passing the OEMolBase molecule itself to the OEFPDatabase.AddFP method.

Listing 12: Similarity calculation with fingerprint database from OEB

        OEFPDatabase fpdb = new OEFPDatabase(qfp.GetFPTypeBase());
        OEGraphMol tmol = new OEGraphMol();
        OEFingerPrint tfp = new OEFingerPrint();
        while (oechem.OEReadMolecule(ifs, tmol)) {
            if (tmol.HasData("PATH_FP")) {
                tfp = oegraphsim.OEGetFP(tmol, "PATH_FP");
                fpdb.AddFP(tfp);
            } else {
                oechem.OEThrow.Warning("Unable to access fingerprint for " + tmol.GetTitle());
                fpdb.AddFP(tmol);
            }
        }
        ifs.close();
        for (OESimScore score : fpdb.GetScores(qfp))
            System.out.format("%.3f\n", score.GetScore());

Searching with User-defined Similarity Measures

By default, the Tanimoto similarity is used when calling either the OEFPDatabase.GetScores method or the OEFPDatabase.GetSortedScores method. The user can set other types of similarity measures to be applied by calling the OEFPDatabase.SetSimFunc method with a value from the OESimMeasure namespace. Each of the constants from this namespace corresponds to one of the built-in similarity calculation methods.
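Choosing among built-in measures can be pictured as picking one formula from a fixed set, all evaluated on the same bit counts. In the plain-Java sketch below, the enum `Measure` is a hypothetical stand-in for the OESimMeasure constants and covers only three textbook measures (Tanimoto, Dice, Cosine), written in terms of a = bits set in A, b = bits set in B and c = bits set in both; which measures the toolkit actually provides, and their exact definitions, should be taken from the OESimMeasure documentation. Returning 0.0 for a zero denominator is an assumption of the sketch.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a set of built-in similarity measures (not OESimMeasure). */
enum Measure {
    TANIMOTO { double fromCounts(int a, int b, int c) { return a + b - c == 0 ? 0.0 : (double) c / (a + b - c); } },
    DICE     { double fromCounts(int a, int b, int c) { return a + b == 0 ? 0.0 : 2.0 * c / (a + b); } },
    COSINE   { double fromCounts(int a, int b, int c) { return a == 0 || b == 0 ? 0.0 : c / Math.sqrt((double) a * b); } };

    /** a = bits set in A, b = bits set in B, c = bits set in both. */
    abstract double fromCounts(int a, int b, int c);

    /** Counts the bits of two fingerprints and applies this measure's formula. */
    double score(BitSet fpA, BitSet fpB) {
        BitSet both = (BitSet) fpA.clone();
        both.and(fpB);
        return fromCounts(fpA.cardinality(), fpB.cardinality(), both.cardinality());
    }
}

/** Usage: score one fingerprint pair with every measure. */
class MeasureDemo {
    public static void main(String[] args) {
        BitSet fpA = new BitSet();
        fpA.set(0, 4);    // bits 0..3, a = 4
        BitSet fpB = new BitSet();
        fpB.set(2, 11);   // bits 2..10, b = 9, shared bits 2 and 3, c = 2
        for (Measure m : Measure.values()) {
            System.out.format("%-8s %.3f%n", m, m.score(fpA, fpB));
        }
    }
}
```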

There is also a facility to use user-defined similarity measures when searching a fingerprint database. The following example shows how a similarity calculation can be implemented by deriving from the OESimFuncBase class.

Formula: \(Sim_{Simpson}(A,B) = \frac{bothAB}{\min(onlyA + bothAB,\ onlyB + bothAB)}\)

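    static class SimpsonSimFunc extends OESimFuncBase {

        // Simpson similarity: bits set in both fingerprints divided by the
        // number of bits set in the smaller of the two fingerprints
        @Override
        public float constCall(OEFingerPrint fpA, OEFingerPrint fpB) {

            OEUIntArray onlyA  = new OEUIntArray(1);
            OEUIntArray onlyB  = new OEUIntArray(1);
            OEUIntArray bothAB = new OEUIntArray(1);
            oechem.OEGetBitCounts(fpA, fpB, onlyA, onlyB, bothAB);

            // number of bits set in the smaller fingerprint
            long smaller = Math.min(onlyA.getItem(0) + bothAB.getItem(0),
                                    onlyB.getItem(0) + bothAB.getItem(0));
            if (smaller == 0)
                return 0.0f; // avoid 0/0 (NaN) when either fingerprint has no bits set
            return (float)bothAB.getItem(0) / (float)smaller;
        }
        @Override
        public OESimFuncBase CreateCopy() {
            // hand ownership of the copy over to the native side
            OESimFuncBase copy = new SimpsonSimFunc();
            copy.swigReleaseOwnership();
            return copy;
        }
        @Override
        public String GetSimTypeString() {
            // name reported for this similarity measure
            return "Simpson";
        }
    }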
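The arithmetic in constCall can be checked by hand. Take a fingerprint A with 10 bits set and B with 20 bits set, 8 of them shared, so onlyA = 2, onlyB = 12 and bothAB = 8. The Simpson score is 8 / min(10, 20) = 0.8, whereas the Tanimoto score of the same pair is 8 / (2 + 12 + 8) ≈ 0.364, because Simpson divides by the size of the smaller fingerprint only. The plain-Java helper below (hypothetical class `SimpsonCheck`, no toolkit calls) evaluates both from the three counts; returning 0.0 when the denominator is zero is an assumption.

```java
/** Hypothetical helper: Simpson and Tanimoto scores from onlyA/onlyB/bothAB bit counts. */
class SimpsonCheck {

    static double simpson(long onlyA, long onlyB, long bothAB) {
        long smaller = Math.min(onlyA + bothAB, onlyB + bothAB); // bits set in the smaller fingerprint
        return smaller == 0 ? 0.0 : (double) bothAB / smaller;
    }

    static double tanimoto(long onlyA, long onlyB, long bothAB) {
        long union = onlyA + onlyB + bothAB;                      // bits set in either fingerprint
        return union == 0 ? 0.0 : (double) bothAB / union;
    }

    public static void main(String[] args) {
        System.out.format("Simpson  %.3f%n", simpson(2, 12, 8));  // 8 / min(10, 20)
        System.out.format("Tanimoto %.3f%n", tanimoto(2, 12, 8)); // 8 / 22
    }
}
```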

Once the similarity function is implemented, it can be attached to an OEFPDatabase object with the OEFPDatabase.SetSimFunc method; from then on, the database uses this similarity measure when calculating scores.

OEFPDatabase fpdb = new OEFPDatabase(OEFPType.Path);
fpdb.SetSimFunc(new SimpsonSimFunc());
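The same plug-in idea can be expressed in plain Java with a functional interface, which may help show what SetSimFunc changes: the database keeps its stored fingerprints and only swaps the function used to score them. `SimFunc`, `PluggableFPDatabase` and its `setSimFunc` method are hypothetical names for this sketch; in the toolkit, that role is played by an OESimFuncBase subclass such as SimpsonSimFunc above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical user-pluggable similarity function over two fingerprints. */
@FunctionalInterface
interface SimFunc {
    double score(BitSet fpA, BitSet fpB);
}

/** Hypothetical fingerprint store whose scoring function can be replaced at run time. */
class PluggableFPDatabase {

    /** Default measure: Tanimoto, |A AND B| / |A OR B|. */
    static final SimFunc TANIMOTO = (a, b) -> {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        return either.isEmpty() ? 0.0 : (double) both.cardinality() / either.cardinality();
    };

    /** User-defined measure: Simpson, |A AND B| / min(|A|, |B|). */
    static final SimFunc SIMPSON = (a, b) -> {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        int smaller = Math.min(a.cardinality(), b.cardinality());
        return smaller == 0 ? 0.0 : (double) both.cardinality() / smaller;
    };

    private final List<BitSet> fps = new ArrayList<>();
    private SimFunc simFunc = TANIMOTO;

    void addFP(BitSet fp) { fps.add((BitSet) fp.clone()); }

    /** From now on, every score is computed with the given function. */
    void setSimFunc(SimFunc f) { simFunc = f; }

    /** One score per stored fingerprint, in insertion order. */
    List<Double> getScores(BitSet query) {
        List<Double> scores = new ArrayList<>();
        for (BitSet fp : fps) {
            scores.add(simFunc.score(query, fp));
        }
        return scores;
    }

    public static void main(String[] args) {
        PluggableFPDatabase db = new PluggableFPDatabase();
        BitSet big = new BitSet();
        big.set(0, 20);      // 20 bits set
        db.addFP(big);
        BitSet query = new BitSet();
        query.set(0, 8);     // 8 bits, all contained in 'big'
        query.set(30, 32);   // plus 2 bits not in 'big'
        System.out.println("Tanimoto: " + db.getScores(query));
        db.setSimFunc(SIMPSON);
        System.out.println("Simpson:  " + db.getScores(query));
    }
}
```

Keeping the scoring function separate from the stored fingerprints means switching measures needs no re-adding of fingerprints, which mirrors how the two lines above only replace the similarity function of an existing OEFPDatabase.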

See also