Fingerprint Types

A fingerprint is a bit-vector. To reflect this, the OEFingerPrint class derives from the OEBitVector class. The difference is that an OEFingerPrint also has a type that records how the fingerprint was generated. Fingerprints can only be compared if they were generated in the same way, which leads to the following restriction:


When two fingerprints are compared in a similarity calculation, their types must be identical.
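The restriction can be pictured with a small pure-Python model. This is a hypothetical sketch, not the GraphSim TK API: `ToyFingerPrint`, its `fptype` tag and the `tanimoto` helper are illustrative names, and the bit-vector is modelled as a set of on-bit indices. The similarity function simply refuses to run when the two type tags differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyFingerPrint:
    """Hypothetical model: a bit-vector (set of on-bits) plus a type tag."""
    fptype: str                                        # how the bits were generated
    on_bits: frozenset = field(default_factory=frozenset)


def tanimoto(a: ToyFingerPrint, b: ToyFingerPrint) -> float:
    """Tanimoto similarity that rejects fingerprints of different types."""
    if a.fptype != b.fptype:
        raise ValueError(f"incompatible fingerprint types: {a.fptype!r} vs {b.fptype!r}")
    union = len(a.on_bits | b.on_bits)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a.on_bits & b.on_bits) / union


path1 = ToyFingerPrint("path", frozenset({1, 5, 9}))
path2 = ToyFingerPrint("path", frozenset({1, 5, 12}))
lingo = ToyFingerPrint("lingo", frozenset({1, 5, 9}))

print(tanimoto(path1, path2))  # 2 shared bits / 4 bits in union = 0.5
try:
    tanimoto(path1, lingo)     # same bits, but a different type
except ValueError as err:
    print(err)
```

Note that `path1` and `lingo` have identical bits, yet the comparison is rejected: the type tag, not the bit pattern, decides whether a similarity value is meaningful.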

Listing 1 shows how to create different fingerprint objects (OEFingerPrint) and identify or compare their types.

Listing 1: Fingerprint type

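from openeye import oechem, oegraphsim

# Two empty fingerprints: no type has been assigned yet
fpA = oegraphsim.OEFingerPrint()
fpB = oegraphsim.OEFingerPrint()
if not fpA.IsValid():
    print("uninitialized fingerprint")

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccccc1")

# Generate fingerprints of the same molecule with two different types
oegraphsim.OEMakeFP(fpA, mol, oegraphsim.OEFPType_Path)
oegraphsim.OEMakeFP(fpB, mol, oegraphsim.OEFPType_Lingo)

# Identify the type of a single fingerprint
if oegraphsim.OEIsFPType(fpA, oegraphsim.OEFPType_Lingo):
    print("fpA is a Lingo fingerprint")
if oegraphsim.OEIsFPType(fpA, oegraphsim.OEFPType_Path):
    print("fpA is a Path fingerprint")

# Compare the types of two fingerprints
if oegraphsim.OEIsSameFPType(fpA, fpB):
    print("same fingerprint types")
else:
    print("different fingerprint types")

The output of Listing 1 is the following:

uninitialized fingerprint
fpA is a Path fingerprint
different fingerprint types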

Two fingerprints are considered to be equivalent only if they have the same fingerprint type (OEFPTypeBase) and have identical bit-vectors (OEBitVector). The following code snippet shows how to compare two OEFingerPrint objects.

# Equal only if both the fingerprint type and the bit-vector match
if fpA == fpB:
    print("same fingerprints")
else:
    print("different fingerprints")
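This notion of equivalence combines two checks. In a hypothetical pure-Python model (the `ToyFingerPrint` name and the integer bit storage are assumptions, not how OEFingerPrint is implemented), the dataclass-generated equality compares every field, so two instances are equal only when the type tag and the bits both match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFingerPrint:
    """Hypothetical typed fingerprint; not the OEFingerPrint class."""
    fptype: str
    bits: int  # bit i set means feature i is present

    # The generated __eq__ compares (fptype, bits) as a tuple, so equality
    # requires BOTH the type and the bit pattern to be identical.


a = ToyFingerPrint("path", 0b1011)
b = ToyFingerPrint("path", 0b1011)
c = ToyFingerPrint("lingo", 0b1011)  # same bits, different type

print(a == b)  # True
print(a == c)  # False: identical bits are not enough
```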

The following code snippet shows how to initialize an OEFingerPrint object using the type of another fingerprint. The type of a fingerprint is accessed with the OEFingerPrint.GetFPTypeBase method.

fpA = oegraphsim.OEFingerPrint()
oegraphsim.OEMakePathFP(fpA, mol)

fpB = oegraphsim.OEFingerPrint()
oegraphsim.OEMakeFP(fpB, mol, fpA.GetFPTypeBase())

Fingerprint parameters

The User-defined Fingerprint chapter gives examples of how user-defined fingerprints can be generated by specifying, for example, the atom and bond properties that are encoded into the fingerprints.

To ensure that only equivalent fingerprints can be compared, the fingerprint type stores the parameters used in the generation process. The OEFPTypeBase.GetFPTypeString method returns the string representation of the fingerprint type, which includes these parameters.

fp = oegraphsim.OEFingerPrint()
oegraphsim.OEMakeFP(fp, mol, oegraphsim.OEFPType_Path)
print(fp.GetFPTypeBase().GetFPTypeString())

The output of the preceding snippet is the following:



The returned string does not contain newline characters; it was broken into two lines here only for readability.
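Encoding every generation parameter into one canonical string is what lets a plain string comparison decide whether two fingerprint types are compatible. The sketch below illustrates the idea with a made-up `name;key=value;...` format. The format and the helpers `make_type_string` and `parse_type_string` are hypothetical and do not reproduce the string GraphSim TK actually returns.

```python
def make_type_string(name: str, params: dict) -> str:
    """Build a canonical string; keys are sorted so equal params give equal strings."""
    items = ";".join(f"{key}={params[key]}" for key in sorted(params))
    return f"{name};{items}" if items else name


def parse_type_string(text: str) -> tuple:
    """Inverse of make_type_string for this hypothetical format (values come back as str)."""
    name, *items = text.split(";")
    return name, dict(item.split("=", 1) for item in items)


s1 = make_type_string("Path", {"numbits": 4096, "minbonds": 0, "maxbonds": 5})
s2 = make_type_string("Path", {"maxbonds": 5, "numbits": 4096, "minbonds": 0})
s3 = make_type_string("Path", {"numbits": 4096, "minbonds": 0, "maxbonds": 4})

print(s1)        # Path;maxbonds=5;minbonds=0;numbits=4096
print(s1 == s2)  # True: parameter order does not matter
print(s1 == s3)  # False: a different max path length is a different type
print(parse_type_string(s1))
```

Sorting the keys is the design choice that matters here: without a canonical order, two identical parameter sets could serialize to different strings and be wrongly treated as incompatible.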

Listing 2 shows how to extract the parameters of a fingerprint type from its string representation by using the OEFPTypeParams class.

Listing 2: Fingerprint parameters

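from openeye import oegraphsim

# Recover the generation parameters from the type string of the default path fingerprint
fptype = oegraphsim.OEGetFPType(oegraphsim.OEFPType_Path)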
prms = oegraphsim.OEFPTypeParams(fptype.GetFPTypeString())
print("version = %s" % oegraphsim.OEGetFingerPrintVersionString(prms.GetVersion()))
print("number of bits = %d" % prms.GetNumBits())
print("min bonds = %d" % prms.GetMinDistance())
print("max bonds = %d" % prms.GetMaxDistance())
print("atom types = %s" % oegraphsim.OEGetFPAtomType(prms.GetAtomTypes()))
print("bond types = %s" % oegraphsim.OEGetFPBondType(prms.GetBondTypes()))

The output of Listing 2 is the following:

version = 2.0.0
number of bits = 4096
min bonds = 0
max bonds = 5
atom types = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
bond types = Order|Chiral
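The `atom types` and `bond types` lines print a set of properties joined with `|`. Such sets are commonly stored as a bitmask of flags. The sketch below uses Python's `enum.IntFlag` to show how a mask maps to that kind of string. The flag names echo the output above, but the bit values and the `describe` helper are hypothetical, not the constants GraphSim TK uses.

```python
from enum import IntFlag


class AtomType(IntFlag):
    """Hypothetical atom-property flags; bit values are illustrative only."""
    AtmNum = 1
    Arom = 2
    Chiral = 4
    FCharge = 8
    HvyDeg = 16
    Hyb = 32
    EqHalo = 64


def describe(mask: AtomType) -> str:
    """Join the names of all flags set in mask with '|', in declaration order."""
    return "|".join(flag.name for flag in AtomType if flag in mask)


default = (AtomType.AtmNum | AtomType.Arom | AtomType.Chiral | AtomType.FCharge
           | AtomType.HvyDeg | AtomType.Hyb | AtomType.EqHalo)

print(describe(default))                          # AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
print(describe(AtomType.AtmNum | AtomType.Arom))  # AtmNum|Arom
```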


Fingerprint version number

Each fingerprint type additionally has a version number. Version numbers keep track of changes in the fingerprint generation algorithm itself. The OEFPTypeBase.GetFPVersionString method returns the string representation of the fingerprint version.

fp = oegraphsim.OEFingerPrint()
oegraphsim.OEMakeFP(fp, mol, oegraphsim.OEFPType_Path)
print(fp.GetFPTypeBase().GetFPVersionString())

The output of the preceding snippet is the following:



The version number of a fingerprint type does not change with every release. It is incremented only when modifications or bug fixes to the corresponding algorithm would generate a different bit-vector for the same molecule.

Fingerprints with an old version number remain readable and can still be compared with each other, but not with fingerprints that have a different version number.
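Taken together, the compatibility rule is: two fingerprints are comparable only if their type parameters and their version both agree. A minimal sketch, assuming each fingerprint is described by a hypothetical `FPTypeInfo` record holding a type string and a version string (illustrative names and values, not toolkit objects):

```python
from typing import NamedTuple


class FPTypeInfo(NamedTuple):
    """Hypothetical description of how a fingerprint was generated."""
    type_string: str  # encodes the generation parameters
    version: str      # bumped only when the algorithm would produce different bits


def comparable(a: FPTypeInfo, b: FPTypeInfo) -> bool:
    """Same parameters AND same version; older versions stay comparable among themselves."""
    return a.type_string == b.type_string and a.version == b.version


old1 = FPTypeInfo("Path;maxbonds=5", "1.0.0")  # hypothetical older version
old2 = FPTypeInfo("Path;maxbonds=5", "1.0.0")
new = FPTypeInfo("Path;maxbonds=5", "2.0.0")

print(comparable(old1, old2))  # True: two old fingerprints can still be compared
print(comparable(old1, new))   # False: identical parameters, different version
```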