Storage and Retrieval

The OEFingerPrint does not store any reference to the molecule from which it was generated, so the user has to keep track of which fingerprint corresponds to which molecule. One way to do this is to attach the fingerprint to the molecule as generic data. Listing 3 shows how to store and retrieve fingerprints as generic data.

Listing 3: Storing and retrieving fingerprint as generic data

#include <iostream>
#include <string>

#include <openeye.h>
#include <oechem.h>
#include <oegraphsim.h>

using namespace OEChem;
using namespace OEGraphSim;

int main(int, char* [])
{
  const char* tag = "FP_DATA";
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1");

  OEFingerPrint fp;
  OEMakeLingoFP(fp, mol);
  mol.SetData<OEFingerPrint>(tag, fp);

  if (mol.HasData(tag))
  {
    OEFingerPrint f = mol.GetData<OEFingerPrint>(tag);
    if ( f )
    {
      std::string fptype = f.GetFPTypeBase()->GetFPTypeString();
      std::cout << fptype << " fingerprint with `" << tag 
                <<  "` identifier" << std::endl;
    }
  }

  return 0;
}

It is good practice to check the validity of the fingerprint after retrieving it. The OEFingerPrint::operator bool method returns true if the fingerprint was successfully initialized.

Listing 4 demonstrates how to create an OEB binary file that stores molecules along with their fingerprints. When the OEB file generated by this program is read back, the pre-calculated fingerprints can be accessed rapidly through the PATH_FP tag, so they do not have to be regenerated on the fly.

See also

Additional examples in Listing 10 and Listing 12 of the Fingerprint Database chapter.

Listing 4: Fingerprint generation and storage in OEB

#include <openeye.h>
#include <oechem.h>
#include <oegraphsim.h>

using namespace OESystem;
using namespace OEChem;
using namespace OEGraphSim;

int main(int argc, char* argv[])
{
  if (argc != 3)
    OEThrow.Usage("%s <infile> <outfile>", argv[0]);

  oemolistream ifs;
  if (!ifs.open(argv[1]))
    OEThrow.Fatal("Unable to open %s for reading", argv[1]);

  oemolostream ofs;
  if (!ofs.open(argv[2]))
    OEThrow.Fatal("Unable to open %s for writing", argv[2]);
  if (ofs.GetFormat() != OEFormat::OEB)
    OEThrow.Fatal("%s output file has to be an OEBinary file", argv[2]);

  OEFingerPrint fp;
  OEGraphMol mol;
  while (OEReadMolecule(ifs, mol))
  {
    OEMakeFP(fp, mol, OEFPType::Path);
    mol.SetData<OEFingerPrint>("PATH_FP", fp);
    OEWriteMolecule(ofs, mol);
  }
  return 0;
}

The following code snippet shows how to generate a bitstring from an OEFingerPrint object.

Listing 5: Accessing a fingerprint as a bitstring

string GetBitString(const OEFingerPrint& fp)
{
  string bitstring;
  bitstring.resize(fp.GetSize(),'0');
  for(unsigned int b = 0; b < fp.GetSize(); ++b)
  {
    if (fp.IsBitOn(b))
      bitstring[b] = '1';
  }
  return bitstring;
}

Warning

Even though the GraphSim TK library provides a fingerprint API for the LINGO similarity search method, LINGO is not implemented as a real fingerprint, so bitstrings generated from LINGO fingerprints are meaningless.
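
Once a fingerprint is available as a bitstring, as returned by GetBitString in Listing 5, it can be inspected without the toolkit. The following is a minimal sketch under that assumption: the helper names CountOnBits and BitStringTanimoto are hypothetical and are not part of GraphSim TK. It only illustrates what the '0' and '1' characters represent; for real similarity calculations the toolkit's own similarity functions on OEFingerPrint objects should be preferred.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a GraphSim TK function):
// counts the '1' characters in a bitstring of '0'/'1' characters.
std::size_t CountOnBits(const std::string& bits)
{
  std::size_t count = 0;
  for (char c : bits)
  {
    if (c == '1')
      ++count;
  }
  return count;
}

// Hypothetical helper (not a GraphSim TK function):
// Tanimoto coefficient of two equal-length bitstrings, that is the number
// of positions set in both divided by the number set in at least one.
// Returns 0.0 when neither bitstring has any bit set.
double BitStringTanimoto(const std::string& a, const std::string& b)
{
  if (a.size() != b.size())
    throw std::invalid_argument("bitstrings must have the same length");

  std::size_t both = 0;
  std::size_t either = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const bool ona = (a[i] == '1');
    const bool onb = (b[i] == '1');
    if (ona && onb)
      ++both;
    if (ona || onb)
      ++either;
  }
  return either == 0 ? 0.0 : static_cast<double>(both) / static_cast<double>(either);
}
```

For example, BitStringTanimoto(GetBitString(fpA), GetBitString(fpB)) would compare two fingerprints of the same type and size. As the warning above notes, such a comparison is not meaningful for LINGO fingerprints.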

Fingerprints can also be stored in SDF files. Listing 6 demonstrates how to create an SDF file and store fingerprints in it as hexadecimal strings.

After the fingerprint is generated, it is attached to the molecule as an SD data tag. The identifier of the fingerprint in the SDF file will be the string representation of the fingerprint type. The bitvector of the fingerprint is converted to a hexadecimal string with the OEBitVector::ToHexString method.

Listing 6: Storing fingerprint in SDF

#include <openeye.h>
#include <oechem.h>
#include <oegraphsim.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;
using namespace OEGraphSim;

int main(int argc, char* argv[])
{
  if (argc != 3)
    OEThrow.Usage("%s <infile> <outfile>", argv[0]);

  oemolistream ifs;
  if (!ifs.open(argv[1]))
    OEThrow.Fatal("Unable to open %s for reading", argv[1]);

  oemolostream ofs;
  if (!ofs.open(argv[2]))
    OEThrow.Fatal("Unable to open %s for writing", argv[2]);
  if (ofs.GetFormat() != OEFormat::SDF)
    OEThrow.Fatal("%s output file has to be an SDF file", argv[2]);

  OEFingerPrint fp;
  OEGraphMol mol;
  while (OEReadMolecule(ifs, mol))
  {
    OEMakeFP(fp, mol, OEFPType::Circular);
    string fptypestr = fp.GetFPTypeBase()->GetFPTypeString();
    string fphexdata;
    fp.ToHexString(fphexdata);
    OESetSDData(mol, fptypestr, fphexdata);
    OEWriteMolecule(ofs, mol);
  }
  return 0;
}
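
To make the idea of a hexadecimal fingerprint string more concrete, the following toolkit-independent sketch packs a bitstring into hexadecimal digits and unpacks it again. It is an illustration only: the helper names BitStringToHex and HexToBitString are hypothetical, and the bit order chosen here (four bits per digit, most significant bit first) is an assumption that may differ from the layout produced by OEBitVector::ToHexString. Strings written by the toolkit should always be read back with the toolkit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: packs a bitstring of '0'/'1' characters into
// lowercase hexadecimal digits, four bits per digit, most significant
// bit first. The length of the bitstring must be a multiple of four.
std::string BitStringToHex(const std::string& bits)
{
  if (bits.size() % 4 != 0)
    throw std::invalid_argument("bitstring length must be a multiple of 4");

  static const char digits[] = "0123456789abcdef";
  std::string hex;
  hex.reserve(bits.size() / 4);
  for (std::size_t i = 0; i < bits.size(); i += 4)
  {
    unsigned int nibble = 0;
    for (std::size_t j = 0; j < 4; ++j)
    {
      const char c = bits[i + j];
      if (c != '0' && c != '1')
        throw std::invalid_argument("bitstring may contain only '0' and '1'");
      nibble = (nibble << 1) | (c == '1' ? 1u : 0u);
    }
    hex.push_back(digits[nibble]);
  }
  return hex;
}

// Hypothetical helper: the inverse of BitStringToHex. Accepts upper- and
// lowercase hexadecimal digits and expands each one into four bits.
std::string HexToBitString(const std::string& hex)
{
  std::string bits;
  bits.reserve(hex.size() * 4);
  for (char c : hex)
  {
    unsigned int nibble = 0;
    if (c >= '0' && c <= '9')
      nibble = static_cast<unsigned int>(c - '0');
    else if (c >= 'a' && c <= 'f')
      nibble = static_cast<unsigned int>(c - 'a') + 10u;
    else if (c >= 'A' && c <= 'F')
      nibble = static_cast<unsigned int>(c - 'A') + 10u;
    else
      throw std::invalid_argument("invalid hexadecimal digit");

    for (int shift = 3; shift >= 0; --shift)
      bits.push_back(((nibble >> shift) & 1u) ? '1' : '0');
  }
  return bits;
}
```

A hexadecimal string is four times shorter than the corresponding bitstring, which is one reason it suits a text format such as SD data.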

The following example (Listing 7) shows how to retrieve fingerprints from an SDF file. When looping over the SD data, the OEIsValidFPTypeString function can be used to identify SD data that stores a fingerprint. The tag of the data is the string representation of the fingerprint type. This string representation can be used to generate the corresponding OEFPTypeBase object with the OEGetFPType function. The type of the OEFingerPrint object then has to be set with the OEFingerPrint::SetFPTypeBase method. Finally, the bitvector of the fingerprint can be initialized from the hexadecimal string with the OEBitVector::FromHexString method.

Listing 7: Retrieving fingerprint from SDF

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
#include <oegraphsim.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;
using namespace OEGraphSim;

int main(int argc, char* argv[])
{
  if (argc != 2)
    OEThrow.Usage("%s <infile>", argv[0]);

  oemolistream ifs;
  if (!ifs.open(argv[1]))
    OEThrow.Fatal("Unable to open %s for reading", argv[1]);
  if (ifs.GetFormat() != OEFormat::SDF)
    OEThrow.Fatal("%s input file has to be an SDF file", argv[1]);

  unsigned int molcounter = 0;
  unsigned int fpcounter  = 0;
  OEGraphMol mol;
  while (OEReadMolecule(ifs, mol))
  {
    molcounter += 1;
    for (OEIter<OESDDataPair> dp = OEGetSDDataPairs(mol); dp; ++dp)
    {
      if (OEIsValidFPTypeString(dp->GetTag()))
      {
        fpcounter += 1;
        string fptypestr = dp->GetTag();
        string fphexdata = dp->GetValue();
        OEFingerPrint fp;
        const OEFPTypeBase* fptype = OEGetFPType(fptypestr);
        fp.SetFPTypeBase(fptype);
        fp.FromHexString(fphexdata);
      }
    }
  }

  cout << "Number of molecules = " << molcounter << endl;
  cout << "Number of fingerprints = " << fpcounter << endl;

  return 0;
}
