Storage and Retrieval
The OEFingerPrint does not store any reference to
the molecule from which it was generated.
The user has to keep track of which fingerprint corresponds to which
molecule.
One way to do this is to attach the fingerprint, as generic data, to
the molecule.
Listing 3 shows how to store and retrieve fingerprints as generic data.
Listing 3: Storing and retrieving fingerprint as generic data
import openeye.oechem.*;
import openeye.oegraphsim.*;

public class FPData {
    public static void main(String argv[]) {
        String tag = "FP_DATA";
        OEGraphMol mol = new OEGraphMol();
        oechem.OESmilesToMol(mol, "c1ccccc1");

        // Generate a fingerprint and attach it to the molecule as generic data.
        OEFingerPrint fp = new OEFingerPrint();
        oegraphsim.OEMakeLingoFP(fp, mol);
        oegraphsim.OESetFP(mol, tag, fp);

        // Retrieve the fingerprint by its tag and check that it is valid.
        if (mol.HasData(tag)) {
            OEFingerPrint f = oegraphsim.OEGetFP(mol, tag);
            if (f.IsValid()) {
                String fptype = f.GetFPTypeBase().GetFPTypeString();
                System.out.format("%s fingerprint with `%s` identifier\n", fptype, tag);
            }
        }
    }
}
It is good practice to check the validity of the fingerprint after
retrieving it.
The OEFingerPrint.IsValid
method returns true if the fingerprint was successfully initialized.
Listing 4 demonstrates how to create an OEB binary file that stores
molecules along with their fingerprints.
When reading the OEB file that was generated by this program, the
pre-calculated fingerprints can be accessed rapidly with the PATH_FP tag.
This eliminates the on-the-fly generation of the fingerprints.
See also
Additional examples in Listing 10 and Listing 12 of the Fingerprint Database chapter.
Listing 4: Fingerprint generation and storage in OEB
import openeye.oechem.*;
import openeye.oegraphsim.*;

public class FP2OEB {
    public static void main(String argv[]) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("FP2OEB <infile> <outfile>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0] + " for reading");

        oemolostream ofs = new oemolostream();
        if (!ofs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to open " + argv[1] + " for writing");
        if (ofs.GetFormat() != OEFormat.OEB)
            oechem.OEThrow.Fatal(argv[1] + " output file has to be an OEBinary file");

        // Attach a path fingerprint to each molecule before writing it out.
        OEFingerPrint fp = new OEFingerPrint();
        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            oegraphsim.OEMakeFP(fp, mol, OEFPType.Path);
            oegraphsim.OESetFP(mol, "PATH_FP", fp);
            oechem.OEWriteMolecule(ofs, mol);
        }
        ifs.close();
        ofs.close();
    }
}
The following code snippet shows how to generate a bitstring from an OEFingerPrint object.
Listing 5: Accessing a fingerprint as a bitstring
static String GetBitString(OEFingerPrint fp) {
    StringBuffer bitstring = new StringBuffer(fp.GetSize());
    for (int b = 0; b < fp.GetSize(); b++) {
        if (fp.IsBitOn(b))
            bitstring.append('1');
        else
            bitstring.append('0');
    }
    return bitstring.toString();
}
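To make the output of the loop above concrete without requiring the toolkit, the following sketch runs the same logic on a `java.util.BitSet`, used here purely as a stand-in for an OEFingerPrint. The class name `BitStringSketch`, the method `toBitString`, and the explicit `size` parameter are illustrative assumptions and are not part of GraphSim TK.

```java
import java.util.BitSet;

// Illustrative stand-in only: java.util.BitSet takes the place of OEFingerPrint.
final class BitStringSketch {

    // Same loop as Listing 5: one character per bit, starting at bit index 0.
    // BitSet does not carry a fixed length, so the size is passed explicitly.
    static String toBitString(BitSet bits, int size) {
        StringBuilder sb = new StringBuilder(size);
        for (int b = 0; b < size; b++) {
            sb.append(bits.get(b) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(8);
        bits.set(0);
        bits.set(3);
        System.out.println(toBitString(bits, 8)); // prints 10010000
    }
}
```

The leftmost character corresponds to bit 0, which matches the iteration order of Listing 5.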
See also
OEBitVector class
Warning
Even though the GraphSim TK library provides a fingerprint API for the LINGO similarity search method, it is not implemented as a real fingerprint, so bitstrings that are generated from LINGO fingerprints are meaningless.
Fingerprints can also be stored in SDF files.
Listing 6 demonstrates how to create an SDF file
that stores fingerprints as hexadecimal strings.
After the fingerprint is generated, it is attached to the molecule
as an SD data tag. The identifier of the fingerprint in the SDF
file will be the string representation of the fingerprint type.
The bitvector of the fingerprint is converted to a hexadecimal
string with the OEBitVector.ToHexString
method.
Listing 6: Storing fingerprint in SDF
import openeye.oechem.*;
import openeye.oegraphsim.*;

public class FP2SDF {
    public static void main(String argv[]) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("FP2SDF <infile> <outfile>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0] + " for reading");

        oemolostream ofs = new oemolostream();
        if (!ofs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to open " + argv[1] + " for writing");
        if (ofs.GetFormat() != OEFormat.SDF)
            oechem.OEThrow.Fatal(argv[1] + " output file has to be an SDF file");

        OEFingerPrint fp = new OEFingerPrint();
        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            oegraphsim.OEMakeFP(fp, mol, OEFPType.Circular);
            // The SD tag is the fingerprint type string; the value is the hex-encoded bitvector.
            String fptypestr = fp.GetFPTypeBase().GetFPTypeString();
            String fphexdata = fp.ToHexString();
            oechem.OESetSDData(mol, fptypestr, fphexdata);
            oechem.OEWriteMolecule(ofs, mol);
        }
        ifs.close();
        ofs.close();
    }
}
The following example (Listing 7) shows how to
retrieve fingerprints from an SDF file.
When looping over the SD data, the OEIsValidFPTypeString
function can be used to identify SD data that stores
a fingerprint.
The tag of the data is the string representation of the fingerprint type.
This string representation can be used to generate the
corresponding OEFPTypeBase object by using
the OEGetFPType
function.
The type of the OEFingerPrint object then has to
be set by the OEFingerPrint.SetFPTypeBase
method.
Finally, the bitvector of the fingerprint can be initialized from the
hexadecimal string by using the
OEBitVector.FromHexString
method.
Listing 7: Retrieving fingerprint from SDF
import openeye.oechem.*;
import openeye.oegraphsim.*;

public class SDF2FP {
    public static void main(String argv[]) {
        if (argv.length != 1)
            oechem.OEThrow.Usage("SDF2FP <infile>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0] + " for reading");
        if (ifs.GetFormat() != OEFormat.SDF)
            oechem.OEThrow.Fatal(argv[0] + " input file has to be an SDF file");

        int molcounter = 0;
        int fpcounter = 0;
        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            molcounter += 1;
            for (OESDDataPair dp : oechem.OEGetSDDataPairs(mol)) {
                // Only SD tags that are valid fingerprint type strings hold fingerprints.
                if (oegraphsim.OEIsValidFPTypeString(dp.GetTag())) {
                    fpcounter += 1;
                    String fptypestr = dp.GetTag();
                    String fphexdata = dp.GetValue();
                    // Set the type first, then restore the bits from the hex string.
                    OEFingerPrint fp = new OEFingerPrint();
                    OEFPTypeBase fptype = oegraphsim.OEGetFPType(fptypestr);
                    fp.SetFPTypeBase(fptype);
                    fp.FromHexString(fphexdata);
                }
            }
        }
        ifs.close();
        System.out.println("Number of molecules = " + molcounter);
        System.out.println("Number of fingerprints = " + fpcounter);
    }
}
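Listings 6 and 7 rely on a bitvector surviving the conversion to a hexadecimal string and back. The following toolkit-free sketch illustrates that round trip in general terms. It is an assumption-laden illustration: the class `HexRoundTripSketch`, its methods, and the simple four-bits-per-digit encoding are hypothetical and do not reproduce the format written by OEBitVector.ToHexString, so strings produced here should not be expected to be readable by OEBitVector.FromHexString.

```java
import java.util.BitSet;

// Hypothetical encoding for illustration only; NOT the OEBitVector hex format.
final class HexRoundTripSketch {

    // Packs four bits per hex digit; bit 0 is the lowest bit of the first digit.
    static String toHex(BitSet bits, int size) {
        StringBuilder sb = new StringBuilder((size + 3) / 4);
        for (int start = 0; start < size; start += 4) {
            int nibble = 0;
            for (int i = 0; i < 4 && start + i < size; i++) {
                if (bits.get(start + i)) {
                    nibble |= 1 << i;
                }
            }
            sb.append(Character.forDigit(nibble, 16));
        }
        return sb.toString();
    }

    // Reverses toHex: each hex digit restores four consecutive bits.
    static BitSet fromHex(String hex) {
        BitSet bits = new BitSet(hex.length() * 4);
        for (int d = 0; d < hex.length(); d++) {
            int nibble = Character.digit(hex.charAt(d), 16);
            if (nibble < 0) {
                throw new IllegalArgumentException("Not a hex digit: " + hex.charAt(d));
            }
            for (int i = 0; i < 4; i++) {
                if ((nibble & (1 << i)) != 0) {
                    bits.set(d * 4 + i);
                }
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet original = new BitSet(8);
        original.set(0);
        original.set(3);
        original.set(5);
        String hex = toHex(original, 8);
        BitSet restored = fromHex(hex);
        System.out.println(hex + " round trip ok: " + original.equals(restored));
    }
}
```

In this sketch the encoded string carries no length or type information. Listings 6 and 7 handle the type by using the fingerprint type string as the SD tag, which is why the type is set on the OEFingerPrint before the bits are restored.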
See also
OEFPTypeBase.GetFPTypeString method
OEBitVector class
SD Tagged Data Manipulation chapter in the OEChem TK manual