OESubSearchQuery

Attention

This is a preliminary API until Fall 2020 and may be improved based on user feedback. It is currently available in C++ and Python.

class OESubSearchQuery

The OESubSearchQuery class is used to submit queries to be searched in a database (OESubSearchDatabase).

Constructors

OESubSearchQuery(const OEQMolBase &query, const size_t maxmatches=1000u)

Creates an OESubSearchQuery object.

query
The query molecule (OEQMolBase).
maxmatches
The maximum number of matches that will be kept.

SetFilter

void SetFilter(const OESystem::OEUnaryPredicate<OEMolBase>&)

Sets a molecule predicate that can be used to filter out molecules based on molecular properties other than the existence of a certain substructure.

The following code snippet shows how to use the OESubSearchQuery::SetFilter method to identify molecules that match the given SMARTS pattern and also have a molecular weight in the given range:

OESubSearchDatabase ssdb(dbfname, OESubSearchDatabaseType::Default, nrthreads);

OEQMol qmol;
OEParseSmarts(qmol, "c1c[n,o]cc1");

const unsigned maxmatches = 100u;
OESubSearchQuery query(qmol, maxmatches);

OESubSearchResult result;
ssdb.Search(result, query);
cout << "Number of total matches = " << result.NumTotalMatches() << endl;

// Repeat the search, this time filtered by molecular weight
const double minweight = 200.0;
const double maxweight = 350.0;
query.SetFilter(MoleculeWeightPredicate(minweight, maxweight));

OESubSearchResult filteredresult;
ssdb.Search(filteredresult, query);
cout << "Number of total matches (filtered) = " << filteredresult.NumTotalMatches() << endl;

OEGraphMol mol;
for (OEIter<size_t> index = filteredresult.GetMatchIndices(); index; ++index)
{
  if (ssdb.GetMolecule(mol, *index))
    cout << "weight= " << OECalculateMolecularWeight(mol) << " " << OEMolToSmiles(mol) << endl;
}

The output of the code snippet above might look like this:

Number of total matches = 20
Number of total matches (filtered) = 7
weight= 204.225 c1ccc2c(c1)c(c[nH]2)C[C@H](C(=O)O)N
weight= 218.252 CN[C@@H](Cc1c[nH]c2c1cccc2)C(=O)O
weight= 245.277 CC(=O)N[C@@H](Cc1c[nH]c2c1cccc2)C(=O)N
weight= 275.303 c1ccc2c(c1)c(c[nH]2)C[C@@H](C(=O)O)NC(=O)CCN
weight= 260.288 CC(=O)N[C@@H](Cc1c[nH]c2c1cccc2)C(=O)OC
weight= 274.315 CCOC(=O)[C@H](Cc1c[nH]c2c1cccc2)NC(=O)C
weight= 254.327 CN1CC(C=C2[C@H]1Cc3c[nH]c4c3c2ccc4)CO

Here, MoleculeWeightPredicate is a molecule predicate defined as:

// Molecule predicate that accepts molecules whose molecular weight
// lies in the closed interval [minweight, maxweight].
class MoleculeWeightPredicate : public OEUnaryPredicate<OEMolBase>
{
public:
  MoleculeWeightPredicate(const double minweight, const double maxweight)
    : OEUnaryPredicate<OEMolBase>(),
      m_minweight(minweight),  m_maxweight(maxweight)
  { }
  // Direct copying is disabled; copies are made through CreateCopy().
  MoleculeWeightPredicate(const MoleculeWeightPredicate&) = delete;
  MoleculeWeightPredicate& operator=(const MoleculeWeightPredicate&) = delete;
  ~MoleculeWeightPredicate() = default;
  // Returns true if the weight of mol is within the range (bounds included).
  bool operator()(const OEMolBase& mol) const
  {
    const auto weight = OECalculateMolecularWeight(mol);
    return (weight >= m_minweight && weight <= m_maxweight);
  }
  // Returns a newly allocated predicate with the same weight range.
  OEUnaryFunction<OEMolBase,bool> *CreateCopy() const
  {
    return new MoleculeWeightPredicate(m_minweight, m_maxweight);
  }
private:
  const double m_minweight;
  const double m_maxweight;
};

Note

During the search, the predicate set by the OESubSearchQuery::SetFilter method is applied after the screening phase and before the atom-by-atom validation of each substructure match. Note that, to minimize its memory footprint, the OESubSearchDatabase stores only the molecular graphs (no coordinates) and the titles of the molecules.
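
To illustrate why the position of the filter in the search matters, the following is a minimal, self-contained sketch of a staged search. It does not use or reproduce the OpenEye implementation: Candidate, StageCounts, and RunStagedSearch are hypothetical names, and the screen and validation outcomes are precomputed flags that stand in for the real screening and atom-by-atom matching steps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical record used only for this illustration (not an OpenEye type).
struct Candidate
{
  bool passesScreen;    // stand-in for the outcome of the screening phase
  double weight;        // property inspected by the user-supplied filter
  bool hasSubstructure; // stand-in for the outcome of atom-by-atom validation
};

// Number of candidates that survive each stage.
struct StageCounts
{
  std::size_t screened = 0;
  std::size_t filtered = 0;
  std::size_t validated = 0;
};

// Runs the three stages in order: screen, user filter, validation.
// An empty filter accepts every candidate.
inline StageCounts RunStagedSearch(const std::vector<Candidate>& db,
                                   const std::function<bool(const Candidate&)>& filter)
{
  StageCounts counts;
  for (const Candidate& c : db)
  {
    if (!c.passesScreen)
      continue; // rejected by the screen
    ++counts.screened;

    if (filter && !filter(c))
      continue; // rejected by the filter, so validation is skipped
    ++counts.filtered;

    if (!c.hasSubstructure)
      continue; // rejected by the validation step
    ++counts.validated;
  }
  // Each stage can only narrow the set passed on by the previous one.
  assert(counts.validated <= counts.filtered && counts.filtered <= counts.screened);
  return counts;
}
```

Calling RunStagedSearch with a filter that accepts weights between 200 and 350 reports how many candidates survive each stage. A candidate rejected by the filter never reaches the validation step, which is usually the costliest of the three.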

GetMaxMatches

size_t GetMaxMatches() const

Returns the maximum number of matches that are kept when searching an OESubSearchDatabase. The OESubSearchDatabase::GetMatchIndices and OESubSearchDatabase::GetMatchTitles methods will terminate when this limit is reached. The default is 1000.

SetMaxMatches

void SetMaxMatches(const size_t limit)

Sets the maximum match limit.

Note

While there is no upper limit on how many matches can be retrieved by the search, setting this limit very high (above 10,000) is not recommended. Searching a very large database with a very generic query can result in millions of indices or titles being stored internally. The total number of matches can be determined without storing all matches, either by using the OESubSearchDatabase::NumMatches method or via the OESubSearchResult::NumTotalMatches counter when using the OESubSearchDatabase::Search method.
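
The difference between the number of stored matches and the total number of matches can be shown with a small, self-contained sketch. It is an illustration of the idea only, not the toolkit's implementation: CappedMatches and CollectMatches are hypothetical names, and the vector of hit flags stands in for the outcome of a real search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a search result (not an OpenEye type).
struct CappedMatches
{
  std::vector<std::size_t> indices; // at most maxMatches entries are stored
  std::size_t numTotal = 0;         // every hit is counted, stored or not
};

// Stores the indices of the first maxMatches hits and keeps counting
// the remaining hits without storing them.
inline CappedMatches CollectMatches(const std::vector<bool>& isHit,
                                    const std::size_t maxMatches)
{
  CappedMatches result;
  for (std::size_t i = 0; i < isHit.size(); ++i)
  {
    if (!isHit[i])
      continue;
    ++result.numTotal;
    if (result.indices.size() < maxMatches)
      result.indices.push_back(i);
  }
  // Memory use is bounded by the limit, whatever the total count is.
  assert(result.indices.size() <= maxMatches);
  return result;
}
```

With four hits and a limit of two, only the first two indices are stored while numTotal still reports four. A modest limit therefore bounds memory use without hiding how many matches exist.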