Fragment and bioisosteric replacement have a long history (see [Chen-2003]). Most medicinal chemists are well versed in standard sets of bioisosteric fragments. Likewise, there is a long history of computational approaches to fragment replacement (see [Verloop-1987] and [Bartlett-1994]). There have been several attempts to examine sets of known active compounds to empirically identify bioisosteric fragments (see [Ujvary-2003] and [Sheridan-2002]). While this is an interesting exercise, it has two drawbacks. First, it can only identify bioisosteric fragment pairs that are already known. While these are interesting to study, they are often already familiar to experienced medicinal chemists and modelers. Second, it identifies many incidental rather than meaningful fragment pairs. This happens because two molecules that bind to the same site do not necessarily differ only by a bioisosteric replacement. For instance, chemists may make an analog of a compound by replacing an N-methyl group with an N-benzyl group in order to identify new binding pockets. However, the fact that both compounds are bioactive does not mean that methyl and benzyl are similar fragments (though they would be identified as such by some methods). While one may apply various heuristics, such as size, to avoid this problem, we hope to explore methods that are more robust.
An alternative approach has been to use an algorithm that would predict whether two fragments are similar in relevant ways. Several groups including [Bartlett-1994], [Verloop-1987] and [Willett-2001] have developed methods in this area. Here we seek to capitalize on and extend the ideas developed by these workers.
BROOD allows users to enter a single query fragment and search a very large database of known molecular fragments in order to identify fragments that are similar. Each database fragment is compared to the query fragment in 3D with regard to shape, chemistry, electrostatics, and geometric presentation of attachment vectors. The fragments that are most similar to the query fragment will appear in a hitlist.
All similar fragments in a BROOD hitlist are organized into clusters. The clusters are organized so that molecules with the same ring structures and core framework (reduced-graph) are placed in the same cluster. The first cluster in a BROOD hitlist always contains the molecules that are similar to the query and share its core atomic framework. While some of these analogs may be obvious, they often include interesting alternative chemistries. The remaining clusters are each organized around a unique core atomic framework. Each cluster is represented by its best scoring member and the clusters themselves are ranked by the score of their best member.
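The ordering rules described above can be illustrated with a minimal sketch. It is not BROOD's implementation: the `Hit` record, the `framework` string key (standing in for a reduced-graph identifier), and the function name are all hypothetical, and the sketch only shows grouping by framework, placing the query's framework first, and ranking the remaining clusters by their best member.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    name: str       # hypothetical analog identifier
    framework: str  # hypothetical key for the core atomic framework
    score: float    # overall similarity to the query (higher is better)


def cluster_hits(hits, query_framework):
    """Group hits by framework and return clusters, each sorted best-first."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.framework].append(hit)
    for members in groups.values():
        members.sort(key=lambda h: h.score, reverse=True)

    # The cluster sharing the query's framework always comes first.
    first = groups.pop(query_framework, [])
    # Remaining clusters are ranked by the score of their best member.
    rest = sorted(groups.values(), key=lambda m: m[0].score, reverse=True)
    return ([first] if first else []) + rest


hits = [
    Hit("a", "benzene", 1.2),
    Hit("b", "pyridine", 1.6),
    Hit("c", "benzene", 1.4),
    Hit("d", "thiophene", 1.5),
]
clusters = cluster_hits(hits, query_framework="benzene")
representatives = [members[0].name for members in clusters]
```

Here the benzene cluster is listed first even though its best member scores lower than the pyridine and thiophene hits, because it shares the query's framework.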
BROOD allows users to specify protein structures for the purpose of testing whether newly constructed analogs fit into the active site. When the BROOD query ligand is taken from a co-crystal structure, BROOD builds the newly created analog molecules in the same shape and orientation as the query. If the crystallographic ligand was originally in an active site and the protein is passed to BROOD as the bump protein, the new analogs will be built in poses that are also in the bump protein’s active site. When a bump protein is passed, BROOD checks for clashes between the bump protein and each analog. By default, if any ligand heavy atom is less than 2.25 Angstroms from a protein heavy atom, the analog is removed from the hitlist.
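The clash test described above amounts to a distance check between heavy atoms. The following is a minimal sketch under stated assumptions, not BROOD's code: coordinates are plain `(x, y, z)` tuples in Angstroms that already contain heavy atoms only, and the function name is hypothetical. A production implementation would use a spatial grid rather than an all-pairs loop.

```python
import math

BUMP_CUTOFF = 2.25  # Angstroms, the default quoted in the text


def clashes(ligand_heavy_atoms, protein_heavy_atoms, cutoff=BUMP_CUTOFF):
    """Return True if any ligand heavy atom is closer than `cutoff`
    to any protein heavy atom."""
    return any(
        math.dist(lig, prot) < cutoff
        for lig in ligand_heavy_atoms
        for prot in protein_heavy_atoms
    )


protein = [(0.0, 0.0, 0.0)]
fits = not clashes([(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)], protein)
bumps = clashes([(2.0, 0.0, 0.0)], protein)
```

An analog for which `clashes` returns `True` would be dropped from the hitlist.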
Users can also specify another protein for selectivity testing (referred to here as the selectivity protein). In order for an analog to remain in the final hitlist, it must clash with the selectivity protein. The bump protein and the selectivity protein should be aligned and the BROOD query should be based on a molecule that fits within the bump protein active site. In this case, analogs in the final BROOD hitlist will also fit into the bump protein’s active site, while they will also have clashes with the selectivity protein. This combination is a simple model for analogs that have a chance to be active against the bump protein, but have a very low chance of being active against the selectivity protein. This model assumes the analog molecules bind to the bump protein in a manner that is similar to the original ligand and that the most favorable pose for the ligand class in the selectivity protein is similar to that in the bump protein.
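The combined fit-and-selectivity rule can be written as a simple filter. This sketch assumes the closest heavy-atom approach of each analog to each protein has already been computed, and it assumes the same 2.25 Angstrom cutoff is applied to the selectivity protein; the text does not state the selectivity cutoff, so treat that value, the data layout, and the function name as illustrative.

```python
CLASH_CUTOFF = 2.25  # Angstroms; applying it to both proteins is an assumption


def selective_analogs(min_distances, cutoff=CLASH_CUTOFF):
    """Keep analogs that fit the bump protein and clash with the selectivity protein.

    `min_distances` maps an analog name to a pair:
    (closest approach to bump protein, closest approach to selectivity protein).
    """
    return [
        name
        for name, (d_bump, d_selectivity) in min_distances.items()
        if d_bump >= cutoff and d_selectivity < cutoff
    ]


kept = selective_analogs({
    "analog_1": (3.1, 1.8),  # fits target, clashes with off-target: kept
    "analog_2": (3.0, 3.4),  # fits both proteins: not selective
    "analog_3": (1.9, 1.5),  # clashes with the bump protein: rejected
})
```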
BROOD’s output includes newly constructed analog molecules that are intended to have a similar 3D shape to the query molecule. These new analogs are constructed partially from the original molecule and partially from new fragments. When these new molecules are generated and built into a conformation that has good shape and chemistry overlap with the query molecule, some strain may be introduced. To produce high-quality results, it is essential that the analogs are optimized while maintaining the query shape and that little strain is introduced in the process.
The BROOD search process guarantees that each of the molecular fragments alone is in a low energy state. After the fragments are joined, this may no longer be true. For every BROOD analog in the final hitlist, two optimizations are carried out to determine the local strain introduced by maintaining a shape similar to the original query. In the first optimization, the ligand is allowed to relax into a local minimum. In the second optimization, the ligand atoms are only allowed to move a fraction of an Angstrom, keeping the same overall shape of the molecule. In both calculations, the OEMMFF [Halgren-1996-1], [Halgren-1996-2], [Halgren-1996-3], [Halgren-1996-4], [Halgren-1996-5], [Halgren-1999-1], [Halgren-1999-2] potential is used with a Sheffield solvation function [Grant-2007]. The local strain energy is the difference in ligand energy between the two calculations. By default, the maximum strain for any successful BROOD analog is limited to 6.5 kcal/mol.
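The strain bookkeeping can be summarized in a few lines. This sketch assumes the two energies (in kcal/mol) have already been obtained from the restrained and fully relaxed optimizations; the function names are hypothetical, and treating the limit as inclusive is an assumption.

```python
MAX_STRAIN = 6.5  # kcal/mol, the default limit quoted in the text


def local_strain(e_restrained, e_relaxed):
    """Energy of the shape-preserving pose minus energy of the relaxed local minimum."""
    return e_restrained - e_relaxed


def passes_strain_filter(e_restrained, e_relaxed, max_strain=MAX_STRAIN):
    """True if the analog's local strain does not exceed the allowed maximum."""
    return local_strain(e_restrained, e_relaxed) <= max_strain


strain = local_strain(e_restrained=-42.0, e_relaxed=-45.5)  # 3.5 kcal/mol
accepted = passes_strain_filter(-42.0, -45.5)
rejected = not passes_strain_filter(-30.0, -45.5)
```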
One approach to lead identification and development is based on the identification and expansion of physically very small molecule inhibitors that are commonly termed “fragments” (see [Hajduk-2007]). Fragments in this sense are molecules with few atoms and should not be confused with the term fragment used elsewhere in this document that refers to part of a molecule. Nevertheless, the fragment replacement algorithm in BROOD can be useful in the modeling of fragment-based design. In fragment-based design, one strategy is to combine two non-overlapping inhibitory fragments to form a single molecule. While this is sometimes empirically done with a series of flexible linear linkers, the linkers can also be modeled. In the BROOD GUI, it is possible to load two fragments and use the -linkOnly search to identify potential linkers that can join the fragments with an energetically favorable, medicinally relevant linker in a low-energy conformation. This search can be carried out in a protein’s active site, taking account of the need for the linker to fit into the active site as well as join the fragments. For more information on this application, please see the tutorials section.
One of the first applications of BROOD that many users want to explore is the replacement of a flexible portion of a molecule with a more rigid fragment that fills the same space. BROOD excels at this application. This type of exercise can be considered a local cyclization. In some molecules, rather than a local cyclization, some chemists prefer to design a bridge between two portions of a molecule that do not have a local link. This is another task where BROOD can be quite useful. Long-distance cyclization, like fragment joining, is about finding a chemical fragment that can bridge two moieties held in a particular 3D orientation. Use the -linkOnly option for long-distance cyclization.
In the early 1980s, Bertz first published a measure of the complexity of molecules and asserted that his calculated complexity could be related to synthetic ease ([Bertz-1981], [Bertz-1982]). Bertz built complexity terms that are similar to a Shannon entropy ([Shannon-1949]) but with regard to the elements in a molecule and the diversity of small fragments that make up the structure of the molecule. While the actual synthetic accessibility can be heavily influenced by the availability of complex synthetic building blocks and advances in stereo synthetic methods, molecular complexity remains a useful tool for prioritizing which compounds chemists should look at first when primary modeling methods don’t readily distinguish between them.
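To make the Shannon-style construction concrete, the sketch below computes an information content over element counts, N log2 N minus the sum of n_i log2 n_i, which is the form usually given for the elemental term. It is illustrative only: the function name is hypothetical, the input is simply a list of element symbols, and the graph, ring, and stereo terms are not reproduced.

```python
import math
from collections import Counter


def elemental_information(elements):
    """Shannon-style information content over element counts:
    N*log2(N) - sum(n_i * log2(n_i)), where n_i is the count of element i."""
    counts = Counter(elements)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log2(total) - sum(
        n * math.log2(n) for n in counts.values()
    )


single_element = elemental_information(["C"] * 6)            # one element type
two_elements = elemental_information(["C", "C", "O", "O"])   # evenly mixed
```

A molecule built from a single element type scores zero, and the value grows as the elemental composition becomes larger and more diverse.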
In 2007, Boda and coworkers extended Bertz’s idea and compared it to experimental chemists’ predictions of synthetic accessibility ([Boda-2007]).
Boda made the significant advance of adding stereo complexity to Bertz’s elemental, graph, and ring complexity. We noted in the paper that nearly all the signal was generated by the molecular complexity and stereo complexity. In fact, each of these measures, without the linear fitting presented in the paper, correlated with chemists’ predictions of synthetic accessibility as well as the chemists’ predictions correlated with one another. Thus, in BROOD, we have implemented a molecular complexity that is a normalized sum of the graph and size complexity, elemental complexity, and stereo complexity. The normalized complexity score starts at zero for the simplest, smallest molecule and grows to values that generally don’t exceed 1.0 for medicinally relevant small molecules.
BROOD uses molecular complexity to sort analog molecules in the final hitlist that have very similar shape and color scores.
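One way to picture this tie-breaking is a two-level sort. The sketch below is an assumption-laden illustration rather than BROOD's actual ordering: it bins scores with a hypothetical tolerance to decide what counts as "very similar", and it assumes that, within a bin, less complex molecules are listed first.

```python
def rank_analogs(analogs, score_tolerance=0.01):
    """Sort (name, score, complexity) tuples by score, best first.

    Scores falling in the same tolerance bin are treated as tied and
    ordered by increasing complexity (both choices are assumptions).
    """
    def sort_key(item):
        _name, score, complexity = item
        return (-round(score / score_tolerance), complexity)

    return sorted(analogs, key=sort_key)


ranked = rank_analogs([
    ("a", 1.503, 0.40),
    ("b", 1.501, 0.25),  # nearly the same score as "a", but simpler
    ("c", 1.620, 0.60),
])
order = [name for name, _score, _complexity in ranked]
```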
This is an example of applying BROOD as a technology to explore fragment similarity outside the direct context of a specific drug-discovery project. Tu and coworkers at Pfizer explored the chemical space of aromatic ring systems ([Tu-2012]). In their work, they used BROOD at the core of the NEAT (Novel and Electronically Equivalent Aromatic Templates) tool. This tool uses a combination of high-level QM-derived partial charges along with BROOD’s electrostatic similarity calculation to explore potentially aromatic system replacements. It allows medicinal chemists to explore the large space of complex aromatic ring systems for replacements that are both electronically and sterically analogous. We present this example as an illustration of the more diverse applications of BROOD’s fragment-matching technology.
It is a well-known concept in medicinal chemistry that compounds with greater similarity have a higher probability of having shared properties. It is by this premise that project teams seek to explore chemical space around a lead compound in order to discover new active molecules. Based on this concept, Muchmore and colleagues at Abbott attempted to quantify the relationship between various measures of ligand similarity and binding activity ([Muchmore-2008]). A large number of ligands with activities measured across a large number and wide variety of targets were used to generate the probability that two molecules with a given similarity would have activity within one log unit of one another (the p[active]). Several well-regarded ligand similarity techniques showed sigmoidal curves where, at very low similarity, the probability of shared activity was related to the prevalence of inhibitors in the underlying data; whereas, as two molecules approached very high similarity, the probability of similar activity approached something like 35-55%. While perhaps surprising at first, both of these results are sensible. It is well known that while similar molecules are much more likely to have shared activity than two random molecules, similarity is by no means a guarantee of shared activity.
As reported in their paper, one of the similarity techniques with the highest maximum probability was the combo score (shape + color score) from the OpenEye tool ROCS. Since the publication of Muchmore’s paper, ROCS has made the critical change from color score to color Tanimoto. The change to color Tanimoto resulted in an even higher maximum probability of 0.512 ([Brown-2008]) for shape + color Tanimoto. The curve that was refit using the Abbott data is used in BROOD to convert the overall shape + color Tanimoto similarity of the analog molecule to the query molecule into a p(Active) value that indicates the probability, according to the Abbott belief model, that the analog compound will have activity within one log unit of the query compound.
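The shape of such a similarity-to-probability mapping can be sketched with a logistic function. Only the 0.512 plateau comes from the text; the baseline, midpoint, and steepness below are placeholder values chosen for illustration, not the parameters of the refit curve, and the function name is hypothetical.

```python
import math

P_MAX = 0.512  # maximum probability quoted in the text


def p_active(similarity, p_min=0.05, midpoint=1.2, steepness=8.0, p_max=P_MAX):
    """Sigmoidal map from shape + color Tanimoto (0 to 2) to a probability.

    p_min, midpoint, and steepness are illustrative placeholders.
    """
    logistic = 1.0 / (1.0 + math.exp(-steepness * (similarity - midpoint)))
    return p_min + (p_max - p_min) * logistic


low = p_active(0.2)    # close to the baseline prevalence
high = p_active(2.0)   # approaches, but never exceeds, the plateau
```

The curve starts near the baseline at low similarity and saturates well below 1.0, reflecting the point that high similarity raises, but does not guarantee, the chance of shared activity.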
In 2005, Martin published a predictive model for the probability that a compound will have bioavailability (f) > 10% in rats ([Martin-2005]). Despite attempts to identify a useful model using straightforward linear combinations of simple parameters, such as logP, logD, donors, acceptors, PSA, or flexibility, Martin noted that different predictive models were required for different ionization states. Anions require a bioavailability model that depends strongly on PSA. By contrast, neutral species, cations, and zwitterions require a model based on the rule-of-five. The combination of these models provides a single model for bioavailability in rats.
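The branching logic of this model can be sketched as follows. The probabilities and PSA thresholds are the values generally quoted for Martin's published bioavailability score and should be checked against [Martin-2005]; whether BROOD uses exactly these numbers is an assumption. The function name, the `charge_class` labels, and the precomputed `passes_rule_of_five` flag are hypothetical.

```python
def bioavailability_score(charge_class, psa, passes_rule_of_five):
    """Probability that rat bioavailability exceeds 10%.

    charge_class: "anion", "neutral", "cation", or "zwitterion"
    psa: polar surface area in square Angstroms
    passes_rule_of_five: result of a separately computed rule-of-five check
    """
    if charge_class == "anion":
        # Anions: the score depends on polar surface area.
        if psa < 75.0:
            return 0.85
        if psa <= 150.0:
            return 0.56
        return 0.11
    # Neutral species, cations, and zwitterions: rule-of-five based.
    return 0.55 if passes_rule_of_five else 0.17


anion_score = bioavailability_score("anion", psa=60.0, passes_rule_of_five=True)
neutral_score = bioavailability_score("neutral", psa=60.0, passes_rule_of_five=True)
```

Note that for anions the rule-of-five flag is ignored entirely, which is the point of using separate models per ionization state.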
An essential feature of these methods is the generation of a database of potential fragments. While it may be tempting to generate fragments de novo, such approaches often generate unrealistic chemical fragments. Particularly for a method that is related to a common medicinal chemistry technique, we feel it is important to propose known fragments.
The initial database that comes with BROOD is derived from the ChEMBL 20 database. The compounds are fragmented, resulting in approximately 11 million unique molecular fragments after standard property filtering. The fragments are prioritized according to their medicinal and geometric relevance to fragment replacement, and a final collection of approximately six million fragments is retained for the database.
Users may also provide their own fragment database for searching. These fragment databases can be prepared from molecule collections using the CHOMP program. CHOMP breaks the molecules into fragments, filters the fragments, enumerates undefined stereochemistry, tracks the molecules from which the fragments came, and identifies the unique collection of fragments. Once this set is generated, CHOMP generates or reads multiconformer representations for each fragment. The conformers can either be generated by OMEGA technology within CHOMP or extracted from small-molecule crystal structure databases passed into CHOMP. As a final step in database generation, CHOMP precalculates physical and geometric properties, organizes the fragments for efficient retrieval, and writes a database format that is optimized for efficient BROOD searching.