Molecule Search Page

The Molecule Search page lets you search by 2D or 3D similarity, by substructure, by exact match to the query, and by molecule title.

Molecule Search supersedes functionality that was previously provided by the OpenEye MaaS and FastROCS web services. Users have the option to send a molecule directly from the spreadsheet on the 3D or Analyze pages in the Orion user interface to the Molecule Search page. In addition, results can be stored in files or datasets for use in other Orion operations.

Figure 1. The Molecule Search page.

For a basic demonstration of the Molecule Search capability, please see the following video.

How to Add a Query Molecule

There are several ways to add a query: sketch a molecule; paste a query in SMILES, MOL, or SDF format; or search by title.

Orion users can also send a molecule from the Analyze page or the 3D page by selecting the molecule, right-clicking, and choosing “Send to Molecule Search - Beta.”

Add a Query Using the Sketcher

The Sketcher allows you to sketch a molecule using the tools in the left-hand toolbar. The primary tools are defined in Table 1. The Sketcher is available on the Molecule Search page, on the Data page (choose Sketch under the “Add Data” button), and as an input option for certain floes (such as the Generative Structure - Site Selection Floe).

Table 1. Primary tools shown in the Sketcher.

Tool                          Application
Bonds                         Bond options
Rings                         Ring options
Templates                     Functional groups, protecting groups, rings, amino acids, chains, or user-defined templates
Atoms                         Common atoms, generic R groups, and less common elements (via periodic table)
Chain                         Create a chain of desired length
Delete                        Delete part of a structure
Clean up molecule geometry    Show the structure with proper geometry
Clear structure               Delete entire structure
Pan                           Move structure within Sketcher window

The Sketcher tools are organized in several levels. Most are available directly from the toolbar; others are in drop-down menus under certain building tools. To reveal these options, click the caret next to the tool in the toolbar, and hover over an option to identify it. Additional tools are available by right-clicking an atom in the Sketcher window.

The Bonds tool lets you choose the bond type. The wavy bond indicates undefined R/S stereochemistry, and the “either” double bond indicates unspecified E/Z double-bond stereochemistry.

Under the Templates tool, you can select from a variety of predefined functional groups, protecting groups, rings, amino acids, and chains. User-defined templates can also be constructed. To do this, build the desired molecular structure in the 2D Sketcher and select the plus sign under ‘My Templates.’ Name the template in the pop-up screen. It will now appear as a choice in ‘My Templates.’ To delete a template, click the red X in the upper corner of its selection box.

The Atoms tool allows you to add common atoms, generic R groups, and less common elements (via a periodic table).

When sketching a molecule, an atom center will turn red if it has more than the allowed number of bonds. A pink dot on an atom indicates that it is a stereocenter.

You can copy the entire molecule or part of it for use outside of Orion. To copy part of a molecule, first highlight the desired portion, then right-click inside the selection box (not directly on an atom), click “Copy,” and select SMILES or MOL. To copy the entire molecule, right-click anywhere in the Sketcher window or on a bond, click “Copy,” and select SMILES or MOL. The SMILES string or MOL file for the selection is placed on your computer’s clipboard, ready to be pasted elsewhere. Right-clicking directly on an atom instead offers two options: elements and R-groups.

The following video shows some useful Sketcher shortcuts.

Search Context

Molecule Search includes aspects of fingerprints, similarity measurements, and the OpenEye concepts of shape and color. While these may be quite familiar, a brief introduction is provided for those who are new to these ideas.

Fingerprints

Molecular fingerprints are cheminformatics tools for virtual screening that are frequently used for similarity comparisons. Fingerprints capture and encode structural features, representing the presence or absence of various substructures or molecular characteristics. For more details about OpenEye fingerprints, see the theory chapter of GraphSim TK.

Note

Fingerprint performance and accuracy have not been evaluated for macrocycles, and fingerprints are not optimized for them.

Molecule Search uses three different methods to define fingerprints: circular, path, and tree. Figures 4, 5, and 6 show how these fingerprints are defined.

Circular fingerprints enumerate the circular fragments centered on each heavy atom of the molecule, up to a given radius. They capture substructures by encoding how the atoms are connected.

Figure 4. How to construct circular fingerprints. All “1” values correspond to fragments that appear in a particular ring. The “0” fragments are not present.
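The sketch below illustrates the circular idea in plain Python. It is a simplified Morgan-style scheme, not GraphSim TK's implementation: the molecule encoding (a list of heavy-atom element symbols plus a list of bonded atom pairs), the atom description (element and degree), the SHA-1 hashing, and the 1024-bit default length are all assumptions chosen for illustration. Bond orders are ignored.

```python
import hashlib

def _hash(obj, modulus=2**32):
    """Deterministic integer hash (Python's built-in hash() of str is salted per run)."""
    return int(hashlib.sha1(repr(obj).encode()).hexdigest(), 16) % modulus

def circular_fingerprint(elements, bonds, radius=2, n_bits=1024):
    """Set of 'on' bits for a simplified Morgan-style circular fingerprint.

    elements: element symbol of each heavy atom, indexed by atom number.
    bonds: (i, j) pairs of bonded heavy atoms; bond orders are not used.
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each heavy atom is described by its element and degree.
    ids = {i: _hash((elements[i], len(neighbors[i]))) for i in neighbors}
    bits = {x % n_bits for x in ids.values()}

    # Each round folds in the sorted identifiers of the neighbors, so the
    # environment around every atom grows by one bond.
    for _ in range(radius):
        ids = {i: _hash((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
               for i in neighbors}
        bits.update(x % n_bits for x in ids.values())
    return bits
```

Because bits from each round are kept, the fingerprint at a larger radius always contains the bits from smaller radii; for propane, radius 0 yields two environments (end and middle carbon) and radius 1 adds two more.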

Path fingerprints enumerate all linear fragments of the molecule.

Figure 5. How to construct path fingerprints.
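A comparable sketch for path fingerprints enumerates every simple linear path (no repeated atoms) up to a maximum number of bonds and hashes its element sequence. The max_bonds default, the hashing, and the use of element symbols alone (no bond orders) are illustrative assumptions, not GraphSim TK settings.

```python
import hashlib

def path_fingerprint(elements, bonds, max_bonds=3, n_bits=1024):
    """Set of 'on' bits from all simple linear paths of 0..max_bonds bonds."""
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    bits = set()

    def extend(path):
        labels = tuple(elements[a] for a in path)
        # A path and its reverse describe the same fragment, so hash the
        # lexicographically smaller direction.
        key = min(labels, labels[::-1])
        bits.add(int(hashlib.sha1(repr(key).encode()).hexdigest(), 16) % n_bits)
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # simple paths only; this also stops ring loops
                extend(path + [nxt])

    for start in range(len(elements)):
        extend([start])
    return bits
```

For ethanol (C-C-O) with max_bonds=2, the distinct fragments are C, O, CC, CO, and CCO.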

Lastly, tree fingerprints enumerate branching fragments of a molecule and encode molecular elements not captured by the other methods.

Figure 6. How to construct tree fingerprints.

Path and tree fingerprints can miss information about how atoms are connected, but path fingerprints return a larger number of more diverse results. We recommend tree or circular fingerprints for highly branched molecules.
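The following sketch shows why tree fragments add information for branched molecules. It grows every connected, acyclic set of bonds up to max_bonds and hashes a simple invariant of each fragment. The invariant (element and in-fragment degree of each atom, plus the element pairs of the bonds) is a deliberate simplification that can collide for larger fragments; a real implementation would use a canonical form. The function name, encoding, and defaults are hypothetical.

```python
import hashlib

def tree_fingerprint(elements, bonds, max_bonds=3, n_bits=1024):
    """Set of 'on' bits from connected, acyclic bond sets of 1..max_bonds bonds."""
    bond_list = [tuple(sorted(b)) for b in bonds]

    def atoms_of(frag):
        return {a for k in frag for a in bond_list[k]}

    def key_of(frag):
        # Crude invariant, not a canonical form (see lead-in).
        atoms = atoms_of(frag)
        degree = {a: sum(a in bond_list[k] for k in frag) for a in atoms}
        atom_part = tuple(sorted((elements[a], degree[a]) for a in atoms))
        pairs = []
        for k in frag:
            i, j = bond_list[k]
            pairs.append(tuple(sorted((elements[i], elements[j]))))
        return atom_part, tuple(sorted(pairs))

    level = {frozenset([k]) for k in range(len(bond_list))}
    fragments = set(level)
    for _ in range(max_bonds - 1):
        next_level = set()
        for frag in level:
            atoms = atoms_of(frag)
            for k, (i, j) in enumerate(bond_list):
                # Grow only by a bond touching the fragment at exactly one atom;
                # touching at two atoms would close a ring.
                if k not in frag and (i in atoms) != (j in atoms):
                    next_level.add(frag | {k})
        fragments |= next_level
        level = next_level

    return {int(hashlib.sha1(repr(key_of(f)).encode()).hexdigest(), 16) % n_bits
            for f in fragments}
```

Isobutane and n-butane share their one- and two-bond fragments, but the three-bond fragment is a branched star in isobutane and a linear chain in butane, so their tree fingerprints differ in that bit. A path enumeration never produces the star fragment.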

Similarity

Similarity is assessed by comparing specific molecular features, such as fingerprints. Molecules can also be aligned in 2D or 3D to compare their spatial arrangements. Similarity searching relies on the relationship between molecular structure and biological behavior: molecules that resemble the query are candidate hits. Similarity comparisons based on fingerprints are discussed in the Python Cookbook and visualized in GraphSim TK.

OpenEye uses several methods for similarity measurements. The Tanimoto coefficient addresses the question of how similar the query and database molecules are to each other. It measures the overlap of structural features.
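For fingerprints stored as sets of "on" bits, the Tanimoto coefficient is c / (a + b - c), where a and b are the bit counts of the two fingerprints and c is the number of bits they share. A minimal Python sketch, assuming that set representation (returning 0.0 when both fingerprints are empty is a convention chosen here):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = len(fp_a), len(fp_b)
    c = len(fp_a & fp_b)  # bits set in both fingerprints
    if a + b - c == 0:
        return 0.0  # both fingerprints empty
    return c / (a + b - c)
```

For example, fingerprints with bits {1, 2, 3, 4} and {3, 4, 5} share two bits, giving 2 / (4 + 3 - 2) = 0.4. The score is symmetric: swapping the arguments does not change it.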

The Tversky measurement also quantifies similarity from shared substructures and molecular features, but it is asymmetric: features unique to the query and features unique to the database molecule can be weighted differently. The reference Tversky asks how similar the database molecules are to the query molecule. The Tversky coefficient is also used in ROCS measurements.
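The Tversky index generalizes the Tanimoto formula with two weights: c / (alpha * (a - c) + beta * (b - c) + c), where a - c counts bits found only in the query and b - c counts bits found only in the database molecule. With alpha = beta = 1 it reduces to the Tanimoto coefficient. The sketch below assumes set-of-bits fingerprints; the default weights are illustrative, not OpenEye's settings.

```python
def tversky(fp_query, fp_db, alpha=0.9, beta=0.1):
    """Tversky index of two fingerprints given as sets of 'on' bits.

    alpha weights bits found only in the query; beta weights bits found
    only in the database molecule.
    """
    c = len(fp_query & fp_db)
    only_query = len(fp_query - fp_db)
    only_db = len(fp_db - fp_query)
    denom = alpha * only_query + beta * only_db + c
    return c / denom if denom else 0.0
```

Weighting the query's unshared bits fully (alpha = 1, beta = 0) scores a small query that is entirely contained in a larger database molecule as 1.0, while swapping the two fingerprints gives a much lower score, which shows the asymmetry.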

Shape and Color

OpenEye, Cadence Molecular Sciences uses shape and color to compare the volume and chemical features of the query molecule to the database molecules for 3D similarity.

Shape refers to OpenEye’s definition of volume as a scalar field, which takes different values at different points in space. Shape atoms define similarity by the physical positions of the atoms relative to one another, so the more closely the volumes of the query molecule and the database molecule correspond, the more similar their shapes. For example, a shape-only search on the six carbon atoms of a benzene ring shows high similarity to a cyclohexane ring because the atoms occupy similar relative positions. Atom groups with similar hybridization and 3D positioning also have high similarity.
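ROCS-style shape comparison represents each atom as a Gaussian and scores two aligned molecules with a shape Tanimoto, O_AB / (I_A + I_B - O_AB), where O_AB is the volume overlap between the molecules and I_A and I_B are their self-overlaps. The sketch below computes that score for molecules that are already aligned; it does not perform the alignment optimization a real 3D search carries out. The Gaussian height p = 2√2 and the width chosen so each atomic Gaussian has its hard-sphere volume follow Grant and Pickup; they, and the 1.7 Å carbon radius in the test values, are assumptions for illustration.

```python
import math

P = 2.0 * math.sqrt(2.0)  # Gaussian height (Grant & Pickup value; an assumption here)

def gaussian_width(radius):
    """Exponent chosen so P * exp(-w * r**2) integrates to the sphere volume 4/3*pi*radius**3."""
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)

def overlap(mol_a, mol_b):
    """First-order Gaussian volume overlap of two pre-aligned molecules.

    Each molecule is a list of ((x, y, z), radius) tuples, one per heavy atom.
    """
    total = 0.0
    for pos_a, rad_a in mol_a:
        for pos_b, rad_b in mol_b:
            wa, wb = gaussian_width(rad_a), gaussian_width(rad_b)
            d2 = sum((p - q) ** 2 for p, q in zip(pos_a, pos_b))
            # Gaussian product theorem: closed-form integral of g_a * g_b over space.
            total += P * P * math.exp(-wa * wb * d2 / (wa + wb)) * (math.pi / (wa + wb)) ** 1.5
    return total

def shape_tanimoto(mol_a, mol_b):
    """O_AB / (I_A + I_B - O_AB) for molecules in a fixed (already aligned) pose."""
    o_ab = overlap(mol_a, mol_b)
    return o_ab / (overlap(mol_a, mol_a) + overlap(mol_b, mol_b) - o_ab)
```

Identical molecules score 1.0, a copy moved far away scores about 0, and a partially shifted copy falls in between, which matches the idea that similarity tracks how much the two volumes coincide.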

Color atoms define another aspect of molecular 3D similarity: chemical features. The following chemical features are currently supported in Molecule Search: donor, acceptor, anion, cation, hydrophobe, and rings. These features refine the search to find database molecules with similar chemical features at the defined position of the shape atom(s). Because chemical features are relative to volume, color atoms are relative to the shape atoms.

If a search is conducted with the “Color Atom” toggle turned Off, 3D search will use the default color atoms for the query molecule. When the “Color Atom” toggle is turned On, the Sketcher automatically defines the color atoms for the query molecule, but the color atoms can be changed to explore different chemical features of the database molecules. Changing the color atoms only affects chemical features, not the volume defined by the shape atoms. For example, methane (CH4) is a tetrahedral, neutral molecule. Ammonium (NH4+) is a tetrahedral ion. If a cation color atom is placed on a query carbon atom of methane, then ammonium will be similar to the query based on shape and color.
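Color scoring can be sketched the same way, except that Gaussians only overlap when they represent the same feature type. This minimal version treats each color atom as a (feature type, position) pair with one shared Gaussian width; the width and the feature-type strings are assumptions, and OpenEye's color force field defines its own feature types, weights, and widths. It reproduces the methane/ammonium example: a cation color atom placed on methane's carbon coincides with ammonium's cationic nitrogen.

```python
import math

def _feature_overlap(feats_a, feats_b, width=1.0):
    """Gaussian overlap summed over pairs of features of the same type only."""
    total = 0.0
    for type_a, pos_a in feats_a:
        for type_b, pos_b in feats_b:
            if type_a != type_b:
                continue  # e.g. a donor never scores against a cation
            d2 = sum((p - q) ** 2 for p, q in zip(pos_a, pos_b))
            # Two equal-width Gaussians exp(-width * r**2) overlap as
            # exp(-width * d2 / 2) times a constant that cancels in the ratio.
            total += math.exp(-width * d2 / 2.0)
    return total

def color_tanimoto(feats_a, feats_b, width=1.0):
    """Color Tanimoto O_AB / (I_A + I_B - O_AB) for pre-aligned feature sets."""
    o_ab = _feature_overlap(feats_a, feats_b, width)
    denom = (_feature_overlap(feats_a, feats_a, width)
             + _feature_overlap(feats_b, feats_b, width) - o_ab)
    return o_ab / denom if denom > 0 else 0.0

# Methane query with a cation color atom on its carbon, versus ammonium's N+.
methane_query = [("cation", (0.0, 0.0, 0.0))]
ammonium = [("cation", (0.0, 0.0, 0.0))]
print(color_tanimoto(methane_query, ammonium))  # same type, same position
```

Changing the query's color atom to another type (for example, a donor) drops the color score to 0 while leaving the shape score untouched, which mirrors how editing color atoms affects only chemical features.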

Note

Custom color force fields are not supported within Molecule Search.

Search Results

A search takes you to a separate results page. The left panel shows the query molecule and details about the search; the right side depicts the structures of the hits, each in its own tile.

Note

A search that is designated as a success may return no results. This may indicate that no molecules similar to the query were found with the designated parameters in that particular database. Consider changing the parameters or using another database for your search.

2D Similarity Search Results

Hits are listed in order of Tanimoto score. Atoms in the 2D and 3D depictions of the search results follow a consistent coloring scheme so that results can be compared quickly.

Figure 7. Results for a 2D similarity search.

3D Similarity Search Results

For a 3D similarity search, the results list the 2D score of each hit and rank hits by a 3D Tanimoto score: combo, shape, or color. Each option compares the database molecules to the query molecule in a different way. The 3D results also include a graph of the 3D combo score versus the 2D score. The view can be switched among 2D, 3D, 2D similarity, shape, and color depictions.
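Ranking by combo, shape, or color amounts to sorting the same hits by a different key. In ROCS, the combo score (TanimotoCombo) is the sum of the shape and color Tanimoto scores and ranges from 0 to 2; this sketch assumes the same definition for the 3D combo shown here, and the Hit class and rank function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical container for one 3D search hit."""
    title: str
    shape: float  # shape Tanimoto, 0 to 1
    color: float  # color Tanimoto, 0 to 1

    @property
    def combo(self) -> float:
        # ROCS-style TanimotoCombo: shape + color, so 0 to 2.
        return self.shape + self.color

def rank(hits, by="combo"):
    """Return the hits sorted best first by 'combo', 'shape', or 'color'."""
    if by not in ("combo", "shape", "color"):
        raise ValueError(f"unknown score: {by!r}")
    return sorted(hits, key=lambda hit: getattr(hit, by), reverse=True)
```

A hit with a strong shape match but weak color match can lead the shape ranking yet fall behind a more balanced hit in the combo ranking, which is why switching the ranking option can reorder the results.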

Note

For performance reasons, searches that return more than 500 hits do not include the Grapheme depictions of 2D similarity, shape, and color; only 2D and 3D depictions appear.

Figures 8 through 13 are from the same 3D similarity search of ibuprofen.

The 2D depiction simply shows the 2D representation of the molecule.

Figure 8. 2D depictions of the results from a 3D similarity search of ibuprofen.

The 3D depiction shows an overlay of the query molecule in grey with the database molecule in green. The overlaid structures can be manipulated with the mouse for better viewing. Molecules in individual result windows can be rescaled using the scroll wheel of your mouse. The orientation of the query molecule on the left may not correspond to the orientation of the database molecules in the results section.

Figure 9. 3D depictions from the 3D similarity search of ibuprofen.

When a 3D search is performed with the “Color Atoms” feature, an additional viewing option is available. When the “Show Color Atoms” toggle in the 3D display is On, the structures show the color atoms from the search, each labeled with its chemical feature.

Figure 10. Depictions from a 3D similarity search of ibuprofen using color atoms. The structures shown are the same as the first two molecules shown in the other 3D search results.

The similarity depiction shows the similarity of the database molecule to the query molecule with a color gradient from yellow to dark green, with dark green indicating greater 2D similarity. Pink sections of the database molecule are not similar to the query molecule.

Figure 11. The 2D similarity depiction of ibuprofen and the database molecules.

The shape depiction shows the volume of the query molecule overlaid with the 2D structure of the database molecule. The grey dotted outline marks the outer edge of the query molecule’s shape. A purple gradient indicates how well the database molecule’s shape aligns with the query molecule, with darker regions indicating greater shape overlap.

Figure 12. The shape depiction of the volume of ibuprofen overlaid with the 2D structures of the database molecules.

Finally, the color depiction displays the color atoms in a similar fashion to the 3D “Color Atoms” toggle in the Sketcher. The query molecule’s color atoms are overlaid with the database molecule’s color atoms in 2D. Similar and overlapping color atoms appear as solid circles. A shading gradient indicates the database molecule’s color alignment with the query molecule, with darker regions indicating higher color overlap between the molecules. Color atoms without overlap and similarity appear as open circles.

Figure 13. The color depiction of ibuprofen’s color atoms overlaid with the color atoms of the database molecules. Zoom into the figure and look at the top red color atom. For the structure on the left, that atom is quite dark, indicating good color atom alignment. The other structures show a lighter color for that atom, meaning that the color atoms are not aligned as well.

Note

Substructure and exact searches do not report similarity scores; only the titles of the found structures are shown.

Substructure Search Results

A substructure search highlights the common substructure in blue. Atoms and bonds that are not highlighted in blue are unique to the database molecule; depending on the search parameters, any atom or bond that deviates from the query is also left unhighlighted.
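Highlighting follows from the atom mapping that a substructure search finds: mapped database atoms form the common substructure, and unmapped atoms are unique to the database molecule. The sketch below finds one mapping by plain backtracking. It matches atoms on element and bonds on bond order only, and the function name and data layout are hypothetical; production toolkits use much faster algorithms and richer atom and bond expressions.

```python
def substructure_match(q_elems, q_bonds, t_elems, t_bonds):
    """Return one mapping {query atom: target atom}, or None if there is none.

    Bonds are dicts {(i, j): bond_order}. Atoms match on element and bonds
    on order; unmatched target atoms and bonds are allowed (substructure).
    """
    def adjacency(bonds):
        nbrs = {}
        for (i, j), order in bonds.items():
            nbrs.setdefault(i, {})[j] = order
            nbrs.setdefault(j, {})[i] = order
        return nbrs

    q_adj, t_adj = adjacency(q_bonds), adjacency(t_bonds)
    mapping = {}

    def consistent(qa, ta):
        if q_elems[qa] != t_elems[ta] or ta in mapping.values():
            return False
        # Every already-mapped query neighbor must be bonded the same way in the target.
        for qn, order in q_adj.get(qa, {}).items():
            if qn in mapping and t_adj.get(ta, {}).get(mapping[qn]) != order:
                return False
        return True

    def search(qa):
        if qa == len(q_elems):
            return True
        for ta in range(len(t_elems)):
            if consistent(qa, ta):
                mapping[qa] = ta
                if search(qa + 1):
                    return True
                del mapping[qa]  # backtrack
        return False

    return dict(mapping) if search(0) else None

# Fragment C-C=O against acetic acid CC(=O)O (atoms: C0, C1, O2 carbonyl, O3 hydroxyl).
match = substructure_match(["C", "C", "O"], {(0, 1): 1, (1, 2): 2},
                           ["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
highlighted = set(match.values())           # would be drawn in blue
unique = set(range(4)) - highlighted        # left unhighlighted
```

Here the methyl carbon, carbonyl carbon, and carbonyl oxygen are highlighted, and the hydroxyl oxygen (atom 3) is unique to the database molecule.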

Note

With small databases such as the one in these examples, we can often trust that ibuprofen will be at the top of the hit list for this query. But the order of results may not be consistent with larger databases and more promiscuous queries.

By default, atom stereo information is not considered. Figure 14 shows this result, which has 42 matches.

Figure 14. Results of a basic substructure search.

The MatchAtomStereo option requires the matched database molecules to have the atom stereo configuration specified in the query, and the hits decrease to five, as shown in Figure 15.

Figure 15. Results of a substructure search with the Atom Stereo parameter chosen.

The AddBondAliphaticConstraint only allows aliphatic query bonds to map to aliphatic bonds in the database molecules. Adding this constraint reduces the hits to three. For example, naproxen in Figure 15 no longer meets the parameter requirements: the aliphatic methyl in the query does not match the aromatic bond in the naphthalene of naproxen.

Figure 16. Results of a substructure search with the Atom Stereo and Aliphatic Constraint parameters chosen.

Adding a third parameter, AddBondTopologyConstraint, requires that chain bonds map only to chain bonds. This reduces the hits to two, because in nogalamycin the query's chain bond maps to a ring bond.

Figure 17. Results of a substructure search with the Atom Stereo, Aliphatic Constraint, and Topology Constraint parameters chosen.

The combination of the MatchAtomStereo and AddBondTopologyConstraint options yields the same result shown in Figure 17, as the disallowed aliphatic/aromatic matches are also chain/ring matches.

Figure 18. Results of a substructure search with the Atom Stereo and Topology Constraint parameters chosen.

As one would expect, the AddBondAliphaticConstraint and AddBondTopologyConstraint parameters, whether used alone or together, are not nearly as restrictive as searches that include MatchAtomStereo to restrict the atom stereo configuration.
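The bond constraints described above can be modeled as extra checks applied whenever a query bond is mapped onto a database bond. In this sketch each bond carries two flags, (is_aromatic, is_in_ring). The aliphatic_constraint and topology_constraint arguments are hypothetical stand-ins for AddBondAliphaticConstraint and AddBondTopologyConstraint, not the Orion or OEChem API, and atom stereo is not modeled.

```python
def has_match(q_elems, q_bonds, t_elems, t_bonds,
              aliphatic_constraint=False, topology_constraint=False):
    """True if the query maps onto the target under the chosen bond constraints.

    Bonds are dicts {(i, j): (is_aromatic, is_in_ring)}. Without constraints,
    any query bond may map to any target bond between matching atoms.
    """
    def adjacency(bonds):
        nbrs = {}
        for (i, j), props in bonds.items():
            nbrs.setdefault(i, {})[j] = props
            nbrs.setdefault(j, {})[i] = props
        return nbrs

    q_adj, t_adj = adjacency(q_bonds), adjacency(t_bonds)

    def bond_ok(q_props, t_props):
        q_arom, q_ring = q_props
        t_arom, t_ring = t_props
        if aliphatic_constraint and not q_arom and t_arom:
            return False  # aliphatic query bond may not map to an aromatic bond
        if topology_constraint and not q_ring and t_ring:
            return False  # chain query bond may not map to a ring bond
        return True

    mapping = {}

    def search(qa):
        if qa == len(q_elems):
            return True
        for ta in range(len(t_elems)):
            if t_elems[ta] != q_elems[qa] or ta in mapping.values():
                continue
            ok = True
            for qn, q_props in q_adj.get(qa, {}).items():
                if qn in mapping:
                    t_props = t_adj.get(ta, {}).get(mapping[qn])
                    if t_props is None or not bond_ok(q_props, t_props):
                        ok = False
                        break
            if ok:
                mapping[qa] = ta
                if search(qa + 1):
                    return True
                del mapping[qa]  # backtrack
        return False

    return search(0)
```

A chain, aliphatic C-C query bond matches benzene only when no constraint is set, and matches cyclohexane under the aliphatic constraint but not under the topology constraint. Benzene fails both constraints because its bonds are aromatic ring bonds, which is the same overlap between aliphatic/aromatic and chain/ring mismatches noted for Figure 18.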

Exact Match Search Results

Again, an exact search seeks to restrict or maintain the stereochemistry, bonding, or hybridization of a structure. An exact search may yield no results if the query molecule is not present in the database, or it may yield only the query molecule. Some searches do yield several hits, depending on the database and parameter choices. Exact searches can be very sensitive to the parameters chosen for the search.

Figure 19. Results of an exact search using the Exact Match parameter.

How to Save or Pin Results

You can save the search results to a dataset or file, or you can send the results directly to the Analyze page. When saving results to a file, the output can be saved to a .csv, .oeb, or .sdf file; in addition, all search types except for 3D can be saved to a .smi file.
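If you post-process exported results outside Orion, the two plain-text formats are easy to produce or read. This sketch writes hits as CSV and as a SMILES file (one molecule per line: the SMILES string, whitespace, then the title). The column names and the (title, SMILES, score) tuple layout are assumptions, not Orion's export schema.

```python
import csv
import io

def write_csv(hits, handle):
    """Write (title, smiles, score) tuples as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["title", "smiles", "score"])
    for title, smiles, score in hits:
        writer.writerow([title, smiles, f"{score:.3f}"])

def write_smi(hits, handle):
    """Write a SMILES file: one 'SMILES title' line per molecule."""
    for title, smiles, _score in hits:
        handle.write(f"{smiles} {title}\n")

# In-memory demo; for real files, open with open(path, "w", newline="") for CSV.
hits = [("ibuprofen", "CC(C)Cc1ccc(cc1)C(C)C(=O)O", 1.0)]
csv_buf, smi_buf = io.StringIO(), io.StringIO()
write_csv(hits, csv_buf)
write_smi(hits, smi_buf)
print(smi_buf.getvalue(), end="")
```

Binary formats such as .oeb carry full molecule records and are best read and written with the OpenEye toolkits rather than by hand.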

Figure 21. Retaining the results of a search.

Search History

For each query in the Search History, an editable field on the left lets you name the query after the search has completed. On the right, four options are available: Load Query, Load Results, Pin Query, and Delete. Each performs the corresponding action on the query; for example, Load Results lets you access the results for that query.

Figure 22. The Search History tab.

Note

Orion retains the history of your searches automatically. They are cleared after 30 days unless you pin them.