Molecule Search Page
The Molecule Search page provides the ability to search using similarity in two or three dimensions; with substructures; using matches with search terms; and by molecule title.
Molecule Search supersedes functionality that was previously provided by the OpenEye MaaS and FastROCS web services. Users have the option to send a molecule directly from the spreadsheet on the 3D or Analyze pages in the Orion user interface to the Molecule Search page. In addition, results can be stored in files or datasets for use in other Orion operations.
For a basic demonstration of the Molecule Search capability, please see the following video.
There are several ways to add a query for search. You can sketch a molecule; paste a query using a SMILES, MOL, or SDF file; or search by title. Orion users can also send a molecule from the Analyze page or the 3D page by selecting the molecule, right-clicking, and choosing “Send to Molecule Search - Beta.” The Sketcher allows you to sketch a molecule using the tools in the left-hand toolbar. The primary tools are defined in
Table 1. The Sketcher is available on the Molecule Search page, on the Data page (choose Sketch under the “Add Data” button),
and as an input option for certain floes (such as the Generative Structure - Site Selection Floe).

Table 1. Primary tools shown in the Sketcher.

Tool                          Application
Bonds                         Bond options
Rings                         Ring options
Templates
Atoms                         Common atoms, generic R groups, and less common elements (via periodic table)
Chain                         Create a chain of desired length
Delete                        Delete part of a structure
Clean up molecule geometry    Show the structure with proper geometry
Clear structure               Delete entire structure
Pan                           Move structure within Sketcher window

The Sketcher tools have several levels. Most are accessed directly from the toolbar or from drop-down menus under certain
building tools. In the toolbar for these tools, click on the caret to reveal the options. You can hover over these
additional choices for identification of the tool. More tools can be accessed with a right-click on an atom in the Sketcher window. The Bonds tool allows you to choose the type of bond. The wavy bond indicates undefined R/S stereochemistry, and the
double either bond indicates unspecified E/Z double bond stereochemistry. Under the Templates tool, you can select from a variety of predefined functional groups, protecting groups, rings, amino
acids, and chains. User-defined templates can also be constructed. To do this, build the desired molecular structure in
the 2D Sketcher and select the plus sign under ‘My Templates.’ Name the template in the pop-up screen. It will now appear
as a choice in ‘My Templates.’ To delete a template, click the red X in the upper corner of its selection box. The Atoms tool allows you to add common atoms, generic R groups, and less common elements (via a periodic table). When sketching a molecule, an atom center will turn red if it has more than the allowed number of bonds. A pink dot on an
atom indicates that it is a stereocenter. The entire molecule or part of a molecule may be copied for use outside of Orion. For a portion of the molecule, first
highlight the desired part. The selection can be copied as a SMILES
or MOL file by right-clicking within the selection box (do not directly click an atom), clicking the “Copy” box, and selecting
SMILES or MOL. Alternatively, to copy an entire molecule, right-click anywhere in the Sketcher window or simply on a bond,
then click the “Copy” box and select MOL or SMILES. The SMILES or MOL file for the selected portion will be in your computer’s
clipboard, ready to be pasted and used elsewhere. Right-clicking on an atom in a structure gives two options: elements and R-groups. The following video shows some useful Sketcher shortcuts. The 3D panel in the Sketcher offers the ability to designate atoms, bonds, rings, and other structural components for search.
Color atoms refer to the chemical features of a molecule and describe another aspect of 3D similarity. Please see the
Shape and Color section below. The “Color Atoms” toggle reveals a color-coded guide for the features
in the current structure. You can also assign additional features. When the toggle is On, a new tool appears in the toolbar
to allow the placement of color atoms on the shape atoms shown in the Sketcher. Click on the caret to access the available features, choose the one that you want, and click where you want that feature
to appear in your search. To place the color atom, either left-click on the “parent” shape atom you want to describe with
the color feature, or shift-click on multiple shape atoms to give the color atoms the average position of the parent shape
atoms. If multiple color atoms are desired for the same shape atoms, change the color atom feature on the toolbar and
click on the color atom to “split” the color atom into different color options to refine for similarity. Note Splitting the color atoms does not affect the weighting for the preferred chemical features; all chemical features
included for that color atom will be weighted the same for similarity. When performing a substructure search, additional options are available under both the Bonds and Atoms tools, named Query
Bonds and Query Atoms, respectively. Using this selection, specific types of bonds or atoms are queried to narrow down
potential structures. The Query Bond categories include Aromatic, Single or Double, Single or Aromatic, Double or Aromatic,
Any Type, Keep Stereochemistry as drawn, Ring Bond Only, or Chain Bond Only. The Query Atom categories include Allowed
Atoms, Hydrogens, Non-Hydrogens, Ring Bonds, Unsaturated, and Uncharged. Molecule Search includes aspects of fingerprints, similarity measurements, and the OpenEye concepts of shape and color.
While these may be quite familiar, a brief introduction is provided for those who are new to these ideas. Molecular fingerprints are cheminformatics tools for virtual screening that are frequently used for similarity
comparisons. Fingerprints capture and encode structural features, representing the presence or absence of various substructures or
molecular characteristics.
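As a rough illustration of the "presence or absence" idea, the sketch below enumerates short linear atom paths in a small molecular graph and hashes each one to a bit position. It is a toy example only: the function names, the 64-bit length, and the CRC32 hash are assumptions made for illustration, and this is not the algorithm used by OpenEye fingerprints.

```python
import zlib


def linear_paths(atoms, bonds, max_bonds=2):
    """Enumerate linear atom paths (as strings) with up to max_bonds bonds."""
    # Build an adjacency list from the bond list.
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    found = set()

    def extend(path):
        symbols = [atoms[i] for i in path]
        # A path and its reverse describe the same fragment; keep one spelling.
        found.add(min("-".join(symbols), "-".join(reversed(symbols))))
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return found


def toy_fingerprint(atoms, bonds, n_bits=64, max_bonds=2):
    """Hash each path to a bit position; the result is the set of 'on' bits."""
    paths = linear_paths(atoms, bonds, max_bonds)
    return {zlib.crc32(p.encode()) % n_bits for p in paths}


# Heavy atoms of ethanol (C-C-O); hydrogens are left out of this toy graph.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

paths = linear_paths(ethanol_atoms, ethanol_bonds)
bits = toy_fingerprint(ethanol_atoms, ethanol_bonds)
print(sorted(paths))
print(sorted(bits))
```

Two molecules that share fragments will share the corresponding bits, which is what makes fingerprints convenient for fast similarity comparisons.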
For more details about OpenEye fingerprints, see the theory chapter of GraphSim TK. Note Performance and accuracy testing for fingerprints of macrocycles has not been performed, and the fingerprints are not optimized for macrocycles. Molecule Search uses three different methods to define fingerprints: circular, path, and
tree. Figures 4, 5, and 6 show how these fingerprints are defined. Circular fingerprints enumerate the circular fragments of each heavy atom of the molecule up to a given radius. They find
substructures of molecules by determining the connectivity of atoms. Path fingerprints enumerate all linear fragments of the molecule. Lastly, tree fingerprints enumerate branching fragments of a molecule and encode molecular elements not captured by the other methods. Path and tree fingerprints can miss the connectivity of the atoms, but path fingerprints provide a large number of more
diverse results. We recommend tree or circular fingerprints for highly branched molecules. Similarity is defined by the comparison of specific molecular features (fingerprints). Molecules can be aligned in 2D or
3D to assess similarities in spatial conformations. Similarity also examines the relationship between a query’s molecular
structure and biological behavior for the prediction of hits.
A discussion of similarity comparison based on fingerprints can be found in the
Python Cookbook and
visualized in GraphSim TK. OpenEye uses several methods for similarity measurements. The Tanimoto coefficient addresses the
question of how similar the query and database molecules are to each
other. It measures the overlap of structural features. The Tversky measurement is similar, but it quantifies similarity based on shared substructures and molecular features.
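The two measures can be sketched on fingerprints represented as sets of "on" bits. This is a minimal illustration using the general set-based definitions of the Tanimoto coefficient and the Tversky index; the `alpha` and `beta` weights and their default values are assumptions for the example and may not correspond to the parameterization used inside Molecule Search.

```python
def tanimoto(a, b):
    """Shared bits divided by all bits set in either fingerprint."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tversky(query, target, alpha=1.0, beta=0.0):
    """Tversky index: asymmetric weighting of query-only and target-only bits."""
    query, target = set(query), set(target)
    common = len(query & target)
    denom = common + alpha * len(query - target) + beta * len(target - query)
    return common / denom if denom else 0.0


query_bits = {1, 2, 3}
db_bits = {2, 3, 4, 5}

print(tanimoto(query_bits, db_bits))                       # symmetric measure
print(tversky(query_bits, db_bits, alpha=1.0, beta=1.0))   # reduces to Tanimoto
print(tversky(query_bits, db_bits, alpha=1.0, beta=0.0))   # penalizes only query-only bits
```

With `alpha` and `beta` both equal to 1, the Tversky index equals the Tanimoto coefficient; unequal weights make the comparison asymmetric between the query and the database molecule.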
The reference Tversky asks how similar the database molecules are to the query molecule. The Tversky coefficient is used
for ROCS measurements. OpenEye, Cadence Molecular Sciences uses shape and color to compare
the volume and chemical features of the query molecule to the database molecules for 3D similarity. Shape refers to OpenEye’s definition of
volume as a scalar field, where volume is considered to have different values at different points in space. Shape atoms
define similarity by the physical position and relative reference of the atoms to one another. Thus, the more the volumes of the
query molecule and the database molecule correspond, the more similar their shapes will be. For example, a shape-exclusive
search on the six carbon atoms in a benzene ring shows high similarity to a cyclohexane ring due to similar relative
positions of the atoms. Atom groups with similar hybridization and 3D positioning also have high similarity. Color atoms define another aspect of molecular 3D similarity: chemical features.
The following chemical features are currently supported in Molecule Search: donor, acceptor, anion, cation, hydrophobe,
and rings. These features refine the search to find database molecules with similar chemical features at the defined
position of the shape atom(s). Because chemical features are relative to volume, color atoms are relative to the shape atoms. If a search is conducted with the “Color Atom” toggle turned Off, 3D search will use the default color atoms for the query
molecule. When the “Color Atom” toggle is turned On, the Sketcher automatically defines the color atoms for the query molecule,
but the color atoms can be changed to explore different chemical features of the database molecules. Changing
the color atoms only affects chemical features, not the volume defined by the shape atoms. For example,
methane (CH4) is a tetrahedral, neutral molecule. Ammonium (NH4+) is a tetrahedral ion. If a
cation color atom is placed on a query carbon atom of methane, then ammonium will be similar to the query based on shape and color. Note Custom color force fields are not supported within Molecule Search. The 2D similarity search performs a molecule search with a 2D representation of a molecule and provides only 2D molecules
as a result. In preparing for a 2D search, there are three parameters for molecular fingerprint similarity: fingerprint
type, similarity measure, and maximum hits. Depending on the database, each fingerprint type may include a virtual screen (VS)
option as well; for example, circularvs is the virtual screening option of circular fingerprints. The VS option increases
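the speed of the search by excluding the values formal charge and hybridization (defined in the OEFPAtomType namespace) from the definition of the fingerprint. Some 2D databases do not have every fingerprint type available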
in order to increase the efficiency and speed of searching these extremely large databases. A 3D similarity search automatically compares molecules based on their combined shape Tanimoto and color Tanimoto scores,
called the Tanimoto Combo. The shape Tanimoto and color Tanimoto each range from 0.0 to 1.0, so the maximum Tanimoto Combo is
2.0. In the results, you can subsequently see, and rank by, the shape Tanimoto or color Tanimoto scores. Shape is considered to be
the volume of a molecule: the more similar the volumes, the more similar the shapes. Color is defined by the chemical
properties of the fragments. When using the Sketcher for 3D search, 2D sketches are converted into 3D molecules using the OpenEye Omega TK to generate
a single 3D conformation. To use the color editing feature,
turn On the “Color Atoms” toggle and edit color atoms as desired. Color editing does not need to be on to generate
color Tanimoto scores, as color atoms will be added automatically to the query molecule. A substructure search attempts to match smaller substructures present in the larger query molecule to similar functional
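groups in the database molecules. Three search
constraints defined in the OEMDLQueryOpts constant namespace
are available for the substructure search: MatchAtomStereo, AddBondAliphaticConstraint, and AddBondTopologyConstraint. By default, atom stereo information is not considered. The MatchAtomStereo option forces the query molecules to match the specified atom configuration. The AddBondAliphaticConstraint only allows aliphatic query bonds to be mapped to aliphatic bonds in the database molecules. The AddBondTopologyConstraint forces chain bonds to be mapped only to chain bonds. This search method employs MDL queries, which are used to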
quickly and efficiently perform a substructure search. Exact match search offers the ability to either restrict or maintain the stereochemistry, bonding, or hybridization of a structure
when performing a search. Four parameters are available for defining an “exact” structure for similarity: Match Stereo, Match without Stereo, Isomorphic
Match, and Match Uncolored Graph. Match Stereo and Match without Stereo determine whether any defined stereocenters must
be matched. In (S)-chloro(phenyl)methanol, the stereocenter may or may not need to be kept as part of the search. Loosening
the criteria for “exactness,” Isomorphic Match clears all elemental information from the molecule, but maintains the bonding
and hybridization of the molecule. For example, pyrazine and benzene would appear to be the same with Isomorphic Match
because the bonding and hybridization are the same for both molecules. Lastly, Match Uncolored Graph strips the molecule of all information (including bonding/hybridization) except for the
connectivity between the atoms; you have only generic atoms connected in a particular order. For the benzene example, the
search needs only to find 6 atoms (of any element and unspecified hybridization) in a ring. A title search will populate the Sketcher with any structure in the selected database that matches the exact input query
name. If you have a list of chemical names, this tool provides a quick way to put known molecules into Orion; this is an
especially helpful way to easily add molecules with company-specific IDs from corporate databases. Please refer to the
Save Results section to learn how to work further with the output from Molecule Search. Note If any names are misspelled, the molecule will not be found, but the misspelled name will show up in the list of
titles searched. To perform a molecule search, first determine which type of search you want to do. Add the molecule of interest, either
by sketching a molecule, typing its name into the search bar, or pasting a SMILES, MOL, or SDF file into the Sketcher. Next, choose the database collection you want to use. On the right side of each database is a color-coded circle to indicate
the status of the database. A green circle means that the database collection is loaded and ready to be used in the search.
There are three color statuses that indicate that the database collection is not available to be searched. A light-grey
circle indicates that the database collection is unloaded, that is, not currently loaded to an instance. An orange circle means
that the database collection is queued and waiting to be loaded. Lastly, a red circle indicates an “other” status. For
this status, contact your OpenEye administrator or OpenEye support for aid in troubleshooting this database collection.
To see all vendor and custom databases, click on the “Settings” icon at the top of the Database list, or go to the System
page from the blue navigation bar and click on the Database Tab. Note External database collections hosted by OpenEye, Cadence Molecular Sciences are updated periodically and may not
be aligned with Orion UI or Orion Floes releases. All databases include either the vendor version number or the
date of the latest update with the database name and size. Select any parameters that are available for your desired search type and database. Then click the “Search” button. After
submitting a search query, there are four statuses that will appear next to the query molecule in the top left of the
results page: “queued” in grey, “processing” in orange, “error” in red, and “success” in green. A search brings you to a separate results page. On the left, there is a panel showing the query molecule and details about
the search. On the right, the structures of the hits found in the search are depicted, each in their own tile box. Note A search that is designated as a success may return no results. This may indicate that no molecules similar to
the query were found with the designated parameters in that particular database. Consider changing the parameters or using
another database for your search. The found structures are listed by their Tanimoto score. Atoms in the 2D and 3D depictions of search results follow this coloring scheme
to ease quick comparisons between the results. For a 3D similarity search, the search results list 2D scores and rank the 3D Tanimoto score by either combo, shape, or
color. These options offer different methods of comparing the database molecules to the query molecule. The 3D results also
show a graph of the 3D combo versus the 2D score. Additionally, the view can be changed to either 3D, 2D similarity, shape,
or color visual depictions. Note For performance reasons, searches of more than 500 hits will not return Grapheme depictions that show the 2D similarity,
shape, and color results. Only 2D and 3D depictions will appear. Figures 8 through 13 are from the same 3D similarity search of ibuprofen. The 2D depiction simply shows the 2D representation of the molecule. The 3D depiction shows an
overlay of the query molecule in grey with the database molecule in green. The overlaid structures can be manipulated
with the mouse for better viewing. Molecules in individual result windows can be rescaled using the scroll wheel of your mouse.
The orientation of the query molecule on the left
may not correlate to the orientation of the database molecules in the results section. When a 3D search is performed using the “Color Atoms” feature, there is an additional viewing option. When the “Show Color Atoms”
toggle in the 3D display is On, the structures are displayed to show the color atoms from the search and a label for the corresponding chemical feature. The similarity depiction shows the similarity of the database molecule to the query molecule with a color gradient from yellow
to dark green, with dark green indicating greater 2D similarity. Pink sections of the database molecule are not similar to
the query molecule. The shape depiction uses a volume depiction of the query molecule
overlaid with the 2D structure of the database molecule. The grey dotted outline indicates the outer edge of the query molecule’s
shape. A purple gradient indicates the database molecule’s shape alignment with the query molecule, with darker regions indicating
higher shape overlap between the molecules. Finally, the color depiction displays the color atoms in a similar fashion to
the 3D “Color Atoms” toggle in the Sketcher. The query molecule’s color atoms are overlaid with the database molecule’s color
atoms in 2D. Similar and overlapping color atoms appear as solid circles. A shading gradient indicates the database molecule’s
color alignment with the query molecule, with darker regions indicating higher color overlap between the molecules. Color
atoms without overlap and similarity appear as open circles. Note No similarity scores are provided for substructure or exact searches; only the titles of the found structures are provided. Common substructures found by a substructure search are highlighted in blue. Atoms and bonds that are not highlighted in
blue are unique to the database molecule. Any atom or bond deviation, depending on the parameters set for the
search, will not be highlighted in blue. Note With small databases such as the one in these examples, we can often trust that ibuprofen will be at the top of the hit
list for this query. But the order
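of results may not be consistent with larger databases and more promiscuous queries. By default, atom stereo information is not considered. Figure 14 shows this result, which has 42 matches. The MatchAtomStereo option forces the query molecules to match the specified atom stereo configuration, and the hits decrease to five, as shown in Figure 15. The AddBondAliphaticConstraint only allows aliphatic query bonds to be mapped to aliphatic bonds in the database molecules. Adding this constraint drops the hits to three. For example, naproxen in Figure 15 does not match the parameter requirements. The aliphatic methyl in the query does not match the aromatic bond in the naphthalene of naproxen. Adding the third parameter, AddBondTopologyConstraint, requires that chain bonds be mapped only to chain bonds. This reduces the hits to two, as nogalamycin maps to a ring bond. The combination of the MatchAtomStereo and AddBondTopologyConstraint options yields the same result shown in Figure 17, as the disallowed aliphatic/aromatic matches are also chain/ring matches. As one would expect, the AddBondAliphaticConstraint and the AddBondTopologyConstraint parameters either on their own or combined are not nearly as restrictive as searches that include MatchAtomStereo to restrict the atom stereo configuration. Again, an exact search seeks to restrict or maintain the stereochemistry, bonding, or hybridization of a structure. An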
exact search may yield no results if the query molecule is not present in the database, or it may yield only the query
molecule. Some searches do yield several hits, depending on the database and parameter choices. Exact searches can be very sensitive
to the parameters chosen for the search. Any of the search results can be used as input for a new search. Click on a molecule in the search results to select it,
then click on its kebab menu and select “New Search.” Alternatively, you can copy the found molecule as a
title, SMILES string, or MOL file. You can save the search results to a dataset or file, or
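you can send the results directly to the Analyze page. When saving results to a file, the output can be saved to a .csv, .oeb, or .sdf file; in addition, all search types except for 3D can be saved to a .smi file. On the left side for each individual query in the Search History, an editable line is available to name the query after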
the search has been completed. On the right side, four options are available: Load Query, Load Results, Pin Query,
and Delete. These options allow you to take the corresponding action regarding the query. The “Load Results” button allows you
to access the results for that query. Note Orion retains the history of your searches automatically. They are cleared after 30 days unless you pin them.
How to Add a Query Molecule
Add a Query Using the Sketcher
Color Tool for 3D Search
Additional Sketcher Tools for Substructure Search
Search Context
Fingerprints
Similarity
Shape and Color
Types of Search
2D Similarity Search
The VS option excludes the values formal charge and hybridization (defined in the OEFPAtomType namespace) from the definition of the fingerprint.
3D Similarity Search
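A 3D similarity search scores hits by the Tanimoto Combo, the sum of the shape Tanimoto and the color Tanimoto (each between 0.0 and 1.0, so the combo is at most 2.0), and results can be ranked by combo, shape, or color. The sketch below shows that arithmetic and ranking. The `Hit` class, the hit titles, and the scores are hypothetical values made up for the example; they are not output from Molecule Search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    shape_tanimoto: float
    color_tanimoto: float

    def __post_init__(self):
        # Each component score is expected to lie in [0.0, 1.0].
        for value in (self.shape_tanimoto, self.color_tanimoto):
            if not 0.0 <= value <= 1.0:
                raise ValueError("Tanimoto scores must be between 0.0 and 1.0")

    @property
    def tanimoto_combo(self):
        # Sum of the two components, so the maximum is 2.0.
        return self.shape_tanimoto + self.color_tanimoto


def rank_hits(hits, by="combo"):
    """Return hits sorted from most to least similar by the chosen score."""
    keys = {
        "combo": lambda h: h.tanimoto_combo,
        "shape": lambda h: h.shape_tanimoto,
        "color": lambda h: h.color_tanimoto,
    }
    return sorted(hits, key=keys[by], reverse=True)


# Hypothetical hits, for illustration only.
hits = [
    Hit("mol_a", 0.90, 0.30),
    Hit("mol_b", 0.70, 0.65),
    Hit("mol_c", 0.50, 0.40),
]

for key in ("combo", "shape", "color"):
    print(key, [h.title for h in rank_hits(hits, by=key)])
```

Ranking by a single component can reorder the list noticeably: a hit with an excellent shape overlap but few matching chemical features may lead the shape ranking and trail the color ranking.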
Substructure Search
MatchAtomStereo, AddBondAliphaticConstraint, and AddBondTopologyConstraint. By default, atom stereo information is not considered. The MatchAtomStereo option forces the query molecules to match the specified atom configuration. The AddBondAliphaticConstraint only allows aliphatic query bonds to be mapped to aliphatic bonds in the database molecules. The AddBondTopologyConstraint forces chain bonds to be mapped only to chain bonds.
Exact Match Search
Title Search
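A title search matches the exact input name, so a misspelled title finds nothing but still appears in the list of titles searched. The sketch below mimics that behavior with a plain dictionary lookup. The `title_search` function and the two-entry `database` dictionary are hypothetical stand-ins for illustration and do not reflect how Orion stores or queries its databases.

```python
def title_search(titles, database):
    """Look up each title exactly; report hits and unmatched titles separately."""
    found = {}
    not_found = []
    for title in titles:
        if title in database:
            found[title] = database[title]
        else:
            # Exact matching: a misspelling is kept in the report but has no hit.
            not_found.append(title)
    return found, not_found


# Hypothetical database mapping titles to SMILES strings.
database = {
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "naproxen": "COc1ccc2cc(ccc2c1)C(C)C(=O)O",
}

found, not_found = title_search(["ibuprofen", "ibuprofin", "naproxen"], database)
print(found)
print(not_found)
```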
Setting up a Search
Search Results
2D Similarity Search Results
3D Similarity Search Results
Substructure Search Results
The MatchAtomStereo option forces the query molecules to match the specified atom stereo configuration, and the hits decrease to five, as shown in Figure 15.
The AddBondAliphaticConstraint only allows aliphatic query bonds to be mapped to aliphatic bonds in the database molecules. Adding this constraint drops the hits to three. For example, naproxen in Figure 15 does not match the parameter requirements. The aliphatic methyl in the query does not match the aromatic bond in the naphthalene of naproxen.
Adding the third parameter, AddBondTopologyConstraint, requires that chain bonds be mapped only to chain bonds. This reduces the hits to two, as nogalamycin maps to a ring bond.
The combination of the MatchAtomStereo and AddBondTopologyConstraint options yields the same result shown in Figure 17, as the disallowed aliphatic/aromatic matches are also chain/ring matches.
As one would expect, the AddBondAliphaticConstraint and the AddBondTopologyConstraint parameters either on their own or combined are not nearly as restrictive as searches that include MatchAtomStereo to restrict the atom stereo configuration.
Exact Match Search Results
Using a Result for a New Search
How to Save or Pin Results
You can send the results directly to the Analyze page. When saving results to a file, the output can be saved to a .csv, .oeb, or .sdf file; in addition, all search types except for 3D can be saved to a .smi file.
Search History