SMARTS Pattern Matching

SMARTS Syntax

SMARTS is a line notation developed by Daylight Chemical Information Systems for compactly representing molecular substructure queries. The SMARTS language can be considered an extension or generalization of Daylight’s SMILES notation for representing discrete molecules.

Atom Primitives

Symbol  Description                       Argument?  Default Value
A       non-aromatic (aliphatic) atom     no         (no default)
a       aromatic atom                     no         (no default)
D<n>    degree (explicit connections)     yes        (no default)
H<n>    total hydrogen count              optional   exactly one
h<n>    implicit hydrogen count           optional   exactly one
R<n>    ring bond count [1]               optional   any ring atom
x<n>    ring bond count [2]               optional   any ring atom
r<n>    smallest ring size                optional   any ring atom
v<n>    valence (total bond order)        yes        (no default)
X<n>    connectivity (total connections)  yes        (no default)
#<n>    atomic number                     yes        (no default)
+<n>    positive charge                   optional   +1 cation (++ is +2, etc.)
-<n>    negative charge                   optional   -1 anion (-- is -2, etc.)
^<n>    atomic hybridization [3]          yes        (no default)
@       anticlockwise local chirality     no
@@      clockwise local chirality         no
@<n>    chirality class                   optional   anticlockwise
<n>     explicit atomic mass              no

Table footnotes:

[1] The semantics of the ring count primitive, R, differ slightly between Daylight SMARTS and OpenEye SMARTS. In Daylight semantics, R<n> means that an atom is in n rings of the chosen SSSR (smallest set of smallest rings). As the choice of SSSR is non-deterministic, this interpretation can cause an arbitrary set of atoms to match, depending upon input order. For example, in the symmetric molecule cubane, four of the eight atoms appear in two SSSR rings and the other four appear in three, but which atoms fall into which group is essentially arbitrary. Rather than attempt to reproduce these weak semantics, OpenEye strengthens the definition of R<n> to mean the number of ring bonds to an atom, which is a graph invariant and therefore independent of a molecule’s input order. Note that the interpretation of [R] and [R0], i.e. ring membership, remains the same. Similarly, Daylight [R1] is approximately equivalent to OpenEye [R2], and Daylight [R2] is approximately equivalent to OpenEye [R3].

[2] Note that [x] was implemented in Daylight v4.9 and OEChem 1.5, and is exactly synonymous with OEChem’s [R].

[3] The atomic hybridization primitive, ^, is an OpenEye extension that is not available in Daylight SMARTS, but can be implemented using recursive SMARTS.
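The ring bond count described in footnote [1] can be illustrated without any toolkit. The following is a minimal sketch in plain Python, not OEChem code: the function name ring_bond_counts and the edge-list representation of a molecule are assumptions made for illustration. A bond is treated as a ring bond when its two atoms remain connected after the bond itself is ignored, so the count depends only on the molecular graph and never on a choice of SSSR.

```python
from collections import deque


def ring_bond_counts(n_atoms, bonds):
    """Return, for each atom index, the number of its bonds that lie in a ring.

    Hypothetical helper: atoms are 0..n_atoms-1, bonds is a list of (a, b) pairs.
    """
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def connected_without(a, b):
        # Breadth-first search from a to b that may not use the bond a-b itself.
        seen = {a}
        queue = deque([a])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if {current, neighbor} == {a, b}:
                    continue
                if neighbor == b:
                    return True
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return False

    counts = [0] * n_atoms
    for a, b in bonds:
        if connected_without(a, b):
            counts[a] += 1
            counts[b] += 1
    return counts


# Cubane skeleton: label the 8 carbons with 3-bit numbers and bond every pair
# of labels that differ in exactly one bit (the 12 edges of a cube).
cubane_bonds = [
    (i, i ^ (1 << bit))
    for i in range(8)
    for bit in range(3)
    if i < i ^ (1 << bit)
]
cubane_counts = ring_bond_counts(8, cubane_bonds)

# Methylcyclopropane skeleton: atoms 0-2 form the ring, atom 3 is the methyl carbon.
methylcyclopropane_counts = ring_bond_counts(4, [(0, 1), (1, 2), (0, 2), (0, 3)])
```

Under this definition every cubane carbon has three ring bonds, so all eight atoms would be treated alike by a ring bond count of three, whereas an SSSR-based count splits them into two groups of four.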

Bond Primitives

Syntax  Description
(none)  default (single or aromatic)
-       single bond (not aromatic)
=       double bond (not aromatic)
#       triple bond
~       any bond (wildcard)
:       aromatic bond
@       ring bond
/       directional single ‘up’ bond
\       directional single ‘down’ bond
/?      directional ‘up’ or unspecified
\?      directional ‘down’ or unspecified
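The non-directional rows of the bond table can be read as predicates over a bond's properties. The sketch below is an illustration in plain Python rather than toolkit code: the Bond class, its fields, and the bond_matches function are hypothetical names, and the directional primitives are left out because they need stereo information that this toy model does not carry. The empty string stands for the default case, where no bond symbol is written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond record used only for this illustration."""
    order: int              # 1, 2 or 3 (Kekule order)
    aromatic: bool = False
    in_ring: bool = False


# One predicate per non-directional bond primitive from the table.
BOND_PRIMITIVES = {
    "":  lambda b: b.aromatic or b.order == 1,       # default: single or aromatic
    "-": lambda b: b.order == 1 and not b.aromatic,  # single, not aromatic
    "=": lambda b: b.order == 2 and not b.aromatic,  # double, not aromatic
    "#": lambda b: b.order == 3,                     # triple
    "~": lambda b: True,                             # any bond
    ":": lambda b: b.aromatic,                       # aromatic
    "@": lambda b: b.in_ring,                        # ring bond
}


def bond_matches(primitive, bond):
    """Return True if the bond satisfies the given bond primitive."""
    return BOND_PRIMITIVES[primitive](bond)


# An aromatic ring bond, as in benzene, and an ordinary chain double bond.
benzene_bond = Bond(order=1, aromatic=True, in_ring=True)
chain_double = Bond(order=2)
```

The practical difference between the first two rows shows up on benzene_bond: the default bond accepts it, while an explicit `-` rejects it because the bond is aromatic.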

Logical Operators

Syntax  Description
!e      not e
e1&e2   e1 and e2 (high precedence)
e1,e2   e1 or e2
e1;e2   e1 and e2 (low precedence)
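The two forms of "and" differ only in how tightly they bind relative to "or": `!` binds tightest, then `&`, then `,`, and `;` binds loosest. The following plain-Python sketch shows the effect on a flat expression. The evaluate function and the facts dictionary are hypothetical names introduced for illustration. The sketch assumes each primitive is a single dictionary key, and it does not handle implicit "and" by adjacency, nested brackets, or recursive SMARTS.

```python
def evaluate(expression, facts):
    """Evaluate a flat SMARTS-style logical expression for one atom.

    facts maps each primitive symbol (for example "C" or "R") to True or False.
    """
    def term(token):
        token = token.strip()
        symbol = token.lstrip("!")
        negations = len(token) - len(symbol)
        value = facts[symbol]
        # An odd number of leading "!" inverts the primitive.
        return value if negations % 2 == 0 else not value

    # Split on the loosest operator first, so tighter operators group inside it.
    return all(
        any(
            all(term(t) for t in or_operand.split("&"))
            for or_operand in low_and_operand.split(",")
        )
        for low_and_operand in expression.split(";")
    )


# A chain carbon: carbon, not nitrogen, not in a ring.
chain_carbon = {"C": True, "N": False, "R": False}
# A ring nitrogen: nitrogen, in a ring.
ring_nitrogen = {"C": False, "N": True, "R": True}

high = evaluate("C,N&R", chain_carbon)  # C or (N and R)
low = evaluate("C,N;R", chain_carbon)   # (C or N) and R
```

For the chain carbon, the `&` form is true because the carbon alternative matches on its own, while the `;` form is false because the ring requirement applies to both alternatives. The ring nitrogen satisfies both expressions.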

Examples

Example of matching any defined stereocenter
SMARTS: [C;$([*@]([*])([*]))]
../_images/SMARTS_MatchingDefinedStereo.png
Example of matching any undefined stereocenter
SMARTS: [C;!$([*@]([*])([*]))]
../_images/SMARTS_MatchingUndefinedStereo.png