SMARTS Pattern Matching

SMARTS Syntax

SMARTS is a line notation developed by Daylight Chemical Information Systems for compactly representing molecular substructure queries. The SMARTS language can be considered an extension or generalization of Daylight’s SMILES notation for representing discrete molecules.

A SMARTS syntax overview can be found in the SMARTS documentation on the Daylight Chemical Information Systems site.

Atom Primitives

Symbol  Description                       Argument?  Default Value
------  --------------------------------  ---------  -------------------------
A       non-aromatic (aliphatic) atom     no         (no default)
a       aromatic atom                     no         (no default)
D<n>    degree (explicit connections)     yes        (no default)
H<n>    total hydrogen count              optional   exactly one
h<n>    implicit hydrogen count           optional   exactly one
R<n>    ring bond count [1]               optional   any ring atom
x<n>    ring bond count [2]               optional   any ring atom
r<n>    smallest ring size                optional   any ring atom
v<n>    valence (total bond order)        yes        (no default)
X<n>    connectivity (total connections)  yes        (no default)
#<n>    atomic number                     yes        (no default)
+<n>    positive charge                   optional   +1 cation (++ is +2, etc)
-<n>    negative charge                   optional   -1 anion (-- is -2, etc)
^<n>    atomic hybridization [3]          yes        (no default)
@       anticlockwise local chirality     no
@@      clockwise local chirality         no
@<n>    chirality class                   optional   anticlockwise
<n>     explicit atomic mass              no

Table footnotes:

[1] The semantics of the ring count primitive, R, differ slightly between Daylight SMARTS and OpenEye SMARTS. In Daylight semantics, R<n> means that an atom is in n rings of the chosen SSSR. Because the choice of SSSR is not unique, this interpretation can cause an arbitrary set of atoms to match depending upon input order. For example, in the symmetric molecule cubane, four of the eight atoms appear in two SSSR rings and the other four appear in three, but which atoms fall into which group is essentially arbitrary. Rather than attempt to reproduce these weak semantics, OpenEye strengthens the definition of R<n> to mean the number of ring bonds to an atom, which is a graph invariant and therefore independent of a molecule’s input order. Note that the interpretation of [R] and [R0], i.e. ring membership, remains the same. In practice, Daylight [R1] is approximately equivalent to OpenEye [R2], and Daylight [R2] is approximately equivalent to OpenEye [R3].

[2] Note that [x] was implemented in Daylight v4.9 and OEChem TK 1.5, and is exactly synonymous with OEChem TK’s [R].

[3] The atomic hybridization primitive, ^, is an OpenEye extension that is not available in Daylight SMARTS, but can be implemented using recursive SMARTS.
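To make the graph-invariance point of footnote [1] concrete, the following is a minimal Python sketch that counts ring bonds per atom on a hand-built cubane skeleton. It uses only the standard library and is not OEChem TK code; the atom numbering, the CUBANE_BONDS list, and the helper names are assumptions introduced for illustration. A bond is treated as a ring bond when its two atoms remain connected after the bond is removed.

```python
from collections import deque

# Hypothetical cubane skeleton: 8 carbons at the corners of a cube.
# Atoms are numbered by 3-bit corner coordinates; two corners are bonded
# when their indices differ in exactly one bit (12 bonds in total).
CUBANE_BONDS = [(a, b) for a in range(8) for b in range(a + 1, 8)
                if bin(a ^ b).count("1") == 1]


def is_ring_bond(bonds, bond):
    """Return True if the two ends of `bond` stay connected without it."""
    start, goal = bond
    adjacency = {}
    for a, b in bonds:
        if (a, b) == bond:
            continue  # pretend this bond has been deleted
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search from one end of the bond
        atom = queue.popleft()
        if atom == goal:
            return True
        for nbr in adjacency.get(atom, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


def ring_bond_counts(bonds):
    """Map each atom to its number of ring bonds (the OpenEye reading of R<n>)."""
    counts = {}
    for a, b in bonds:
        counts.setdefault(a, 0)
        counts.setdefault(b, 0)
        if is_ring_bond(bonds, (a, b)):
            counts[a] += 1
            counts[b] += 1
    return counts


counts = ring_bond_counts(CUBANE_BONDS)
print(counts)
```

In this sketch every cubane atom gets the same count of three ring bonds, so under the ring-bond reading all eight atoms would be treated alike by [R3] no matter how the atoms are numbered, in contrast to the SSSR-based count described above.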

Bond Primitives

Syntax   Description
-------  ---------------------------------
default  single or aromatic
-        single bond (not aromatic)
=        double bond (not aromatic)
#        triple bond
~        any bond (wildcard)
:        aromatic bond
@        ring bond
/        directional single ‘up’ bond
\        directional single ‘down’ bond
/?       directional ‘up’ or unspecified
\?       directional ‘down’ or unspecified
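One way to read the non-directional rows of the bond primitive table is as predicates over a bond's order, aromaticity, and ring membership. The Python sketch below is illustrative only: the Bond record, the BOND_PRIMITIVES mapping, and bond_matches are hypothetical names, not part of OEChem TK, and the directional primitives are left out because they need stereo context that this toy record does not carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical toy bond record used only for this illustration."""
    order: int      # 1, 2 or 3 (Kekule order for aromatic bonds)
    aromatic: bool
    in_ring: bool


# Each primitive maps to a predicate; "" stands for an omitted bond symbol.
BOND_PRIMITIVES = {
    "":  lambda b: b.aromatic or b.order == 1,       # default: single or aromatic
    "-": lambda b: b.order == 1 and not b.aromatic,  # single bond (not aromatic)
    "=": lambda b: b.order == 2 and not b.aromatic,  # double bond (not aromatic)
    "#": lambda b: b.order == 3,                     # triple bond
    "~": lambda b: True,                             # any bond (wildcard)
    ":": lambda b: b.aromatic,                       # aromatic bond
    "@": lambda b: b.in_ring,                        # ring bond
}


def bond_matches(symbol, bond):
    """Return True if the toy bond satisfies the given bond primitive."""
    return BOND_PRIMITIVES[symbol](bond)


benzene_bond = Bond(order=2, aromatic=True, in_ring=True)
chain_single = Bond(order=1, aromatic=False, in_ring=False)

for symbol in BOND_PRIMITIVES:
    print(repr(symbol), bond_matches(symbol, benzene_bond),
          bond_matches(symbol, chain_single))
```

The point of the sketch is the difference between the default and "-": an omitted bond symbol accepts an aromatic bond, while an explicit "-" does not.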

Logical Operators

Syntax   Description
-------  ---------------------------
!e       not e
e1 & e2  e1 and e2 (high precedence)
e1,e2    e1 or e2
e1;e2    e1 and e2 (low precedence)
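The two "and" operators differ only in how tightly they bind relative to "or". The following Python sketch evaluates a simplified atom expression against a toy atom record to show the effect. It is not a SMARTS parser: the Atom record, the small PRIMITIVES table, and the evaluate function are assumptions made for illustration, every operator must be written explicitly, and recursive SMARTS is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical toy atom record used only for this illustration."""
    symbol: str     # element symbol, e.g. "C" or "N"
    aromatic: bool
    hcount: int     # total hydrogen count


# A handful of atom primitives, enough to demonstrate operator precedence.
PRIMITIVES = {
    "c": lambda a: a.symbol == "C" and a.aromatic,
    "n": lambda a: a.symbol == "N" and a.aromatic,
    "a": lambda a: a.aromatic,
    "A": lambda a: not a.aromatic,
    "H0": lambda a: a.hcount == 0,
    "H1": lambda a: a.hcount == 1,
}


def _primitive(term, atom):
    """Evaluate one primitive, honouring any leading '!' negations."""
    term = term.strip()
    negate = False
    while term.startswith("!"):
        negate = not negate
        term = term[1:].strip()
    return PRIMITIVES[term](atom) != negate


def evaluate(expr, atom):
    """Evaluate an expression built from !, &, ',' and ';'.

    Splitting happens from the lowest-precedence operator outwards:
    ';' (low and), then ',' (or), then '&' (high and), then '!'.
    """
    return all(
        any(
            all(_primitive(term, atom) for term in conj.split("&"))
            for conj in alt.split(",")
        )
        for alt in expr.split(";")
    )


bare_aromatic_carbon = Atom(symbol="C", aromatic=True, hcount=0)
print(evaluate("c,n&H1", bare_aromatic_carbon))  # c or (n and H1)
print(evaluate("c,n;H1", bare_aromatic_carbon))  # (c or n) and H1
```

For an aromatic carbon with no hydrogens, the first expression is satisfied through the "c" alternative alone, while the second is not, because the low-precedence ";" applies the H1 requirement to both alternatives.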

Examples

Example of matching any defined stereo centers

SMARTS: [C;$([*@]([*])([*]))]

[Figure: SMARTS_MatchingDefinedStereo.png]

Example of matching any undefined stereo centers

SMARTS: [C;!$([*@]([*])([*]))]

[Figure: SMARTS_MatchingUndefinedStereo.png]