Selection Language
Introduction
VIDA has several built-in scripting commands, such as Visible, Select, Lock, Mark and Subset, that take a string argument written in the query language described here. This query language is somewhat similar to the command language of the program GRASP. It provides a powerful syntax for querying atoms, bonds, conformers and molecules to see whether they match certain properties. Triangles, vertices and surfaces may be queried in the same way.
A fairly simple example is the following command:
Select("ch='A' && rn=10")
This command will select all atoms in chain A and in residue 10 for every molecule currently in memory.
A more complex query is the following:
Select("id=5 && (r=$hydrophobic || rn=(10,50))")
This selects atoms in every hydrophobic residue, or in residues numbered 10 through 50, but only matches atoms in the molecule with ID 5.
As the examples indicate, the general syntax of the query language is based on expressions. Each expression consists of a property, a mathematical operator and a value. Expressions may be combined using a syntax similar to that of the C and Python programming languages. Boolean operations are supported via the and operator, &&, and the or operator, ||. Parts of a query may be nested using parentheses, ( and ). The boolean operators can also be spelled out as in Python: the words "and" and "or" are both valid.
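Because a query is an ordinary string, it can be assembled programmatically before being handed to a command such as Select. The following is a minimal Python sketch and not part of VIDA: the helper names expr, all_of and any_of are hypothetical, and the quoting rule (single quotes around string values, numbers and $ macros written bare) is inferred from the examples above.

```python
def expr(prop, op, value):
    """Format one 'property operator value' expression.

    String values are wrapped in single quotes, as in ch='A'.
    Numbers and values starting with '$' are written bare.
    """
    if isinstance(value, str) and not value.startswith("$"):
        value = "'%s'" % value
    return "%s%s%s" % (prop, op, value)


def all_of(*parts):
    """Join sub-queries with the and operator, wrapped in parentheses."""
    return "(" + " && ".join(parts) + ")"


def any_of(*parts):
    """Join sub-queries with the or operator, wrapped in parentheses."""
    return "(" + " || ".join(parts) + ")"


# Rebuild the two example queries from the introduction.
simple_query = all_of(expr("ch", "=", "A"), expr("rn", "=", 10))
complex_query = all_of(
    expr("id", "=", 5),
    any_of(expr("r", "=", "$hydrophobic"), "rn=(10,50)"),
)
```

Inside VIDA, the resulting strings would then be passed on, for example as Select(simple_query). The outer parentheses added by all_of are redundant for a top-level query, but they keep nested combinations unambiguous.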
Operators
VIDA’s query language supports a full range of mathematical operators for testing a property’s value against the value(s) specified in the query string. The supported operators are =, !=, >, <, >= and <=.
Lists and Ranges
In addition to being able to specify a single value for a property, it is also possible to specify a list of values, or a range of values. Lists and ranges are provided as a convenience, since they allow queries matching groups of atoms in a compact manner.
LIST - A list is a series of values enclosed in square brackets, for example: rn=[4,12,38]. This is exactly equivalent to (rn=4 || rn=12 || rn=38); each listed residue is selected. It is also possible to negate a list: rn!=[4,12,38] selects all residues except 4, 12 and 38, and is therefore exactly equivalent to (rn!=4 && rn!=12 && rn!=38).
RANGE - A range is a pair of values enclosed in parentheses, e.g. rn=(1,10). All residues between 1 and 10, inclusive, are selected, so this query is exactly equivalent to (rn>=1 && rn<=10). Negated ranges are also possible: rn!=(1,10) is equivalent to (rn<1 || rn>10), and therefore selects atoms not in the given range.
Lists and ranges are only valid with the = and != operators. It is not legal (and would be nonsensical) to evaluate an expression such as rn>(1,10).
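The list and range rules can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics described above, not VIDA's implementation: the function name matches is hypothetical, a Python list stands in for a query list such as [4,12,38], and a 2-tuple stands in for an inclusive range such as (1,10).

```python
import operator

# The six scalar comparison operators listed in the Operators section.
SCALAR_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def matches(value, op, operand):
    """Return True if value satisfies 'op operand'."""
    if isinstance(operand, list):
        hit = value in operand          # list: any listed value matches
    elif isinstance(operand, tuple):
        low, high = operand
        hit = low <= value <= high      # range: both bounds are inclusive
    else:
        return SCALAR_OPS[op](value, operand)
    if op == "=":
        return hit
    if op == "!=":
        return not hit
    # Mirrors the rule that an expression such as rn>(1,10) is not legal.
    raise ValueError("lists and ranges are only valid with = and !=")


residue_numbers = range(0, 13)
in_range = [rn for rn in residue_numbers if matches(rn, "=", (1, 10))]
not_in_list = [rn for rn in residue_numbers if matches(rn, "!=", [4, 12, 38])]
```

Here in_range holds the residue numbers 1 through 10, and not_in_list holds every number from 0 to 12 except 4 and 12.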
Properties
The following properties are defined and may be used for selection:
id - This unsigned integer property limits the match for the part of the query it appears in to match only the object with the given ID as shown in the list window. Example: id=3
a - Atom name. This string property matches the atom name. Example: a=' CA '. [Note: The spaces before and after the atom name within the quotes are important.] It may also use the pre-defined sets a=$sch to match side-chain atoms and a=$backbone or a=$ba to match backbone atoms.
an - Atom number. This integer property matches the atom number. Example: an=5.
r - Residue name. This string property matches the residue name. Example: r='ALA'. See also the pre-defined sets in the next section titled “Macros/Pre-defined sets”.
rn - Residue number. This integer property matches the residue number. Example: rn=(1,50).
ch - Chain. This string property matches the chain. The chain should be a single letter. Example: ch='A'.
model - Model number. This integer property matches atoms with the given PDB-style model number (i.e. for different NMR models). Example: model=2.
altloc - Alternate Location indicator. This string property matches atoms with the given PDB-style AlternateLocation property. The string should be a single letter. Example: altloc='A'.
icode - Insertion code. This string property matches atoms with the given PDB-style insertion code. Example: icode='B'.
occ - Occupancy. This floating point property matches atoms with the given occupancy property. Example: occ>=0.5.
b - B-factor. This floating point property matches atoms with the given crystallographic temperature (B) factor. Example: b>=50.0.
q - Partial charge. This floating point property matches atoms with the given partial charge. Example: q>0.
Q - Formal charge. This floating point property matches atoms with the given formal charge. Example: Q>0.
radius - Radius. This floating point property matches atoms with the given radius. Example: radius<1.4.
ac - Atom color. This unsigned integer property matches atoms with the given atom color. The color should be specified as a packed integer, e.g. from the GetPackedColor method on an OEColor object. Example: ac = OEColor(255, 255, 128).GetPackedColor().
elem - Element number. This unsigned integer property matches atoms with the given element number. Example: elem=6.
weight - Molecular weight. This floating point property matches molecules with the given molecular weight. Example: weight>10000.
IsAminoAcid - This boolean property matches atoms whose residue name is the same as an amino acid residue recognized by OEChem. Example: IsAminoAcid=1. Equivalent to r=$aa.
IsNucleicAcid - This boolean property matches atoms whose residue name is the same as a nucleic acid recognized by OEChem (A, C, G, T, U). Example: IsNucleicAcid=1. Equivalent to r=$dna.
IsWater - This boolean property matches atoms which are in a water molecule, as recognized by OEChem. Example: IsWater=1. Equivalent to r=$wat.
IsSubstrate - This boolean property matches atoms which are not protein, nucleic acid or water, as determined by the above queries. Example: IsSubstrate=1. Equivalent to (IsAminoAcid=0 && IsNucleicAcid=0 && IsWater=0).
type - This string property may be one of “mol”, “atom”, “bond”, “tri”, “vert” or “surf”. This limits the matches for the part of the query it appears in to match either molecules, atoms, bonds, triangles, vertices or surfaces, respectively. Example: type='atom'
index - This unsigned integer property limits the matches for the part of the query it appears in to items with the given index. Each atom, bond, triangle, etc., has an index assigned when the item is created. Since the indices are generally not exposed to the user, this command is probably of limited utility without OEChem-level access to the molecules in memory. Example: id=2 && type='atom' && index=10
query - This string property performs a substructure search where the argument is treated as a SMARTS pattern, and so limits the query to atoms which match the SMARTS pattern. Example: query='cccn'
subset - This string property matches all atoms which are in the previously-defined subset with the given name. Subsets may be defined via the Subset command, so the subset query provides a way of creating shorthand references to other complex queries. Example: first define a subset: Subset('mysubset','rn=5 && id=2'). Then the subset may be used as: subset='mysubset'.
key - This unsigned integer property limits the match for the part of the query it appears in to match only objects with the given key. Example: key=100000001.
pkey - This unsigned integer property limits the match for the part of the query it appears in to match only objects whose parent has the given key. Example: pkey=100000001.
Macros/Pre-defined sets
In addition to explicitly naming residues and atoms, VIDA defines several macros which may be used in query strings. These macros are prefixed with a dollar sign ($). Macros are generally used with the “residue name” (i.e. r=) property, although a few are used with the “atom name” (i.e. a=) property. The definitions for these sets are largely borrowed from RasMol.
r=$aliphatic: ALA, GLY, LEU, VAL, ILE, PRO
r=$hydroxyl: SER, THR, TYR
r=$sulfur: CYS, MET
r=$aromatic: TYR, HIS, TRP, PHE
r=$charged: ASP, GLU, ARG, LYS
r=$amide: GLN, ASN
r=$hydrophobic: ALA, GLY, LEU, VAL, ILE, PRO, MET, PHE, TRP
r=$polar: SER, THR, CYS, TYR, HIS, ASP, GLU, ASN, GLN, ARG, LYS
r=$neutral: ALA, GLY, LEU, VAL, ILE, PRO, SER, THR, CYS, MET, PHE, TYR, TRP, ASN, GLN
r=$acidic: ASP, GLU
r=$basic: ARG, LYS
r=$small: ALA, GLY, SER
r=$medium: VAL, PRO, THR, CYS, ASP, ASN
r=$large: ILE, MET, PHE, TYR, HIS, TRP, GLU, GLN, ARG, LYS
r=$cyclic: HIS, PRO, TYR, TRP, PHE
r=$dna: ADE, A, GUA, G, CYT, C, THY, T, URA, U
r=$aa: all amino acids
r=$at: ADE, A, THY, T, URA, U
r=$cg: CYT, C, GUA, G
r=$purine: ADE, A, GUA, G
r=$pyrimidine: CYT, C, THY, T, URA, U
r=$wat: WAT, H2O, HOH, TIP, SOL
r=$substrate: this is defined as r!=$wat && r!=$aa && r!=$dna.
There are two atom name macros as well: a=$backbone (which may be abbreviated as a=$ba), which matches specifically protein backbone atoms, and a sidechain macro, a=$sch, which matches protein sidechain atoms. Both of these can be negated (a!=$backbone or a!=$sch).
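Conceptually, each residue macro is a named set of residue names, and r=$name tests membership in that set. The Python sketch below models a few of the macros using the residue lists from the table above. It illustrates the idea only and says nothing about how VIDA stores or evaluates macros; the names MACROS, in_macro and residues_matching are hypothetical.

```python
# A subset of the residue macros, copied from the table above.
MACROS = {
    "aliphatic": {"ALA", "GLY", "LEU", "VAL", "ILE", "PRO"},
    "hydrophobic": {"ALA", "GLY", "LEU", "VAL", "ILE", "PRO",
                    "MET", "PHE", "TRP"},
    "acidic": {"ASP", "GLU"},
    "basic": {"ARG", "LYS"},
    "charged": {"ASP", "GLU", "ARG", "LYS"},
    "purine": {"ADE", "A", "GUA", "G"},
    "pyrimidine": {"CYT", "C", "THY", "T", "URA", "U"},
    "dna": {"ADE", "A", "GUA", "G", "CYT", "C", "THY", "T", "URA", "U"},
}


def in_macro(residue_name, macro):
    """Model of r=$macro for a single residue name."""
    return residue_name in MACROS[macro]


def residues_matching(residue_names, macro, negate=False):
    """Model of r=$macro (or r!=$macro when negate is True) over a list."""
    return [name for name in residue_names
            if in_macro(name, macro) != negate]


names = ["ALA", "ASP", "HOH", "LYS"]
charged = residues_matching(names, "charged")
not_charged = residues_matching(names, "charged", negate=True)

# Some macros in the table are unions of others.
charged_is_union = MACROS["charged"] == MACROS["acidic"] | MACROS["basic"]
dna_is_union = MACROS["dna"] == MACROS["purine"] | MACROS["pyrimidine"]
```

With the lists as given in the table, $charged is exactly the union of $acidic and $basic, and $dna is the union of $purine and $pyrimidine, so a query such as r=$charged selects the same residues as (r=$acidic || r=$basic).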
Scripting with ScratchScope
In addition to the built-in commands Visible, Select, Lock, Mark and Subset, it’s quite straightforward to use a selection string with any command which operates on a scope, by binding the selection string to the ScratchScope. For example, the following function defines an atom coloring command ac:
def ac(color,str):
    Subset("scratch",str)  # use string to make named subset
    ScratchSet("scratch")  # bind it to scratch scope
    AtomColorSetScoped(OEggColor(color),ScratchScope)  # use it
With two small helper functions, these types of functions can be made even easier to create:
def ScratchSubset(str):
    Subset("scratch",str)
    ScratchSet("scratch")
    return ScratchScope

def ScratchSubsetList(str):
    Subset("scratch",str)
    ScratchSet("scratch")
    return ScratchList()
With these functions defined, the ac function above and functions similar to it can be easily defined. For example:
def ac(color,str):
    AtomColorSetScoped(color,ScratchSubset(str))

def lc(color,str):
    LabelColorSetScoped(color,ScratchSubset(str))

def center(str):
    ViewerCenterSetScoped(ScratchSubset(str))

def show(str):
    Visible(ScratchSubsetList(str), True)

def hide(str):
    Visible(ScratchSubsetList(str), False)
In this manner, any of the VIDA scripting commands which take a scope argument can be easily expanded to take a selection string.
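Since the wrappers above all follow the same pattern, they could also be generated by a small factory. The sketch below is an illustration under stated assumptions, not a VIDA feature: with_selection is a hypothetical helper, and the two stand-in functions at the top exist only so the example runs outside VIDA. Inside VIDA, the ScratchSubset helper defined earlier and the real scoped commands would take their place.

```python
calls = []  # records what the stand-ins were asked to do


def ScratchSubset(query):
    """Stand-in for the ScratchSubset helper; returns a fake scope."""
    calls.append(("subset", query))
    return "scope:" + query


def AtomColorSetScoped(color, scope):
    """Stand-in for the VIDA command of the same name."""
    calls.append(("color", color, scope))


def with_selection(scoped_command):
    """Wrap a command whose last argument is a scope so that it
    accepts a selection string in that position instead."""
    def wrapper(*args):
        leading, query = args[:-1], args[-1]
        # Bind the selection string to the scratch scope, then call through.
        return scoped_command(*(leading + (ScratchSubset(query),)))
    return wrapper


# Equivalent in spirit to the hand-written ac function above.
ac = with_selection(AtomColorSetScoped)
ac("red", "ch='A' && rn=10")
```

The design choice here is to treat the selection string as the last positional argument, which matches the ac, lc and center examples above. Commands that take a list rather than a scope, such as Visible, would need a variant built on ScratchSubsetList.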