Several of VIDA’s built-in scripting commands, such as Visible, Select,
Lock, Mark and Subset, take a string argument written in the query language
described here. This query language is somewhat similar to the command
language of the program GRASP. It provides a powerful syntax with which
atoms, bonds, conformers and molecules may be tested against given
properties. Triangles, vertices and surfaces may be queried in the same way.
A fairly simple example is the following command:
Select("ch='A' && rn=10")
This command will select all atoms in chain A and in residue 10 for every
molecule currently in memory.
A more complex query is the following:
Select("id=5 && (r=$hydrophobic || rn=(10,50))")
which selects atoms in hydrophobic residues, or in residues numbered 10
through 50, but only within the molecule with ID 5.
As the examples indicate, the general syntax of the query language is based
on expressions. Each expression consists of a property, a comparison
operator and a value. Expressions may be combined using a syntax similar to
that of the C and Python programming languages. The boolean and operator is
written &&, and the boolean or operator is written ||. Parts of a query may
be nested with parentheses ( and ). The boolean operators may also be
spelled out as in Python: and and or are both valid.
VIDA’s query language supports the usual comparison operators for testing a
property’s value against the value(s) specified in the query string. The
supported operators are =, !=, >, <, >=, and <=.
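The expression syntax described above can be illustrated with a small Python sketch that assembles query strings from their parts. The helper names (quote, expr, join_and, join_or, group) are hypothetical and not part of VIDA; they only build strings, and the results are meant to be passed to a command such as Select.

```python
# Hypothetical helpers (not part of VIDA) for assembling query strings.
COMPARISON_OPERATORS = ("=", "!=", ">", "<", ">=", "<=")


def quote(text):
    """Wrap a string value in single quotes, as in ch='A'."""
    return "'%s'" % text


def expr(prop, op, value):
    """Format one expression: property, comparison operator, value."""
    if op not in COMPARISON_OPERATORS:
        raise ValueError("unsupported operator: %r" % (op,))
    return "%s%s%s" % (prop, op, value)


def join_and(*parts):
    """Combine expressions with the boolean and operator &&."""
    return " && ".join(parts)


def join_or(*parts):
    """Combine expressions with the boolean or operator ||."""
    return " || ".join(parts)


def group(query):
    """Nest part of a query inside parentheses."""
    return "(%s)" % query


# Rebuild the two example queries shown earlier in this section.
simple = join_and(expr("ch", "=", quote("A")), expr("rn", "=", 10))
nested = join_and(
    expr("id", "=", 5),
    group(join_or(expr("r", "=", "$hydrophobic"), expr("rn", "=", "(10,50)"))),
)
print(simple)
print(nested)
```

Within a VIDA script, the resulting strings would be used as Select(simple) or Select(nested).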
Lists and Ranges
In addition to being able to specify a single value for a property, it is
also possible to specify a list of values, or a range of
values. Lists and ranges are provided as a convenience, since they allow
queries matching groups of atoms in a compact manner.
- LIST A list is a series of values enclosed in square brackets,
for example: rn=[4,12,38]. This is exactly equivalent
to (rn=4 || rn=12 || rn=38); each listed residue is selected.
It is also possible to negate a list: rn!=[4,12,38]
selects all residues except 4, 12, and 38, and is therefore exactly
equivalent to (rn!=4 && rn!=12 && rn!=38).
- RANGE A range is a pair of values enclosed in parentheses,
e.g. rn=(1,10). All residues between 1 and 10, inclusive, are
selected, so this query is exactly equivalent to (rn>=1 && rn<=10).
Negated ranges are also possible: rn!=(1,10) is equivalent to
(rn<1 || rn>10), and therefore selects atoms not in the range 1 to 10.
Lists and ranges are only valid with the = and != operators. An expression
such as rn>(1,10) is not legal, and would be meaningless in any case.
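The list and range equivalences can be written out mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and nothing here calls VIDA. It expands a list or range into the longhand query given above and mirrors the same logic as plain Python predicates.

```python
def expand_list(prop, values, negate=False):
    """Longhand form of prop=[...] (or prop!=[...] when negate is True)."""
    op, joiner = ("!=", " && ") if negate else ("=", " || ")
    return "(" + joiner.join("%s%s%s" % (prop, op, v) for v in values) + ")"


def expand_range(prop, low, high, negate=False):
    """Longhand form of prop=(low,high) (or prop!=(low,high))."""
    if negate:
        return "(%s<%s || %s>%s)" % (prop, low, prop, high)
    return "(%s>=%s && %s<=%s)" % (prop, low, prop, high)


def matches_list(number, values, negate=False):
    """Python model of the list test: membership, optionally negated."""
    return (number not in values) if negate else (number in values)


def matches_range(number, low, high, negate=False):
    """Python model of the range test: inclusive bounds, optionally negated."""
    inside = low <= number <= high
    return (not inside) if negate else inside


print(expand_list("rn", [4, 12, 38]))
print(expand_list("rn", [4, 12, 38], negate=True))
print(expand_range("rn", 1, 10))
print(expand_range("rn", 1, 10, negate=True))
```

Note that both bounds of a range are inclusive, so residue 10 matches rn=(1,10) and residue 11 matches rn!=(1,10).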
The following properties are defined and may be used for selection:
- id - This unsigned integer property restricts the part of the query it
appears in to the object with the given ID, as shown in the list window.
Example: id=3
- a - Atom name. This string property matches the atom name.
Example: a=' CA '. [Note: The spaces before and after the atom
name within the quotes are significant.] This property also accepts the
pre-defined sets a=$sch, which matches side-chain atoms, and a=$backbone or
a=$ba, which match backbone atoms.
- an - Atom number. This integer property matches the atom number.
- r - Residue name. This string property matches the residue name.
Example: r='ALA'. See also the pre-defined sets in the next
section titled “Macros/Pre-defined sets”.
- rn - Residue number. This integer property matches the residue number.
- ch - Chain. This string property matches the chain. The chain
should be a single letter. Example: ch='A'.
- model - Model number. This integer property matches atoms with
the given PDB-style model number (i.e. for different NMR models).
- altloc - Alternate Location indicator. This string property
matches atoms with the given PDB-style AlternateLocation property. The string
should be a single letter. Example: altloc='A'.
- icode - Insertion code. This string property matches atoms with
the given PDB-style insertion code. Example: icode='B'.
- occ - Occupancy. This floating point property matches atoms with
the given occupancy property. Example: occ>=0.5.
- b - B-factor. This floating point property matches atoms with
the given crystallographic temperature (B) factor. Example: b>=50.0.
- q - Partial charge. This floating point property matches atoms with
the given partial charge. Example: q>0.
- Q - Formal charge. This floating point property matches atoms with
the given formal charge. Example: Q>0.
- radius - Radius. This floating point property matches atoms with
the given radius. Example: radius<1.4.
- ac - Atom color. This unsigned integer property matches atoms with
the given atom color. The color should be specified as a packed
integer, e.g. the value returned by the GetPackedColor method of an
OEColor object. Example: ac = OEColor(255, 255, 128).GetPackedColor().
- elem - Element number. This unsigned integer property matches
atoms with the given element number. Example: elem=6.
- weight - Molecular weight. This floating point property matches
molecules with the given molecular weight. Example: weight>10000.
- IsAminoAcid - This boolean property matches atoms whose residue
name is the same as an amino acid residue recognized by OEChem. Example:
IsAminoAcid=1. Equivalent to r=$aa.
- IsNucleicAcid - This boolean property matches atoms whose residue
name is the same as a nucleic acid recognized by OEChem (A, C, G, T, U).
Example: IsNucleicAcid=1. Equivalent to r=$dna.
- IsWater - This boolean property matches atoms which are in
a water molecule, as recognized by OEChem. Example: IsWater=1.
Equivalent to r=$wat.
- IsSubstrate - This boolean property matches atoms which are not
protein, nucleic acid or water, as determined by the above queries. Example: IsSubstrate=1.
Equivalent to (IsAminoAcid=0 && IsNucleicAcid=0 && IsWater=0).
- type - This string property may be one of “mol”, “atom”,
“bond”, “tri”, “vert” or “surf”. It restricts the part of the query it
appears in to molecules, atoms, bonds, triangles, vertices or surfaces,
respectively. Example: type='atom'
- index - This unsigned integer property restricts the part of the query
it appears in to items with the given index. Each atom, bond, triangle,
etc., has an index assigned when the item is created. Since the indices
are generally not exposed to the user, this property is probably of
limited utility without OEChem-level access to the molecules in memory.
Example: id=2 && type='atom' && index=10
- query - This string property performs a substructure search
where the argument is treated as a SMARTS pattern, and so limits the query
to atoms which match the SMARTS pattern. Example: query='cccn'
- subset - This string property matches all atoms which are in
the previously defined subset with the given name. Subsets may be defined
via the Subset command, so the subset query provides a way
of creating shorthand references to other complex queries. For example,
first define a subset: Subset('mysubset','rn=5 && id=2'). The
subset may then be used as: subset='mysubset'.
- key - This unsigned integer property restricts the part of the query it
appears in to objects with the given key.
- pkey - This unsigned integer property restricts the part of the query it
appears in to objects whose parent has the given key.
Example: pkey=100000001.
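Because each property has a declared type (string, integer, unsigned integer, floating point or boolean), values can be formatted consistently before they are placed in a query. The sketch below is a hypothetical Python helper, not a VIDA function: the table simply restates the types listed above, string values are single-quoted unless they begin with a dollar sign, and booleans are written as 1 or 0 as in the examples.

```python
# Property types as listed above; the table itself is illustrative.
PROPERTY_KINDS = {
    "id": "uint", "a": "str", "an": "int", "r": "str", "rn": "int",
    "ch": "str", "model": "int", "altloc": "str", "icode": "str",
    "occ": "float", "b": "float", "q": "float", "Q": "float",
    "radius": "float", "ac": "uint", "elem": "uint", "weight": "float",
    "IsAminoAcid": "bool", "IsNucleicAcid": "bool", "IsWater": "bool",
    "IsSubstrate": "bool", "type": "str", "index": "uint",
    "query": "str", "subset": "str", "key": "uint", "pkey": "uint",
}


def format_value(prop, value):
    """Render a Python value the way the examples write it for this property."""
    kind = PROPERTY_KINDS[prop]  # raises KeyError for unknown properties
    if kind == "str":
        text = str(value)
        # Names starting with $ (e.g. $aa) are written without quotes.
        return text if text.startswith("$") else "'%s'" % text
    if kind == "bool":
        return "1" if value else "0"
    if kind == "float":
        return repr(float(value))
    number = int(value)
    if kind == "uint" and number < 0:
        raise ValueError("%s must be non-negative" % prop)
    return str(number)


def prop_expr(prop, op, value):
    """Build a single expression such as occ>=0.5 or ch='A'."""
    return "%s%s%s" % (prop, op, format_value(prop, value))


print(prop_expr("a", "=", " CA "))
print(prop_expr("occ", ">=", 0.5))
print(prop_expr("IsWater", "=", True))
print(prop_expr("r", "=", "$aa"))
```

Keeping the quoting rules in one place avoids the most common mistake with string properties, which is dropping the significant spaces inside an atom name such as ' CA '.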
Macros/Pre-defined sets
In addition to explicitly naming residues and atoms, VIDA defines
several macros which may be used in query strings. These macros are
prefixed with a dollar sign ($). Macros are generally used with the
“residue name” (i.e. r=) property, although a few are used with
the “atom name” (i.e. a=) property. The definitions for these
sets are largely borrowed from RasMol.
- r=$aliphatic: ALA, GLY, LEU, VAL, ILE, PRO
- r=$hydroxyl: SER, THR, TYR
- r=$sulfur: CYS, MET
- r=$aromatic: TYR, HIS, TRP, PHE
- r=$charged: ASP, GLU, ARG, LYS
- r=$amide: GLN, ASN
- r=$hydrophobic: ALA, GLY, LEU, VAL, ILE, PRO, MET, PHE, TRP
- r=$polar: SER, THR, CYS, TYR, HIS, ASP, GLU, ASN, GLN, ARG, LYS
- r=$neutral: ALA, GLY, LEU, VAL, ILE, PRO, SER, THR, CYS, MET, PHE, TYR, TRP, ASN, GLN
- r=$acidic: ASP, GLU
- r=$basic: ARG, LYS
- r=$small: ALA, GLY, SER
- r=$medium: VAL, PRO, THR, CYS, ASP, ASN
- r=$large: ILE, MET, PHE, TYR, HIS, TRP, GLU, GLN, ARG, LYS
- r=$cyclic: HIS, PRO, TYR, TRP, PHE
- r=$dna: ADE, A, GUA, G, CYT, C, THY, T, URA, U
- r=$aa: all amino acids
- r=$at: ADE, A, THY, T, URA, U
- r=$cg: CYT, C, GUA, G
- r=$purine: ADE, A, GUA, G
- r=$pyrimidine: CYT, C, THY, T, URA, U
- r=$wat: WAT, H2O, HOH, TIP, SOL
- r=$substrate: this is defined as r!=$wat && r!=$aa && r!=$dna.
There are two atom name macros as well: a=$backbone (which may be
abbreviated as a=$ba) matches protein backbone atoms, and a=$sch matches
protein sidechain atoms. Both of these can be negated (a!=$backbone or
a!=$sch).
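Several of the residue macros are unions of others; for instance, the $charged set is the $acidic and $basic sets combined. The following Python sketch copies a few of the definitions from the list above into a dictionary and expands a macro into an explicit chain of r= tests. The dictionary and the expand_macro helper are hypothetical, and whether VIDA treats the expanded query identically to the macro is an assumption.

```python
# A subset of the residue macros listed above, copied verbatim.
RESIDUE_MACROS = {
    "acidic": ["ASP", "GLU"],
    "basic": ["ARG", "LYS"],
    "charged": ["ASP", "GLU", "ARG", "LYS"],
    "sulfur": ["CYS", "MET"],
    "purine": ["ADE", "A", "GUA", "G"],
    "pyrimidine": ["CYT", "C", "THY", "T", "URA", "U"],
    "dna": ["ADE", "A", "GUA", "G", "CYT", "C", "THY", "T", "URA", "U"],
}


def expand_macro(name):
    """Write r=$name as an explicit or-chain of residue-name tests."""
    residues = RESIDUE_MACROS[name]
    return "(" + " || ".join("r='%s'" % res for res in residues) + ")"


def is_union(name, *parts):
    """True if macro `name` contains exactly the residues of `parts` combined."""
    combined = set()
    for part in parts:
        combined.update(RESIDUE_MACROS[part])
    return set(RESIDUE_MACROS[name]) == combined


print(expand_macro("sulfur"))
print(is_union("charged", "acidic", "basic"))
print(is_union("dna", "purine", "pyrimidine"))
```

In practice the macro form (for example r=$charged) is shorter and less error-prone than the expanded form, so the expansion is mainly useful for checking what a macro covers.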
Scripting with ScratchScope
In addition to the built-in commands Visible,
Select, Lock, Mark and Subset, it’s quite
straightforward to use a selection string with any command which operates
on a scope, by binding the selection string to the ScratchScope.
For example, the following function defines an atom coloring command:
def ac(str, color):  # signature inferred from the body and the name used below
    Subset("scratch", str)  # use string to make named subset
    ScratchSet("scratch")  # bind it to scratch scope
    AtomColorSetScoped(OEggColor(color), ScratchScope)  # use it
With two small helper functions, these types of functions can be made even
easier to create:
With these functions defined, the ac function above and functions
similar to it can be easily defined. For example:
In this manner, any of the VIDA scripting commands which take a scope
argument can be easily expanded to take a selection string.
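The pattern described in this section (define a named subset, bind it to the scratch scope, then call a scoped command) can be captured in one generic helper. The sketch below is a hypothetical illustration, not the helper functions referred to above: run_scoped is an invented name, and the VIDA functions are passed in as arguments so that the example runs outside VIDA with simple stand-ins that record their calls.

```python
def run_scoped(query, command, subset, scratch_set, scratch_scope, name="scratch"):
    """Apply a scoped command to the objects matched by a query string.

    subset, scratch_set and scratch_scope stand for VIDA's Subset,
    ScratchSet and ScratchScope; command is any callable taking a scope.
    """
    subset(name, query)      # use the string to make a named subset
    scratch_set(name)        # bind the subset to the scratch scope
    return command(scratch_scope)  # run the scoped command on it


# Stand-ins that record calls, so the sketch can be run without VIDA.
calls = []
FAKE_SCOPE = object()


def fake_subset(name, query):
    calls.append(("Subset", name, query))


def fake_scratch_set(name):
    calls.append(("ScratchSet", name))


def fake_command(scope):
    calls.append(("command", scope is FAKE_SCOPE))


run_scoped("ch='A' && rn=10", fake_command, fake_subset, fake_scratch_set, FAKE_SCOPE)
for call in calls:
    print(call)
```

Inside VIDA, the call would presumably take the form run_scoped(query, lambda scope: AtomColorSetScoped(OEggColor(color), scope), Subset, ScratchSet, ScratchScope); this usage is an assumption based on the commands named in this section and has not been checked against a running VIDA session.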