Selection Language

Introduction

VIDA has several built-in scripting commands, such as Visible, Select, Lock, Mark and Subset, that take a string argument written in the query language described here. This query language is somewhat similar to the command language of the program GRASP. It provides a powerful syntax for testing whether atoms, bonds, conformers and molecules match certain properties. Triangles, vertices and surfaces may be queried in the same way.

A fairly simple example is the following command:

Select("ch='A' && rn=10")

This command will select all atoms in chain A and in residue 10 for every molecule currently in memory.

A more complex query is the following:

Select("id=5 && (r=$hydrophobic || rn=(10,50))")

which selects atoms in every hydrophobic residue or in residues numbered 10 through 50, but only matches atoms in the molecule with ID 5.

As the examples indicate, the general syntax of the query language is based on expressions. Each expression consists of a property, a mathematical operator and a value. Expressions may be combined using a syntax similar to that of the C and Python programming languages. Boolean operations are supported via the and operator && and the or operator ||, and parts of a query may be nested with parentheses ( and ). The boolean operators may also be spelled out as in Python: and and or are accepted in place of && and ||.
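When a query is assembled inside a script, small helpers can keep the operators and parentheses consistent. The following is a minimal sketch in plain Python; all_of and any_of are hypothetical helper names, not VIDA commands, and they only build the string that would then be handed to a command such as Select.

```python
def all_of(*exprs):
    """Join expressions with the && (and) operator."""
    return " && ".join(exprs)


def any_of(*exprs):
    """Join expressions with the || (or) operator.

    The group is wrapped in parentheses so it can be nested
    inside a larger query.
    """
    return "(" + " || ".join(exprs) + ")"


# Rebuild the second example query from this section.
query = all_of("id=5", any_of("r=$hydrophobic", "rn=(10,50)"))
print(query)
```

Inside VIDA, the resulting string would be passed on as Select(query).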

Operators

VIDA’s query language supports a full range of mathematical operators for testing a property’s value against the value(s) specified in the query string. The supported operators are =, !=, >, <, >=, and <=.

Lists and Ranges

In addition to being able to specify a single value for a property, it is also possible to specify a list of values, or a range of values. Lists and ranges are provided as a convenience, since they allow queries matching groups of atoms in a compact manner.

  • LIST A list is a series of values enclosed in square brackets, for example: rn=[4,12,38]. This is exactly equivalent to (rn=4 || rn=12 || rn=38); each listed residue is selected. It is also possible to negate a list: rn!=[4,12,38] selects all residues except 4, 12, and 38, and is therefore exactly equivalent to (rn!=4 && rn!=12 && rn!=38).

  • RANGE A range is a pair of values enclosed in parentheses, e.g. rn=(1,10). All residues between 1 and 10, inclusive, are selected, so this expression is exactly equivalent to (rn>=1 && rn<=10). Negated ranges are also possible: rn!=(1,10) is equivalent to (rn<1 || rn>10), and therefore selects atoms not in the given range.

Lists and ranges are only valid with the = operator and the != operator. An expression such as rn>(1,10) is meaningless and is not legal.
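The equivalences stated above can be written out mechanically. The sketch below is plain Python and only manipulates strings; expand_list and expand_range are hypothetical names used for illustration, and the functions reproduce the expansions described in this section rather than anything VIDA does internally.

```python
def expand_list(prop, values, negate=False):
    """Spell out prop=[...] (or prop!=[...]) as individual tests."""
    if negate:
        op, joiner = "!=", " && "   # none of the values may match
    else:
        op, joiner = "=", " || "    # any one of the values may match
    return "(" + joiner.join(f"{prop}{op}{v}" for v in values) + ")"


def expand_range(prop, low, high, negate=False):
    """Spell out prop=(low,high) (or prop!=(low,high)); bounds are inclusive."""
    if negate:
        return f"({prop}<{low} || {prop}>{high})"
    return f"({prop}>={low} && {prop}<={high})"


print(expand_list("rn", [4, 12, 38]))
print(expand_list("rn", [4, 12, 38], negate=True))
print(expand_range("rn", 1, 10))
print(expand_range("rn", 1, 10, negate=True))
```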

Properties

The following properties are defined and may be used for selection:

  • id - This unsigned integer property restricts the part of the query it appears in to the object with the given ID, as shown in the list window. Example: id=3

  • a - Atom name. This string property matches the atom name. Example: a=' CA '. [Note: The spaces before and after the atom name within the quotes are important.] It may also use the pre-defined sets a=$sch to match side-chain atoms and a=$backbone or a=$ba to match backbone atoms.

  • an - Atom number. This integer property matches the atom number. Example: an=5.

  • r - Residue name. This string property matches the residue name. Example: r='ALA'. See also the pre-defined sets in the next section titled “Macros/Pre-defined sets”.

  • rn - Residue number. This integer property matches the residue number. Example: rn=(1,50).

  • ch - Chain. This string property matches the chain. The chain should be a single letter. Example: ch='A'.

  • model - Model number. This integer property matches atoms with the given PDB-style model number (i.e. for different NMR models). Example: model=2.

  • altloc - Alternate Location indicator. This string property matches atoms with the given PDB-style AlternateLocation property. The string should be a single letter. Example: altloc='A'.

  • icode - Insertion code. This string property matches atoms with the given PDB-style insertion code. Example: icode='B'.

  • occ - Occupancy. This floating point property matches atoms with the given occupancy property. Example: occ>=0.5.

  • b - B-factor. This floating point property matches atoms with the given crystallographic temperature (B) factor. Example: b>=50.0.

  • q - Partial charge. This floating point property matches atoms with the given partial charge. Example: q>0.

  • Q - Formal charge. This floating point property matches atoms with the given formal charge. Example: Q>0.

  • radius - Radius. This floating point property matches atoms with the given radius. Example: radius<1.4.

  • ac - Atom color. This unsigned integer property matches atoms with the given atom color. The color should be specified as a packed integer, e.g. the value returned by the GetPackedColor method on an OEColor object. Example: ac = OEColor(255, 255, 128).GetPackedColor().

  • elem - Element number. This unsigned integer property matches atoms with the given element number. Example: elem=6.

  • weight - Molecular weight. This floating point property matches molecules with the given molecular weight. Example: weight>10000.

  • IsAminoAcid - This boolean property matches atoms whose residue name is the same as an amino acid residue recognized by OEChem. Example: IsAminoAcid=1. Equivalent to r=$aa.

  • IsNucleicAcid - This boolean property matches atoms whose residue name is the same as a nucleic acid recognized by OEChem (A, C, G, T, U). Example: IsNucleicAcid=1. Equivalent to r=$dna.

  • IsWater - This boolean property matches atoms which are in a water molecule, as recognized by OEChem. Example: IsWater=1. Equivalent to r=$wat.

  • IsSubstrate - This boolean property matches atoms which are not protein, nucleic acid or water, as determined by the above queries. Example: IsSubstrate=1. Equivalent to (IsAminoAcid=0 && IsNucleicAcid=0 && IsWater=0).

  • type - This string property may be one of “mol”, “atom”, “bond”, “tri”, “vert” or “surf”. It restricts the part of the query it appears in to molecules, atoms, bonds, triangles, vertices or surfaces, respectively. Example: type='atom'

  • index - This unsigned integer property limits the matches for the part of the query it appears in to items with the given index. Each atom, bond, triangle, etc., has an index assigned when the item is created. Since the indices are generally not exposed to the user, this command is probably of limited utility without OEChem-level access to the molecules in memory. Example: id=2 && type='atom' && index=10

  • query - This string property performs a substructure search where the argument is treated as a SMARTS pattern, and so limits the query to atoms which match the SMARTS pattern. Example: query='cccn'

  • subset - This string property matches all atoms which are in the previously-defined subset with the given name. Subsets may be defined via the Subset command, so the subset query provides a way of creating shorthand references to other complex queries. For example, first define a subset: Subset('mysubset','rn=5 && id=2'). The subset may then be used as: subset='mysubset'.

  • key - This unsigned integer property restricts the part of the query it appears in to objects with the given key. Example: key=100000001.

  • pkey - This unsigned integer property restricts the part of the query it appears in to objects whose parent has the given key. Example: pkey=100000001.
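The examples in the list above follow a simple convention: string values are written in single quotes, numeric values are written bare, boolean properties are tested against 1 or 0, and macros start with a dollar sign. The following plain-Python sketch formats a single expression along those lines; expr is a hypothetical helper, not part of VIDA, and it does not check that the property name actually exists.

```python
def expr(prop, op, value):
    """Format one property test the way the examples above are written."""
    if isinstance(value, bool):
        value = int(value)        # boolean properties are written as 1 or 0
    elif isinstance(value, str) and not value.startswith("$"):
        value = f"'{value}'"      # string values go in single quotes
    # Numbers and $macros are written as they are.
    return f"{prop}{op}{value}"


print(expr("ch", "=", "A"))          # chain, a string property
print(expr("a", "=", " CA "))        # atom name, spaces preserved
print(expr("b", ">=", 50.0))         # B-factor, a floating point property
print(expr("IsWater", "=", True))    # boolean property
print(expr("a", "=", "$backbone"))   # pre-defined set, left unquoted
```

Because the atom name is passed through unchanged, the spaces inside ' CA ' survive the formatting.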

Macros/Pre-defined sets

In addition to explicitly naming residues and atoms, VIDA defines several macros which may be used in query strings. These macros are prefixed with a dollar sign ($). Macros are generally used with the “residue name” (i.e. r=) property, although a few are used with the “atom name” (i.e. a=) property. The definitions for these sets are largely borrowed from RasMol.

  • r=$aliphatic: ALA, GLY, LEU, VAL, ILE, PRO

  • r=$hydroxyl: SER, THR, TYR

  • r=$sulfur: CYS, MET

  • r=$aromatic: TYR, HIS, TRP, PHE

  • r=$charged: ASP, GLU, ARG, LYS

  • r=$amide: GLN, ASN

  • r=$hydrophobic: ALA, GLY, LEU, VAL, ILE, PRO, MET, PHE, TRP

  • r=$polar: SER, THR, CYS, TYR, HIS, ASP, GLU, ASN, GLN, ARG, LYS

  • r=$neutral: ALA, GLY, LEU, VAL, ILE, PRO, SER, THR, CYS, MET, PHE, TYR, TRP, ASN, GLN

  • r=$acidic: ASP, GLU

  • r=$basic: ARG, LYS

  • r=$small: ALA, GLY, SER

  • r=$medium: VAL, PRO, THR, CYS, ASP, ASN

  • r=$large: ILE, MET, PHE, TYR, HIS, TRP, GLU, GLN, ARG, LYS

  • r=$cyclic: HIS, PRO, TYR, TRP, PHE

  • r=$dna: ADE, A, GUA, G, CYT, C, THY, T, URA, U

  • r=$aa: all amino acids

  • r=$at: ADE, A, THY, T, URA, U

  • r=$cg: CYT, C, GUA, G

  • r=$purine: ADE, A, GUA, G

  • r=$pyrimidine: CYT, C, THY, T, URA, U

  • r=$wat: WAT, H2O, HOH, TIP, SOL

  • r=$substrate: this is defined as r!=$wat && r!=$aa && r!=$dna.

There are two atom name macros as well, a=$backbone (which may be abbreviated as a=$ba), which matches specifically protein backbone atoms, and a sidechain macro, a=$sch, which matches protein sidechain atoms. Both of these can be negated (a!=$backbone or a!=$sch).
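To see what a residue macro stands for, it can help to write it out as explicit residue-name tests. The sketch below is plain Python with a few of the sets transcribed by hand from the list above; the MACROS table and expand_macro are hypothetical and are not read from VIDA, so the expansion matches the macro only insofar as VIDA's built-in definitions agree with this list.

```python
# A few macro definitions copied from the list above (not queried from VIDA).
MACROS = {
    "acidic": {"ASP", "GLU"},
    "basic": {"ARG", "LYS"},
    "charged": {"ASP", "GLU", "ARG", "LYS"},
    "purine": {"ADE", "A", "GUA", "G"},
    "pyrimidine": {"CYT", "C", "THY", "T", "URA", "U"},
    "dna": {"ADE", "A", "GUA", "G", "CYT", "C", "THY", "T", "URA", "U"},
}


def expand_macro(name):
    """Write r=$name as an explicit chain of residue-name tests."""
    residues = sorted(MACROS[name])
    return "(" + " || ".join(f"r='{res}'" for res in residues) + ")"


print(expand_macro("acidic"))

# Some of the listed sets are unions of others.
print(MACROS["charged"] == MACROS["acidic"] | MACROS["basic"])
print(MACROS["dna"] == MACROS["purine"] | MACROS["pyrimidine"])
```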

Scripting with ScratchScope

In addition to the built-in commands Visible, Select, Lock, Mark and Subset, it’s quite straightforward to use a selection string with any command which operates on a scope, by binding the selection string to the ScratchScope.

For example, the following function defines an atom coloring command ac:

def ac(color, selection):
    Subset("scratch", selection)  # use the selection string to make a named subset
    ScratchSet("scratch")         # bind the subset to the scratch scope
    AtomColorSetScoped(OEggColor(color), ScratchScope)  # apply the color to that scope

With two small helper functions, these types of functions can be made even easier to create:

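def ScratchSubset(selection):
    # Bind the selection string to the scratch scope and return the scope.
    Subset("scratch", selection)
    ScratchSet("scratch")
    return ScratchScope

def ScratchSubsetList(selection):
    # Same binding, but return the result of ScratchList() instead.
    Subset("scratch", selection)
    ScratchSet("scratch")
    return ScratchList()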

With these functions defined, the ac function above and functions similar to it can be easily defined. For example:

def ac(color, selection):
    AtomColorSetScoped(color, ScratchSubset(selection))

def lc(color, selection):
    LabelColorSetScoped(color, ScratchSubset(selection))

def center(selection):
    ViewerCenterSetScoped(ScratchSubset(selection))

def show(selection):
    Visible(ScratchSubsetList(selection), True)

def hide(selection):
    Visible(ScratchSubsetList(selection), False)

In this manner, any of the VIDA scripting commands which take a scope argument can be easily expanded to take a selection string.
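The same pattern can be captured once in a wrapper that turns any scoped command into one taking a selection string. The following is a sketch only: with_selection is a hypothetical helper, and the definitions of Subset, ScratchSet, ScratchScope, AtomColorSetScoped and ViewerCenterSetScoped shown here are stand-ins that merely record their calls so the example runs outside VIDA. Inside VIDA the stand-ins would be dropped and the real commands used. The color value "red" is likewise a placeholder for whatever color object the real command expects.

```python
calls = []  # records what the stand-ins were asked to do

# Stand-ins for VIDA built-ins (assumptions for illustration only).
ScratchScope = "ScratchScope"

def Subset(name, selection):
    calls.append(("Subset", name, selection))

def ScratchSet(name):
    calls.append(("ScratchSet", name))

def AtomColorSetScoped(color, scope):
    calls.append(("AtomColorSetScoped", color, scope))

def ViewerCenterSetScoped(scope):
    calls.append(("ViewerCenterSetScoped", scope))


def with_selection(scoped_command):
    """Wrap a scoped command so its last argument is a selection string."""
    def command(*args):
        *leading, selection = args
        Subset("scratch", selection)   # name the selection
        ScratchSet("scratch")          # bind it to the scratch scope
        return scoped_command(*leading, ScratchScope)
    return command


ac = with_selection(AtomColorSetScoped)
center = with_selection(ViewerCenterSetScoped)

ac("red", "ch='A' && rn=10")   # "red" is a placeholder color value
center("rn=(1,10)")

for call in calls:
    print(call)
```

The wrapper assumes, as in the commands used in this section, that the scope is the last argument of the scoped command.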