MMDS Data Preparation

MMDS loads experimental protein structure files along with their X-ray or cryo-EM data. While there is a client for loading structures piece by piece, preparing the initial data set for bulk preparation and upload speeds up the first load.

A worked example

We will use a couple of example families to create a set of structures for MMDS. Working through this example should highlight the basic process, which can be extended to more families of interest.

We are going to use two Serine Protease families, Trypsin and Thrombin.

Data Preparation

The first thing to do is to gather the structures, density files and other information for these two projects. We will then use Spruce to prepare OEDesignUnits and then finally load these results into MMDS.

$ mkdir mmds-data
$ cd mmds-data

Putting some PDB codes into a text file, for Trypsin and Thrombin, we can use Spruce to download the structures and density files.

trypsin_pdb_codes.list
1G3D
1GJ6
1K1L
1GHZ
1TRN
thrombin_pdb_codes.list
3RML
3DA9
3P17
3QWC
4BAH
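
The two list files can also be written from Python. This is a minimal sketch, assuming the files hold one PDB code per line (a format inferred from the listing above); `write_code_list` is a hypothetical helper name.

```python
from pathlib import Path

# PDB codes taken from the two lists above.
FAMILIES = {
    "trypsin": ["1G3D", "1GJ6", "1K1L", "1GHZ", "1TRN"],
    "thrombin": ["3RML", "3DA9", "3P17", "3QWC", "4BAH"],
}


def write_code_list(family: str, codes: list[str], out_dir: Path = Path(".")) -> Path:
    """Write <family>_pdb_codes.list with one PDB code per line (assumed format)."""
    path = out_dir / f"{family}_pdb_codes.list"
    path.write_text("\n".join(codes) + "\n")
    return path


if __name__ == "__main__":
    for family, codes in FAMILIES.items():
        print("wrote", write_code_list(family, codes))
```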

Use spruce getpdb to get the structure files into a directory for each list, then use spruce getmap to get the density files for each structure file.

$ spruce getpdb --list trypsin_pdb_codes.list -o trypsin -v
1G3D: SUCCESS
1GJ6: SUCCESS
1K1L: SUCCESS
1GHZ: SUCCESS
1TRN: SUCCESS

$ spruce getmap --dir trypsin -v
1g3d: wrote trypsin/1g3d.mtz
1ghz: wrote trypsin/1ghz.mtz
1gj6: wrote trypsin/1gj6.mtz
1k1l: wrote trypsin/1k1l.mtz
1trn: wrote trypsin/1trn.mtz

$ spruce getpdb --list thrombin_pdb_codes.list -o thrombin -v
3RML: SUCCESS
3DA9: SUCCESS
3P17: SUCCESS
3QWC: SUCCESS
4BAH: SUCCESS

$ spruce getmap --dir thrombin -v
3da9: wrote thrombin/3da9.mtz
3p17: wrote thrombin/3p17.mtz
4bah: wrote thrombin/4bah.mtz
3qwc: wrote thrombin/3qwc.mtz
3rml: wrote thrombin/3rml.mtz
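
The same two steps can be generated per family from Python. This sketch only builds the argument lists, using exactly the flags shown in the transcript above; `fetch_commands` is a hypothetical helper, and each list could be passed to `subprocess.run(cmd, check=True)` on a machine where `spruce` is installed.

```python
import shlex


def fetch_commands(family: str) -> list[list[str]]:
    """Return the getpdb and getmap invocations shown above for one family."""
    return [
        ["spruce", "getpdb", "--list", f"{family}_pdb_codes.list", "-o", family, "-v"],
        ["spruce", "getmap", "--dir", family, "-v"],
    ]


if __name__ == "__main__":
    for family in ("trypsin", "thrombin"):
        for cmd in fetch_commands(family):
            # Print the command line; nothing is executed here.
            print("$", shlex.join(cmd))
```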

Meta data

For each structure, we want to create an extra file containing the meta data associated with it. This includes the author, date, method and ligand HET info. For public data, we can create these files from the PDB header info. For private/internal structures, use the OEStructureMetaData class.

An example meta data file:

{
  "Author": "E.TOYOTA,K.K.S.NG,H.SEKIZAKI,K.ITOH,K.TANIZAWA,M.N.G.JAMES",
  "Keywords": [
    "ENZYME-INHIBITOR COMPLEX",
    " COORDINATION METAL BASED INHIBITOR",
    "HYDROLASE"
  ],
  "ExperimentType": "X-RAY DIFFRACTION",
  "ExperimentDate": "17-JAN-01",
  "Revision": "1.3",
  "RevisionDate": "04-APR-18",
  "IridiumData": {
    "Category": 0,
    "LaD": 0,
    "ASaD": 0,
    "POL": false,
    "POAS": false,
    "AltConfs": false,
    "PackRes": false,
    "Excp": false,
    "IrrRFree": false,
    "PossCov": false,
    "DPI": 0.0897862915057751,
    "RFree": 0.189,
    "Resolution": 1.8,
    "HasMTZ": false
  },
  "SequenceMetadata": [
    {
      "ChainID": "A",
      "StartResNum": 11,
      "StartResInsCode": " ",
      "Sequence": "ASP-ASP-ASP-ASP-LYS-ILE-VAL-GLY-GLY-TYR-THR-CYS-GLY-ALA-ASN-THR-VAL-PRO-TYR-GLN-VAL-SER-LEU-ASN-SER-GLY-TYR-HIS-PHE-CYS-GLY-GLY-SER-LEU-ILE-ASN-SER-GLN-TRP-VAL-VAL-SER-ALA-ALA-HIS-CYS-TYR-LYS-SER-GLY-ILE-GLN-VAL-ARG-LEU-GLY-GLU-ASP-ASN-ILE-ASN-VAL-VAL-GLU-GLY-ASN-GLU-GLN-PHE-ILE-SER-ALA-SER-LYS-SER-ILE-VAL-HIS-PRO-SER-TYR-ASN-SER-ASN-THR-LEU-ASN-ASN-ASP-ILE-MET-LEU-ILE-LYS-LEU-LYS-SER-ALA-ALA-SER-LEU-ASN-SER-ARG-VAL-ALA-SER-ILE-SER-LEU-PRO-THR-SER-CYS-ALA-SER-ALA-GLY-THR-GLN-CYS-LEU-ILE-SER-GLY-TRP-GLY-ASN-THR-LYS-SER-SER-GLY-THR-SER-TYR-PRO-ASP-VAL-LEU-LYS-CYS-LEU-LYS-ALA-PRO-ILE-LEU-SER-ASP-SER-SER-CYS-LYS-SER-ALA-TYR-PRO-GLY-GLN-ILE-THR-SER-ASN-MET-PHE-CYS-ALA-GLY-TYR-LEU-GLU-GLY-GLY-LYS-ASP-SER-CYS-GLN-GLY-ASP-SER-GLY-GLY-PRO-VAL-VAL-CYS-SER-GLY-LYS-LEU-GLN-GLY-ILE-VAL-SER-TRP-GLY-SER-GLY-CYS-ALA-GLN-LYS-ASN-LYS-PRO-GLY-VAL-TYR-THR-LYS-VAL-CYS-ASN-TYR-VAL-SER-TRP-ILE-LYS-GLN-THR-ILE-ALA-SER-ASN"
    }
  ],
  "HeterogenMetadata": [
    {
      "Title": "108",
      "Id": "108",
      "Smiles": "C[C@@H](C(=O)O)NCc1cc(ccc1O)C(=N)N",
      "Type": "unknown",
      "Tautomers": []
    }
  ]
}
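
A quick sanity check of such a file can be done with the standard library. This is a sketch under assumptions: the key names come from the example above, but whether every key is mandatory is not stated here, and `summarize_metadata` and `summarize_file` are hypothetical helper names.

```python
import json
from pathlib import Path

# Top-level keys seen in the example file above; treating all of them
# as required is an assumption of this sketch.
EXPECTED_KEYS = (
    "Author", "Keywords", "ExperimentType", "ExperimentDate",
    "IridiumData", "SequenceMetadata", "HeterogenMetadata",
)


def summarize_metadata(meta: dict) -> dict:
    """Report missing keys plus the HET-code-to-SMILES mapping and resolution."""
    iridium = meta.get("IridiumData", {})
    return {
        "missing": [key for key in EXPECTED_KEYS if key not in meta],
        "ligands": {het["Id"]: het["Smiles"] for het in meta.get("HeterogenMetadata", [])},
        "resolution": iridium.get("Resolution"),
        "has_mtz": iridium.get("HasMTZ"),
    }


def summarize_file(path: str) -> dict:
    """Load one *.pdb.json file and summarize it."""
    return summarize_metadata(json.loads(Path(path).read_text()))
```

For example, `summarize_file("trypsin/1G3D.pdb.json")` would list ligand `108` with its SMILES if the file matches the example above.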

The small script create_meta_data.py creates these files for all the PDB structures in one directory.

$ python create_meta_data.py --dir trypsin -v
created trypsin/1K1L.pdb.json
created trypsin/1G3D.pdb.json
created trypsin/1GHZ.pdb.json
created trypsin/1GJ6.pdb.json
created trypsin/1TRN.pdb.json

$ python create_meta_data.py --dir thrombin -v
created thrombin/3DA9.pdb.json
created thrombin/3P17.pdb.json
created thrombin/3QWC.pdb.json
created thrombin/4BAH.pdb.json
created thrombin/3RML.pdb.json

Preparing biounits and OEDesignUnits

For each project, we want to use a reference OEDesignUnit to prepare all other structures. This ensures we have a consistent definition of the Bio Unit for all members and allows creating OEDesignUnits for apo structures, transferring the site info from the reference to the apo OEDesignUnit.

For Trypsin, we will use 1G3D and for Thrombin, 3RML. We will prep each of these separately and then prepare the rest using the results as reference.

$ cd trypsin
$ spruce prep -v --build-sc --cap-termini 1G3D.pdb
spruce prep run options

-- snip --

processing 1 pdb file(s) with 1 CPUs
PDB file: 1G3D.pdb
metadata file: ./1G3D.pdb.json
MTZ file: ./1g3d.mtz
Splitting ASU -> DUs

-- snip --

DU(s):
   1G3D(A)__DU__biounit
   1G3D(A) > 108(A-601)
Output:
  wrote ./trypsin/1G3D_A__DU__biounit.oedu
  wrote ./trypsin/1G3D_A__DU__108_A-601.oedu
  elapsed: 8.1 secs

Creating design units: 8.17 secs

$ cd ..

Note that Spruce created 2 OEDesignUnit files. We want the one with the ligand 108 in the title, which is the known ligand in this structure.
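
Telling the two kinds of file apart can be automated from the file name. The sketch below assumes the naming pattern visible in the output above (`<CODE>_<chains>__DU__<label>.oedu`, with the label `biounit` for the biounit file); this is inferred from the listing, not a documented Spruce guarantee, and `parse_du_filename` is a hypothetical helper.

```python
from pathlib import Path
from typing import NamedTuple


class DUName(NamedTuple):
    code: str         # PDB code, e.g. "1G3D"
    chains: str       # chain letters, e.g. "A" or "HIL"
    label: str        # "biounit" or a ligand/site label such as "108_A-601"
    is_biounit: bool


def parse_du_filename(filename: str) -> DUName:
    """Split a Spruce-style .oedu file name into its parts (assumed pattern)."""
    name = Path(filename).name
    if not name.endswith(".oedu"):
        raise ValueError(f"not an .oedu file: {filename}")
    target, sep, label = name[: -len(".oedu")].partition("__DU__")
    if not sep:
        raise ValueError(f"no __DU__ marker in: {filename}")
    code, _, chains = target.partition("_")
    return DUName(code, chains, label, label == "biounit")


if __name__ == "__main__":
    for name in ("1G3D_A__DU__biounit.oedu", "1G3D_A__DU__108_A-601.oedu"):
        print(parse_du_filename(name))
```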

Do the same for Thrombin/3RML:

$ cd thrombin
$ spruce prep -v --build-sc --cap-termini 3RML.pdb

-- snip --

DU(s):
   3RML(HIL)__DU__biounit
   3RML(HIL) > M31(H-1)
Output:
  wrote ./thrombin/3RML_HIL__DU__biounit.oedu
  wrote ./thrombin/3RML_HIL__DU__M31_H-1.oedu
  elapsed: 20.4 secs

Creating design units: 20.42 secs

$ cd ..

Now we can prep all the other structures, using these references. You may see some warnings about charge addition or removal of clashing waters due to the addition of side chains. These are OK to ignore.

$ spruce prep --build-sc --cap-termini --design-ref trypsin/1G3D_A__DU__108_A-601.oedu --dir trypsin -o trypsin
Creating design units [*************************************************] 100.0%
  Elapsed: 25.08 secs
$ ls trypsin
1G3D.pdb                    1GHZ_A__DU__120_A-246.oedu  1GJ6_spruce_prep.log        1TRN.pdb.json               1ghz.mtz
1G3D.pdb.json               1GHZ_A__DU__biounit.oedu    1K1L.pdb                    1TRN_A__DU__apo_A-226.oedu  1gj6.mtz
1G3D_A__DU__108_A-601.oedu  1GHZ_spruce_prep.log        1K1L.pdb.json               1TRN_A__DU__biounit.oedu    1k1l.mtz
1G3D_A__DU__biounit.oedu    1GJ6.pdb                    1K1L_A__DU__FD3_A-999.oedu  1TRN_B__DU__apo_B-226.oedu  1trn.mtz
1G3D_spruce_prep.log        1GJ6.pdb.json               1K1L_A__DU__biounit.oedu    1TRN_B__DU__biounit.oedu    spruce.log
1GHZ.pdb                    1GJ6_A__DU__132_A-246.oedu  1K1L_spruce_prep.log        1TRN_spruce_prep.log
1GHZ.pdb.json               1GJ6_A__DU__biounit.oedu    1TRN.pdb                    1g3d.mtz

$ spruce prep --build-sc --cap-termini --design-ref thrombin/3RML_HIL__DU__M31_H-1.oedu --dir thrombin -o thrombin
Creating design units [*************************************************] 100.0%
  Elapsed: 26.08 secs
$ ls thrombin
3DA9.pdb                       3P17_HIL__DU__biounit.oedu     3RML.pdb.json               4BAH.pdb
3DA9.pdb.json                  3P17_spruce_prep.log           3RML_HIL__DU__M31_H-1.oedu  4BAH.pdb.json
3DA9_ABD__DU__44U_B-1.oedu     3QWC.pdb                       3RML_HIL__DU__biounit.oedu  4BAH_ABD__DU__MEL_B-1291.oedu
3DA9_ABD__DU__biounit.oedu     3QWC.pdb.json                  3RML_spruce_prep.log        4BAH_ABD__DU__biounit.oedu
3DA9_spruce_prep.log           3QWC_HIL__DU__98P_H-2001.oedu  3da9.mtz                    4BAH_spruce_prep.log
3P17.pdb                       3QWC_HIL__DU__biounit.oedu     3p17.mtz                    4bah.mtz
3P17.pdb.json                  3QWC_spruce_prep.log           3qwc.mtz                    spruce.log
3P17_HIL__DU__99P_H-1001.oedu  3RML.pdb                       3rml.mtz

For each experiment, we now have a structure file (*.pdb), a density file (*.mtz), a meta data file (*.pdb.json), a bio unit file and at least one OEDesignUnit with a designated binding site. Time to load these into MMDS.
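
Before loading, it can be worth confirming that every experiment really has all of these files. This is a minimal sketch that groups files by PDB code using the naming seen in the directory listings above (an inferred convention); `inventory` and `missing_kinds` are hypothetical helpers.

```python
from collections import defaultdict
from pathlib import Path

KINDS = ("pdb", "mtz", "json", "biounit", "site")


def inventory(directory) -> dict:
    """Group the files in one family directory by upper-cased PDB code."""
    found = defaultdict(lambda: {kind: [] for kind in KINDS})
    for path in sorted(Path(directory).iterdir()):
        name = path.name
        if name.endswith(".pdb.json"):
            found[name[: -len(".pdb.json")].upper()]["json"].append(name)
        elif name.endswith(".pdb"):
            found[path.stem.upper()]["pdb"].append(name)
        elif name.endswith(".mtz"):
            found[path.stem.upper()]["mtz"].append(name)
        elif name.endswith(".oedu") and "__DU__" in name:
            code = name.split("_", 1)[0].upper()
            kind = "biounit" if name.endswith("__DU__biounit.oedu") else "site"
            found[code][kind].append(name)
        # Logs and any other files are ignored.
    return dict(found)


def missing_kinds(directory) -> dict:
    """Map each PDB code to the kinds of file it lacks (complete codes omitted)."""
    report = {}
    for code, files in inventory(directory).items():
        lacking = [kind for kind in KINDS if not files[kind]]
        if lacking:
            report[code] = lacking
    return report
```

An empty result from `missing_kinds("trypsin")` would mean every structure in that directory has all five kinds of file.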

Experiments

Experiments are the core of MMDS. Generally, we are talking about X-ray crystallography experiments with their corresponding density (MTZ) map.

MMDS can also load cryo-EM (with maps), NMR and modeled structures.

Each experiment needs at least 2 pieces of data, the original structure file (PDB), and a metadata file in JSON format. If there is a map file, that is included as well. We will also upload prepared biounit files, and subsequently any sites (OEDesignUnits) created by Spruce.

The meta data file holds important experiment info that is found in the header of public structures but is often stored elsewhere for proprietary structures. An important part of this data is information on the HET structures in the experiment file: a mapping of each 3-letter HET code in the file to an internal compound ID, as well as the SMILES for each structure to help in validation.

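Recall that Spruce wrote a biounit OEDesignUnit and one or more site OEDesignUnits for each structure. For uploading an Experiment to MMDS, we only want the biounit file; later we can upload the other DU as a Site.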

Note

All commands below assume an MMDS profile called ‘local’ has been created to point to your MMDS server.

Adding an experiment

$ mmdscli --profile local experiment list
No resources found

$ mmdscli --profile local experiment add -h
Usage: mmdscli experiment add [OPTIONS] CODE STRUCTURE META

  Add a new experiment to MMDS

Options:
  --density density  Density/map file for this experiment
  --biounit BU       Biounit file. Add this flag multiple times to add all
                     biounits. If not provided,
                     mmdscli will look for spruce
                     created BUs in the same directory as STRUCTURE
  -y, --yes          Say yes to any yes/no questions.  [default: False]
  --json             Return JSON instead of simple text on stdout.
  -q, --quiet        Minimal output
  -v, --verbose      Verbose output
  -h, --help         Show this message and exit.

$ cd trypsin
$ mmdscli experiment add 1G3D 1G3D.pdb 1G3D.pdb.json --density 1g3d.mtz --json
{
    "DPI": 0.0897862915057751,
    "RFree": 0.189,
    "RFree_found": false,
    "author": "E.TOYOTA,K.K.S.NG,H.SEKIZAKI,K.ITOH,K.TANIZAWA,M.N.G.JAMES",
    "biounit": [
        {
            "experiment": 1787,
            "filename": "1G3D_A__DU__biounit.oedu",
            "id": 2126,
            "url": "http://localhost:8080/api/v1/biounit/2126/1G3D_A__DU__biounit.oedu"
        }
    ],
    "code": "1G3D",
    "created": "2019-04-16T08:57:05.539899Z",
    "density": {
        "filename": "1g3d.mtz",
        "json": null,
        "pk": 1306,
        "url": "http://localhost:8080/api/v1/experiment/1787/density/1g3d.mtz"
    },
    "id": 1787,
    "keywords": "ENZYME-INHIBITOR COMPLEX, COORDINATION METAL BASED INHIBITOR,HYDROLASE EC:3.4.21.4",
    "ligands": {
        "108": {
            "id": "108",
            "smiles": "C[C@@H](C(=O)O)NCc1cc(ccc1O)C(=N)N",
            "type": ""
        }
    },
    "method": "X-RAY DIFFRACTION",
    "public": true,
    "resolution": 1.8,
    "sites": [],
    "solved_date": "2001-01-17T00:00:00Z",
    "strdate": "17-JAN-01",
    "structure": {
        "filename": "1G3D.pdb",
        "json": "http://localhost:8080/api/v1/experiment/1787/structure/",
        "pk": 5907,
        "url": "http://localhost:8080/api/v1/experiment/1787/structure/1G3D.pdb"
    },
    "tags": []
}

$ mmdscli --profile local status
version    oe_license_expires      families    experiments    sites    contexts
---------  --------------------  ----------  -------------  -------  ----------
1.0.0      2019-09-04                     0              1        0           0
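
When scripting around the CLI, the command line and the returned ID can be handled as below. This sketch uses only the arguments and flags shown in the help text and example above; `experiment_add_cmd` and `experiment_id` are hypothetical helpers, and the command is built but not executed.

```python
import json


def experiment_add_cmd(code, structure, meta, density=None, profile="local"):
    """Build 'mmdscli experiment add CODE STRUCTURE META [--density FILE] --json'."""
    cmd = ["mmdscli", "--profile", profile, "experiment", "add", code, structure, meta]
    if density:
        cmd += ["--density", density]
    cmd.append("--json")
    return cmd


def experiment_id(json_text: str) -> int:
    """Pull the new experiment's ID out of the --json output shown above."""
    return json.loads(json_text)["id"]


if __name__ == "__main__":
    print(" ".join(experiment_add_cmd("1G3D", "1G3D.pdb", "1G3D.pdb.json", "1g3d.mtz")))
```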

We can continue through each directory loading experiments one at a time, or we can use a script that uses the mmdsclient Python API to load all the experiments in a single directory at once.

Note

Since these are simple Python scripts, the MMDS_PROFILE environment variable needs to be set in order for the mmdsclient API to load the appropriate credentials.

$ MMDS_PROFILE=local python load_all_experiments.py trypsin
Found 5 input structure files
Found 5 input density files
Found 6 input biounit files
Found 5 input experiments
✔ Gathering existing experiments
Found 1 existing experiments.
Found 4 new experiments to load
loading 4 experiments [*************************************************] 100.0%
  Elapsed: 3.91 secs

$ MMDS_PROFILE=local python load_all_experiments.py thrombin
Found 5 input structure files
Found 5 input density files
Found 5 input biounit files
Found 5 input experiments
✔ Gathering existing experiments : 5
Found 5 existing experiments.
Found 5 new experiments to load
loading 5 experiments [*************************************************] 100.0%
  Elapsed: 4.48 secs

We can check the server status again, to see that all 10 of our experiments are loaded.

$ mmdscli --profile local status
version    oe_license_expires      families    experiments    sites    contexts
---------  --------------------  ----------  -------------  -------  ----------
1.0.0      2019-09-04                     0             10        0           0

We can also get a list.

Warning

Note that for 1000+ structures, this command will take some time, so be careful.

$ mmdscli --profile local experiment list --fields id,code,DPI,method,resolution
  id  code    method                   DPI    resolution  public
----  ------  -----------------  ---------  ------------  --------
1550  1G3D    X-RAY DIFFRACTION  0.0897863          1.8   True
1551  1K1L    X-RAY DIFFRACTION  0.308722           2.5   True
1552  1TRN    X-RAY DIFFRACTION  0.198978           2.2   True
1554  1GJ6    X-RAY DIFFRACTION  0.0852393          1.5   True
1553  1GHZ    X-RAY DIFFRACTION  0.0799989          1.39  True
1555  4BAH    X-RAY DIFFRACTION  0.128              1.94  True
1556  3DA9    X-RAY DIFFRACTION  0.1187             1.8   True
1558  3P17    X-RAY DIFFRACTION  0.0531176          1.43  True
1559  3QWC    X-RAY DIFFRACTION  0.0905379          1.75  True
1557  3RML    X-RAY DIFFRACTION  0.0683916          1.53  True
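
Tables like this one can be read back into Python when a script needs the IDs. This sketch assumes the layout seen above: a header line, a rule of dashes whose runs mark the column spans, then one line per row. `parse_table` is a hypothetical helper, not part of mmdscli.

```python
import re

SAMPLE = """\
  id  code    method                   DPI    resolution  public
----  ------  -----------------  ---------  ------------  --------
1550  1G3D    X-RAY DIFFRACTION  0.0897863          1.8   True
1553  1GHZ    X-RAY DIFFRACTION  0.0799989          1.39  True
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Slice each line at the column spans given by the dashed rule."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, rule, rows = lines[0], lines[1], lines[2:]
    spans = [match.span() for match in re.finditer(r"-+", rule)]
    names = [header[start:end].strip() for start, end in spans]
    return [
        {name: row[start:end].strip() for name, (start, end) in zip(names, spans)}
        for row in rows
    ]


if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        print(row["id"], row["code"], row["resolution"])
```

Slicing by column span rather than splitting on whitespace keeps values such as `X-RAY DIFFRACTION` intact.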

Download load_all_experiments.py


Sites

Once we have uploaded the experiments, we can upload one or more Sites (OEDesignUnits) associated with each. We could download the BU we just uploaded, run it through Spruce to create the OEDesignUnit(s) and then upload them. But since we already created the OEDesignUnits above, we can just upload those.

The one case we will see later is apo sites. With no defined ligand, we will create apo sites as needed when adding them to a project (where the actual site location is known from the project reference structure).

$ mmdscli --profile local site add -h
Usage: mmdscli site add [OPTIONS] ExperimentID DU

  Add a new site into MMDS

Options:
  --bfactor-image PATH      Grapheme SVG depicting bfactor
  --interaction-image PATH  Grapheme SVG depicting protein-ligand interactions
  --density-image PATH      Grapheme SVG depicting electron density overlap
  -y, --yes                 Say yes to any yes/no questions.  [default: False]
  --json                    Return JSON instead of simple text on stdout.
  -q, --quiet               Minimal output
  -v, --verbose             Verbose output
  -h, --help                Show this message and exit.

If the SVGs have been created locally, they can be passed on the command line. If not, they will be created as part of the add command.
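
A small wrapper makes the optional image flags explicit. This sketch uses only the options listed in the help text above; `site_add_cmd` is a hypothetical helper that builds the command without running it.

```python
def site_add_cmd(experiment_id, du_path, profile="local",
                 bfactor_image=None, interaction_image=None, density_image=None):
    """Build 'mmdscli site add ExperimentID DU' with any locally created SVGs."""
    cmd = ["mmdscli", "--profile", profile, "site", "add", str(experiment_id), du_path]
    optional = (
        ("--bfactor-image", bfactor_image),
        ("--interaction-image", interaction_image),
        ("--density-image", density_image),
    )
    for flag, path in optional:
        if path:  # omitted images are left for the add command to create
            cmd += [flag, path]
    cmd.append("--json")
    return cmd


if __name__ == "__main__":
    print(" ".join(site_add_cmd(1550, "trypsin/1G3D_A__DU__108_A-601.oedu")))
```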

To upload the main site for 1G3D, we need to associate this site with the Experiment already uploaded. We can get the ID and then upload the Site.

$ mmdscli --profile local experiment list --code 1G3D --fields id,code
  id  code    public
----  ------  --------
1550  1G3D    True

$ mmdscli --profile local site add 1550 trypsin/1G3D_A__DU__108_A-601.oedu --json
{
    "bfactor_image": null,
    "components": {
        "cofactors": null,
        "counter_ions": null,
        "excipients": {
            "json": "http://localhost:8080/api/v1/site/1449/excipients/",
            "mask": "512",
            "title": "excipients",
            "url": "http://localhost:8080/api/v1/site/1449/excipients/1G3D_A-108_A-601_excipients.oeb"
        },
        "ligand": {
            "json": "http://localhost:8080/api/v1/site/1449/ligand/",
            "mask": "ligand",
            "title": "108(A-601)",
            "url": "http://localhost:8080/api/v1/site/1449/ligand/1G3D_A-108_A-601_ligand.oeb"
        },
        "lipids": null,
        "metals": {
            "json": "http://localhost:8080/api/v1/site/1449/metals/",
            "mask": "metals",
            "title": "metals",
            "url": "http://localhost:8080/api/v1/site/1449/metals/1G3D_A-108_A-601_metals.oeb"
        },
        "nucleic": null,
        "other_cofactors": null,
        "other_ligands": null,
        "other_nucleics": null,
        "other_proteins": null,
        "packing_residues": {
            "json": "http://localhost:8080/api/v1/site/1449/packing_residues/",
            "mask": "packing_residues",
            "title": "packing residues",
            "url": "http://localhost:8080/api/v1/site/1449/packing_residues/1G3D_A-108_A-601_packing_residues.oeb"
        },
        "protein": {
            "json": "http://localhost:8080/api/v1/site/1449/protein/",
            "mask": "protein",
            "title": "1G3D(A)",
            "url": "http://localhost:8080/api/v1/site/1449/protein/1G3D_A-108_A-601_protein.oeb"
        },
        "solvent": {
            "json": "http://localhost:8080/api/v1/site/1449/solvent/",
            "mask": "solvent",
            "title": "solvent",
            "url": "http://localhost:8080/api/v1/site/1449/solvent/1G3D_A-108_A-601_solvent.oeb"
        },
        "sugars": null
    },
    "density_image": {
        "pk": 2462,
        "url": "http://localhost:8080/api/mmds/images/2462/1G3D_A-108_A-601_density.svg"
    },
    "design_unit": {
        "filename": "1G3D_A__DU__108_A-601.oedu",
        "json": "http://localhost:8080/api/v1/site/1449/design_unit/",
        "pk": 1449,
        "url": "http://localhost:8080/api/v1/site/1449/design_unit/1G3D_A__DU__108_A-601.oedu"
    },
    "experiment": {
        "DPI": 0.0897862915057751,
        "RFree": 0.189,
        "RFree_found": false,
        "author": "E.TOYOTA,K.K.S.NG,H.SEKIZAKI,K.ITOH,K.TANIZAWA,M.N.G.JAMES",
        "biounit": [
            {
                "experiment": 1550,
                "filename": "1G3D_A__DU__biounit.oedu",
                "id": 1744,
                "url": "http://localhost:8080/api/v1/biounit/1744/1G3D_A__DU__biounit.oedu"
            }
        ],
        "code": "1G3D",
        "created": "2019-06-23T17:28:51.872454Z",
        "density": {
            "filename": "1g3d.mtz",
            "json": null,
            "pk": 1169,
            "url": "http://localhost:8080/api/v1/experiment/1550/density/1g3d.mtz"
        },
        "id": 1550,
        "keywords": "ENZYME-INHIBITOR COMPLEX, COORDINATION METAL BASED INHIBITOR,HYDROLASE",
        "ligands": [
            {
                "Id": "108",
                "Smiles": "C[C@@H](C(=O)O)NCc1cc(ccc1O)C(=N)N",
                "Tautomers": [],
                "Title": "108",
                "Type": "unknown"
            }
        ],
        "method": "X-RAY DIFFRACTION",
        "public": true,
        "resolution": 1.8,
        "revision": "1.3",
        "revision_date": "2018-04-04T00:00:00Z",
        "sites": [
            1449
        ],
        "solved_date": "2001-01-17T00:00:00Z",
        "strdate": "17-JAN-01",
        "structure": {
            "filename": "1G3D.pdb",
            "json": "http://localhost:8080/api/v1/experiment/1550/structure/",
            "pk": 5973,
            "url": "http://localhost:8080/api/v1/experiment/1550/structure/1G3D.pdb"
        },
        "tags": []
    },
    "favorite": 1000,
    "id": 1449,
    "interaction_image": {
        "pk": 2461,
        "url": "http://localhost:8080/api/mmds/images/2461/1G3D_A-108_A-601_interactions.svg"
    },
    "iridium_score": "MT",
    "ligand": {
        "id": 568,
        "ligandID": "108",
        "smiles": "C[C@@H](C(=O)[O-])NCc1cc(ccc1[O-])C(=[NH2+])N",
        "tags": []
    },
    "ligand_image": {
        "data": "http://localhost:8080/depict/?smiles=C%5BC%40%40H%5D%28C%28%3DO%29%5BO-%5D%29NCc1cc%28ccc1%5BO-%5D%29C%28%3D%5BNH2%2B%5D%29N",
        "pk": -1
    },
    "ligand_title": "108(A-601)",
    "surface_file": {
        "filename": "1G3D_A-108_A-601_surface.oesrf",
        "json": "http://localhost:8080/api/v1/site/1449/surface/",
        "pk": 5983,
        "url": "http://localhost:8080/api/v1/site/1449/surface/1G3D_A-108_A-601_surface.oesrf"
    },
    "tags": [],
    "title": "1G3D(A) > 108(A-601)"
}

$ mmdscli --profile local status
version    oe_license_expires      families    experiments    sites    contexts
---------  --------------------  ----------  -------------  -------  ----------
1.0.0      2019-09-04                     0             10        1           0
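
The `components` block of the JSON above shows which parts of the site exist on the server: absent components are `null`. A minimal sketch for listing the available ones from a parsed response follows; `available_components` is a hypothetical helper.

```python
def available_components(site: dict) -> dict[str, str]:
    """Map each non-null component name to its download URL."""
    return {
        name: component["url"]
        for name, component in site.get("components", {}).items()
        if component is not None
    }


if __name__ == "__main__":
    # Trimmed-down stand-in for the response shown above.
    site = {
        "id": 1449,
        "components": {
            "cofactors": None,
            "ligand": {"url": "http://localhost:8080/api/v1/site/1449/ligand/1G3D_A-108_A-601_ligand.oeb"},
        },
    }
    for name, url in available_components(site).items():
        print(name, url)
```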

Note that this is just the original site for 1G3D. The power of MMDS is associating groups of sites into projects (contexts) with an associated superposition method.

Just like Experiments before, we can continue through each directory loading sites one at a time, or we can use a script that uses the mmdsclient Python API to load all the sites in a single directory at once.

Note

Since these are simple Python scripts, the MMDS_PROFILE environment variable needs to be set in order for the mmdsclient API to load the appropriate credentials.

$ MMDS_PROFILE=local python load_all_sites.py trypsin
✔ Gathering existing sites
✔ Gathering input DU files : 5
processing 5 groups of OEDesignUnit files [*****************************] 100.0%
  Elapsed: 5.72 secs

$ MMDS_PROFILE=local python load_all_sites.py thrombin
✔ Gathering existing sites : 6
✔ Gathering input DU files : 5
processing 5 groups of OEDesignUnit files [*****************************] 100.0%
  Elapsed: 5.64 secs

And a quick check:

Warning

Note that for 1000+ sites, this command will take some time, so be careful.

$ mmdscli --profile local site list
  id  title                    iridium_score      favorite  ligand_title
----  -----------------------  ---------------  ----------  --------------
1460  1TRN(A) > apo(A-226)     NT                     1000  apo(A-226)
1461  1G3D(A) > 108(A-601)     MT                     1000  108(A-601)
1462  1TRN(B) > apo(B-226)     NT                     1000  apo(B-226)
1463  1K1L(A) > FD3(A-999)     MT                     1000  FD3(A-999)
1464  1GHZ(A) > 120(A-246)     MT                     1000  120(A-246)
1465  1GJ6(A) > 132(A-246)     MT                     1000  132(A-246)
1466  3P17(HIL) > 99P(H-1001)  MT                     1000  99P(H-1001)
1467  3QWC(HIL) > 98P(H-2001)  MT                     1000  98P(H-2001)
1468  3DA9(ABD) > 44U(B-1)     HT                     1000  44U(B-1)
1469  4BAH(ABD) > MEL(B-1291)  MT                     1000  MEL(B-1291)
1470  3RML(HIL) > M31(H-1)     MT                     1000  M31(H-1)

Download load_all_sites.py


Families

Family Tree

In order to organize structures into projects/contexts, we need to create a family tree containing the ultimate arrangement of structures by family and sub-family. We will collect structures by project, but these projects (contexts) are the leaves of the family tree.

The command mmdscli family can be used to add and manage this tree.

$ mmdscli family
Usage: mmdscli family [OPTIONS] COMMAND [ARGS]...

  Manage MMDS families

Options:
  -y, --yes      Say yes to any yes/no questions.  [default: False]
  --json         Return JSON instead of simple text on stdout.
  -q, --quiet    Minimal output
  -v, --verbose  Verbose output
  -h, --help     Show this message and exit.

Commands:
  add        Add a new family to MMDS
  delete     Delete one MMDS family
  deleteall  Delete all MMDS families
  info       Show info for one MMDS family
  list       List MMDS families
  update     Update a new family in MMDS

For families that are expected to be overlain in 3D, we will include a reference structure and a superposition method to use to align the sub-family/project reference structures. Spruce superposition methods include GlobalSequence, SiteSequence, DDM, and SSE.

For our example tree, the root will be RCSB, with the whole tree looking something like:

RCSB +
     |
     + Protease +
                |
                + Serine Protease

In this case, we want to provide a reference Serine Protease structure and method so that both of our contexts, Trypsin and Thrombin, will be superposed into the same frame of reference. We will be using an OEDesignUnit from 1G3D as the reference.

This set of commands will create the tree we want:

$ mmdscli --profile local family add RCSB
  id  title    alignable    unique_name
----  -------  -----------  -------------
1662  RCSB     False        RCSB

$ mmdscli --profile local family add Protease --parent RCSB
  id  title     alignable      parent  unique_name
----  --------  -----------  --------  -------------
1663  Protease  False            1662  RCSB|Protease

$ mmdscli --profile local family add "Serine Protease" --parent "RCSB|Protease"  --reference trypsin/1G3D_A__DU__108_A-601.oedu
  id  title            alignable    method          parent  unique_name
----  ---------------  -----------  ------------  --------  -----------------------------
1664  Serine Protease  True         SiteSequence      1663  RCSB|Protease|Serine Protease

$ mmdscli --profile local family list
  id  title            alignable    unique_name                      parent  method
----  ---------------  -----------  -----------------------------  --------  ------------
1662  RCSB             False        RCSB
1663  Protease         False        RCSB|Protease                      1662
1664  Serine Protease  True         RCSB|Protease|Serine Protease      1663  SiteSequence
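
The same sequence of commands can be generated for any branch of the tree. This sketch follows the pattern used above, where each parent is named by its `|`-joined unique name and the reference is attached to the last family only; `family_add_commands` is a hypothetical helper that builds the commands without running them.

```python
import shlex


def family_add_commands(path, reference=None, profile="local"):
    """Build one 'family add' command per level, root first.

    path: family titles from the root to the leaf family.
    reference: optional .oedu file attached to the leaf family.
    """
    commands = []
    for depth, title in enumerate(path):
        cmd = ["mmdscli", "--profile", profile, "family", "add", title]
        if depth > 0:
            cmd += ["--parent", "|".join(path[:depth])]
        if reference and depth == len(path) - 1:
            cmd += ["--reference", reference]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    branch = ["RCSB", "Protease", "Serine Protease"]
    for cmd in family_add_commands(branch, "trypsin/1G3D_A__DU__108_A-601.oedu"):
        print("$", shlex.join(cmd))
```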

Contexts

We need to create a “Trypsin” context/project under the “Protease|Serine Protease” part of the family tree.

$ mmdscli --profile local status
version     oe_license_expires      families    experiments    sites    contexts
----------  --------------------  ----------  -------------  -------  ----------
0.13.0dev1  2019-07-02                    20              1        1           0

Contexts require a title and a parent Family. Note that the combo of title and parent Family title needs to be unique. So for this example, we will create “RCSB|Protease|Serine Protease|Trypsin”. The parent Family ID from above is 1664.

Contexts also require a reference OEDesignUnit (which designates the site of interest) and an appropriate superposition method. SiteSequence is recommended. 1G3D is a good reference DU, so we use it for this context.

$ mmdscli --profile local context add --title Trypsin --family 1664 --ref-structure trypsin/1G3D_A__DU__108_A-601.oedu --json
{
    "active": true,
    "context_frames": [],
    "created": "2019-06-25T11:10:29.369075Z",
    "family": {
        "alignable": true,
        "children": [],
        "id": 1664,
        "method": "SiteSequence",
        "parent": 1663,
        "reference": {
            "filename": "1G3D_A__DU__108_A-601.oedu",
            "json": "http://localhost:8080/api/v1/file/6011/",
            "pk": 6011,
            "url": "http://localhost:8080/api/v1/file/6011/1G3D_A__DU__108_A-601.oedu"
        },
        "title": "Serine Protease",
        "unique_name": "RCSB|Protease|Serine Protease"
    },
    "id": 775,
    "method": "SiteSequence",
    "public": true,
    "ref_structure": {
        "filename": "1G3D_A__DU__108_A-601.oedu",
        "json": "http://localhost:8080/api/v1/context/775/ref_structure/",
        "pk": 6012,
        "url": "http://localhost:8080/api/v1/context/775/ref_structure/1G3D_A__DU__108_A-601.oedu"
    },
    "title": "Trypsin",
    "type": "project",
    "updated": "2019-06-25T11:10:29.386860Z",
    "view": null
}

$ mmdscli --profile local status
version     oe_license_expires      families    experiments    sites    contexts
----------  --------------------  ----------  -------------  -------  ----------
0.13.0dev1  2019-07-02                    20              1        1           1

Now do the same for Thrombin:

$ mmdscli --profile local context add --title Thrombin --family 1664 --ref-structure thrombin/3RML_HIL__DU__M31_H-1.oedu --json
{
    "active": true,
    "context_frames": [],
    "created": "2019-06-25T11:12:14.353287Z",
    "family": {
        "alignable": true,
        "children": [],
        "id": 1664,
        "method": "SiteSequence",
        "parent": 1663,
        "reference": {
            "filename": "1G3D_A__DU__108_A-601.oedu",
            "json": "http://localhost:8080/api/v1/file/6011/",
            "pk": 6011,
            "url": "http://localhost:8080/api/v1/file/6011/1G3D_A__DU__108_A-601.oedu"
        },
        "title": "Serine Protease",
        "unique_name": "RCSB|Protease|Serine Protease"
    },
    "id": 776,
    "method": "SiteSequence",
    "public": true,
    "ref_structure": {
        "filename": "3RML_HIL__DU__M31_H-1.oedu",
        "json": "http://localhost:8080/api/v1/context/776/ref_structure/",
        "pk": 6013,
        "url": "http://localhost:8080/api/v1/context/776/ref_structure/3RML_HIL__DU__M31_H-1.oedu"
    },
    "title": "Thrombin",
    "type": "project",
    "updated": "2019-06-25T11:12:14.359154Z",
    "view": null
}
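
Both context commands follow one pattern, which can be captured as below. This sketch uses only the flags from the two commands above; `context_add_cmd` and `context_unique_name` are hypothetical helpers, and the `|`-joined name mirrors the "RCSB|Protease|Serine Protease|Trypsin" form mentioned earlier.

```python
def context_add_cmd(title, family_id, ref_structure, profile="local"):
    """Build the 'mmdscli context add' command used above."""
    return [
        "mmdscli", "--profile", profile, "context", "add",
        "--title", title,
        "--family", str(family_id),
        "--ref-structure", ref_structure,
        "--json",
    ]


def context_unique_name(family_unique_name: str, title: str) -> str:
    """Join the parent family's unique name and the context title."""
    return f"{family_unique_name}|{title}"


if __name__ == "__main__":
    contexts = {
        "Trypsin": "trypsin/1G3D_A__DU__108_A-601.oedu",
        "Thrombin": "thrombin/3RML_HIL__DU__M31_H-1.oedu",
    }
    for title, ref in contexts.items():
        print(context_unique_name("RCSB|Protease|Serine Protease", title))
        print(" ".join(context_add_cmd(title, 1664, ref)))
```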

Frames

As Sites are added to Contexts, we create a new Frame object that holds a reference to the original Site and information about the 3D transform necessary to superpose into the current Context. As part of this process, we also download the MTZ or map file (if it exists) and create site-local grids for visualization.
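
The text does not say how MMDS stores the transform, so the following is purely illustrative. It sketches the idea of a Frame as a Site reference plus a rigid-body transform, assuming a 3x3 rotation matrix and a translation vector. The `Frame` class and its field names are hypothetical and are not the mmdsclient data model.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Frame:
    site_id: int                        # the original Site this Frame points at
    context_id: int                     # the Context it is superposed into
    rotation: tuple[Vec3, Vec3, Vec3]   # row-major 3x3 rotation matrix (assumed form)
    translation: Vec3

    def apply(self, point: Vec3) -> Vec3:
        """Rotate the point, then translate it into the Context's frame."""
        rotated = [sum(r * p for r, p in zip(row, point)) for row in self.rotation]
        return tuple(c + t for c, t in zip(rotated, self.translation))


if __name__ == "__main__":
    # A 90 degree rotation about z followed by a shift of 1 along x.
    frame = Frame(
        site_id=1472,
        context_id=775,
        rotation=((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
        translation=(1.0, 0.0, 0.0),
    )
    print(frame.apply((1.0, 0.0, 0.0)))
```

Keeping the transform separate from the Site means the same Site can sit in several Contexts, each with its own Frame, without its coordinates being duplicated.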

$ mmdscli --profile local frame add -h
Usage: mmdscli frame add [OPTIONS] CONTEXT_ID SITE_ID

  Add site/frame into an MMDS context

Options:
  -y, --yes      Say yes to any yes/no questions.  [default: False]
  --json         Return JSON instead of simple text on stdout.
  -q, --quiet    Minimal output
  -v, --verbose  Verbose output
  -h, --help     Show this message and exit.

So, basically, all we need is the Trypsin Context ID we just created and the 1G3D Site ID.

$ mmdscli --profile local context list
  id  title     active    created                      method        public    type     updated
----  --------  --------  ---------------------------  ------------  --------  -------  ---------------------------
 775  Trypsin   True      2019-06-25T11:10:29.369075Z  SiteSequence  True      project  2019-06-25T11:10:29.386860Z
 776  Thrombin  True      2019-06-25T11:12:14.353287Z  SiteSequence  True      project  2019-06-25T11:12:14.359154Z

$ mmdscli --profile local site list
  id  title                    iridium_score      favorite  ligand_title
----  -----------------------  ---------------  ----------  --------------
1471  1TRN(A) > apo(A-226)     NT                     1000  apo(A-226)
1472  1G3D(A) > 108(A-601)     MT                     1000  108(A-601)
1475  1TRN(B) > apo(B-226)     NT                     1000  apo(B-226)
1473  1K1L(A) > FD3(A-999)     MT                     1000  FD3(A-999)
1474  1GHZ(A) > 120(A-246)     MT                     1000  120(A-246)
1476  1GJ6(A) > 132(A-246)     MT                     1000  132(A-246)
1477  3DA9(ABD) > 44U(B-1)     HT                     1000  44U(B-1)
1479  3QWC(HIL) > 98P(H-2001)  MT                     1000  98P(H-2001)
1478  3P17(HIL) > 99P(H-1001)  MT                     1000  99P(H-1001)
1480  4BAH(ABD) > MEL(B-1291)  MT                     1000  MEL(B-1291)
1481  3RML(HIL) > M31(H-1)     MT                     1000  M31(H-1)

$ mmdscli --profile local frame add 775 1472
  id  title                   context  context_title    created                      fit_chain_order      ligandID  ligand_title    proteinID
----  --------------------  ---------  ---------------  ---------------------------  -----------------  ----------  --------------  -----------
1245  1G3D(A) > 108(A-601)        775  Trypsin          2019-06-25T11:31:31.748126Z  ['A']                     108  108(A-601)      1G3D

Adding Frames in bulk

$ MMDS_PROFILE=local python add_experiment_to_context.py 776 --dir thrombin
Adding 5 experiments to context id: 776
adding 3DA9
  code: 3DA9 -> frame: 1256
adding 3P17
  code: 3P17 -> frame: 1257
adding 4BAH
  code: 4BAH -> frame: 1258
adding 3QWC
  code: 3QWC -> frame: 1259
adding 3RML
  code: 3RML -> frame: 1260

Do the same for the trypsin directory and Context 775:

$ MMDS_PROFILE=local python add_experiment_to_context.py 775 --dir trypsin
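
For a handful of sites, the individual `frame add` commands can also be generated directly. This sketch is not the `add_experiment_to_context.py` script: it takes (site ID, title) pairs such as those in the site list above, assumes the PDB code is the part of the title before the first parenthesis, and builds one command per matching site. `frame_add_commands` is a hypothetical helper.

```python
def frame_add_commands(context_id, sites, codes, profile="local"):
    """Build 'mmdscli frame add CONTEXT_ID SITE_ID' for sites of the given PDB codes."""
    wanted = {code.upper() for code in codes}
    commands = []
    for site_id, title in sites:
        code = title.split("(", 1)[0].strip().upper()
        if code in wanted:
            commands.append(
                ["mmdscli", "--profile", profile, "frame", "add", str(context_id), str(site_id)]
            )
    return commands


if __name__ == "__main__":
    # A few rows copied from the site list above.
    sites = [
        (1471, "1TRN(A) > apo(A-226)"),
        (1472, "1G3D(A) > 108(A-601)"),
        (1481, "3RML(HIL) > M31(H-1)"),
    ]
    for cmd in frame_add_commands(775, sites, ["1G3D", "1TRN"]):
        print(" ".join(cmd))
```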

And finally we can see all the Frames we have loaded:

$ mmdscli --profile local frame list
  id  title                      context  context_title    created                      fit_chain_order    ligandID    ligand_title    proteinID
----  -----------------------  ---------  ---------------  ---------------------------  -----------------  ----------  --------------  -----------
1260  3RML(HIL) > M31(H-1)           776  Thrombin         2019-06-25T14:37:36.773416Z  ['H', 'I', 'L']    M31         M31(H-1)        3RML
1245  1G3D(A) > 108(A-601)           775  Trypsin          2019-06-25T11:31:31.748126Z  ['A']              108         108(A-601)      1G3D
1246  1TRN(A) > apo(A-226)           775  Trypsin          2019-06-25T11:48:11.735570Z  ['A']              apo         apo(A-226)      1TRN
1247  1TRN(B) > apo(B-226)           775  Trypsin          2019-06-25T11:48:13.465583Z  ['B']              apo         apo(B-226)      1TRN
1248  1GJ6(A) > 132(A-246)           775  Trypsin          2019-06-25T11:58:41.762722Z  ['A']              132         132(A-246)      1GJ6
1249  1K1L(A) > FD3(A-999)           775  Trypsin          2019-06-25T11:58:44.550983Z  ['A']              FD3         FD3(A-999)      1K1L
1250  1GHZ(A) > 120(A-246)           775  Trypsin          2019-06-25T11:58:48.019884Z  ['A']              120         120(A-246)      1GHZ
1256  3DA9(ABD) > 44U(B-1)           776  Thrombin         2019-06-25T14:37:22.831725Z  ['B', 'D', 'A']    44U         44U(B-1)        3DA9
1257  3P17(HIL) > 99P(H-1001)        776  Thrombin         2019-06-25T14:37:26.588846Z  ['H', 'I', 'L']    99P         99P(H-1001)     3P17
1258  4BAH(ABD) > MEL(B-1291)        776  Thrombin         2019-06-25T14:37:29.891082Z  ['B', 'D', 'A']    MEL         MEL(B-1291)     4BAH
1259  3QWC(HIL) > 98P(H-2001)        776  Thrombin         2019-06-25T14:37:33.119753Z  ['H', 'I', 'L']    98P         98P(H-2001)     3QWC

Download add_experiment_to_context.py


Results

On the main project page:

_images/mmds_tree_view.png

and in 3D:

_images/mmds_3D.png