Preparing Input¶
In this tutorial, we will prepare a design unit from the PDB structure 1JWP of the apo form of beta-lactamase. Using an apo protein structure to start these simulations is ideal. Starting with a protein structure either with a bound ligand or where the bound ligand was removed may bias the pocket search to favor that ligand binding pocket.
For more tutorials and information about protein preparation using Spruce, please refer to the Spruce documentation and the Spruce tutorials.
Note
If you have access to a stack with access to the MMDS database, prepare your protein PDB using Option 1. Otherwise, proceed with Option 2.
Design Unit Preparation Using a MMDS Reference Structure (Option 1)¶
If the protein being prepared has a reference structure in the MMDS database, this is the best route to preparation.
Finding a Reference Structure Using MMDS¶
For the OpenEye Cryptic Pocket Detection Floes package, we usually want to use an apo protein structure where a small drug molecule ligand is not present. However, it helps with the Spruce preparation of apo proteins to use a reference structure. The OpenEye MMDS dataset is an excellent starting place to find reference structures.
On the blue navigation bar on the Orion UI, click on the Sources icon to reach the Sources page, which should have three options. Click on MMDS. If you do not see MMDS as an option, we recommend that you use the preparation instructions in Option 2.
Enter beta-lactamase into the search bar on the MMDS page. Select the first option, “Beta-lactamase CTX-M-97.” This will redirect you to the 3D viewer for the protein.
The 5G18(A)altA > apo(AZR_A-301) structure should automatically show up as the first option. We will generate a dataset for this structure that will serve as the reference structure. In the left pane, the design unit name is shown in blue with four icons next to it. Click on the first icon with an arrow pointing up to the right.
This should open a pop-up window. Choose the project space where you want the file to be sent and click “Ok.” This will generate a dataset with a design unit for the MMDS structure in My Data of the chosen project space.
Preparing the PDB Using Spruce¶
Use the navigation bar to locate the Floe page. Click on the Floes Tab. Enter SPRUCE - Protein Preparation into the search bar.
Click on the “Launch Floe” button in the bottom-right corner. This will open up the Job Form. Choose where the output from the floe will be placed by specifying the output path. You can change the name of the floe job and the output dataset files to be something descriptive and distinct. Specify the PDB code(s) to download parameter as 1JWP. Click the “Choose input” button of the Optional Reference DU Dataset input parameter. We will use the 5G18 dataset that we generated in the previous step for this input. All other options can use their default values.
Click on the green “Start Job” button at the bottom of the Job Form. The floe should take approximately 10–15 minutes to complete. Once the job is done, take a look at the Preparation Success Report by double clicking on the page icon next to the orange folder icon (indicative of a collection). You will see that there were no issues detected for the preparation of the 1JWP PDB structure. In the preparation of other proteins, the Floe Reports are used to communicate issues encountered in the protein preparation. Fatal issues are redirected to the failure report, but issues with the quality of the prepared design unit(s) are reported in the success report.
The output design unit from the Spruce preparation of your protein includes several components, including a phosphate excipient. We do not want to carry the excipient component over to the simulation. As such, you will need to run the Subset Design Unit Floe following the instructions in this section of the tutorial.
Note
The SPRUCE - Protein Preparation Floe requires that there is enough information provided by the user to create a receptor. This information could include providing Ligand name(s) for the PDB being prepared, turning the Enumerate pockets option On, or providing a reference structure where a receptor has already been created. In rare cases, none of these options will work and the SPRUCE - Protein Preparation Floe will fail.
In a case like this, we recommend outputting the biological unit. After clicking the “Launch Floe” button for the SPRUCE - Protein Preparation Floe, fill out the job parameters that you previously added and turn the Show Cube Parameters option to Yes. This will reveal all of the floe’s cubes and their respective parameters. Scroll down to the Parallel Spruce Prep Cube and expand the options. Click on the General tab and scroll down to the option called Output biological unit and turn it On. Then click the “Start Job” button. The floe will output the intermediate biological units, but it will not evaluate the quality of the results. As such, this method should be a last resort and used with caution.
Design Unit Preparation Without a MMDS Reference Structure (Option 2)¶
This method should be used for proteins where there is no access to the MMDS database or where there is no reference structure with a bound ligand available for the protein.
Preparing the PDB Using Spruce¶
Use the navigation bar to switch to the Floes page. Click on the Floes Tab. Type SPRUCE - Protein Preparation into the search bar.
Click on the blue “Launch Floe” button. This will open the Job Form for the floe options. Choose where the output from the floe will be placed by specifying the output path. You can change the name of the floe job and the output dataset files to be something descriptive and distinct. Specify the PDB code(s) to download as 1JWP. Scroll down to the Unliganded Structure Parameters group and turn On Enumerate Pockets. This will allow the floe to search for pockets on the protein. This will initially be attempted with the OpenEye functions; if that fails, it will attempt to find pockets using fpocket.
The floe will take approximately 20 minutes to complete. Note that with the Enumerate Pockets parameter On, the output dataset will contain design units for each of the protein structures on the input PDB with each of the pockets found on the protein. The next floe only accepts a single input design unit record. As such, we will need to select a record to output onto a new dataset. From the navigation bar, click on the Data page. Next, you need to search for the data associated with your job (specified in the Output Path in the Job Form). For our example, we would click on the Team Data folder under Project Data. You can find your data by typing job into the search bar. This will show you a list of options below the search bar. Click on the tag that matches your job ID. This will show you all of the files associated with your job (assuming that files, datasets, and collections are visible).
You will notice that there is a success dataset, 1JWP_Spruce_prep_dataset, and a failure dataset, 1JWP_Failed_spruce_prep_dataset. Looking at the failure and success Floe Reports, note that the attempt to enumerate pockets using the OE pocket finder failed to find pockets, but the subsequent search for pockets with fpocket succeeded. The design units in the success dataset represent all of the fpocket pockets identified for the input structure.
Activate the 1JWP_Spruce_prep_dataset by clicking on the circle with a plus sign so that it becomes a green circle with a checkmark. You can see which datasets are active by clicking on the Active Datasets tab on Active Data Bar at the top of the page.
Switch to the 3D page. You will see the dataset with records for each of the structures and pocket combinations.
We will prepare the first record on the dataset. Click on the triangle next to the first record’s design unit name, click on the folder icon, and then click on the Duplicate This Record option in the drop-down menu.
In the pop-up, select New Dataset and make a unique name for the dataset. If you want to review the resulting dataset, select the Replace Active Data option.
As with the preparation using Option 1, there may be a phosphate excipient in the prepared design unit. Because you do not want to carry this over to the simulation, you will need to run the Subset Design Unit Floe following the instructions in this section of the tutorial.
Subset the Design Unit¶
It is a good idea to review the output design unit from the SPRUCE - Protein Preparation Floe before proceeding to the next phase. That floe is not inherently intended to generate a protein that is ready to be solvated and simulated. It will likely produce desirable components, such as the protein and solvent components. It will also produce undesirable components for a simulation, such as the packing residues and excipient components. The Subset Design Unit Floe allows you to choose which components to keep.
If you prepared the protein with Option 1, click on the Jobs Tab of the Floe page. Click on the SPRUCE - Protein Preparation job.
In the Results section, you will see the output dataset, Spruce_prep_dataset by default. Click on the “Show in Project Data” button which will redirect you to where that file is stored in your data.
If you prepared the protein with Option 2, navigate to the second dataset you created with a single record on it.
Click on the circle with a plus sign to activate the relevant dataset (Option 1 or Option 2) to your active datasets.
Navigate to the 3D page and make sure that the 1JWP dataset is the only one visible. You can expand the design unit details by clicking on the carets beside the design unit title. You will notice that the design unit includes the packing residues, the solvent, and the excipients.
While the Solvate and Equilibrate Target Protein Floe will handle the solvent and packing residues appropriately, it will keep the excipient molecules. In this case, the excipient phosphate molecule is not something that we want to keep for the simulation. As such, navigate to the Floes Tab on the Floe page, and search for the Subset Design Unit Floe. Click on the “Launch Floe” button.
In the Job Form, select the desired output path, customize the output dataset names as desired (we prefixed 1JWP to the default output dataset names), and select the design unit components to keep. In this case, we have chosen to use the default output design unit components, but you could choose a different set of design unit components to keep. Note that the default components do not include the excipient component, and so the output design unit will not be included in the phosphate excipient.
Click on the “Start Job” button. This floe should take approximately five minutes to complete. Once this floe is complete, you will be able to proceed to the next stage of the tutorials.