# SZYBKI¶

## Command Line Interface¶

A description of the command line interface can be obtained by executing SZYBKI with the --help option.

prompt> szybki --help


will generate the following output:

Simple parameter list

Execute Options
-param : A parameter file

Input Molecules
Input Files Option 1 : Input molecule and optional protein
-in : Name of the input molecule file. Aliases are -in and -i
-protein : Name of the input protein file (not required)

Input Files Option 2 : Ligand-protein complex
-complex : Name of the input protein file with ligand(s)

Output
Output File Options
-prefix : Replaces the default prefix for log, status, param and rpt
files

Explicitly specify individual file names
-out : Name of the output molecular file

Potential Function
-ff : Force field to be used. Overwrites selection made with "-MMFF94S"

Optimization
-optGeometry : Specification of degrees of freedom used for optimization

szybki --help simple      : Get a list of simple parameters (as seen above)
szybki --help all         : Get a complete list of parameters
szybki --help defaults    : List the defaults for all parameters
szybki --help <parameter> : Get detailed help on a parameter
szybki --help html        : Create an html help file for this program
szybki --help versions    : List the toolkits and versions used in the application


### Required Parameters¶

The only required input parameter is defined in the option -in.

### Execute Options¶

-param

Command line options will be read from the specified file. This file may have been generated from a previous run or may be constructed de novo. The default name of the file is szybki.param. Any parameter in the parameter setup file is superseded by the parameter on the command line. For example running szybki -param szybki.param -i my.pdb will perform calculations for the molecule my.pdb while using all other parameters taken from szybki.param.

-mpi_np <n>

Specifies the number of processors n when SZYBKI is run in MPI mode.

-mpi_hostfile <filename>

Specifies the name of the file containing processors configuration. For every host this file should contain a line host_name n where n is the number of processors on the host.

### Input Molecules¶

SZYBKI can handle molecular files in the following formats:

SZYBKI file format support
File extension Description
mol MDL Mol File
mmd, mmod Macromodel
mol2 Tripos Sybyl mol2 file
oeb New Style OEBinary
pdb Protein Databank PDB file
sd, sdf MDL SD File
xyz XMol XYZ format

#### Input Files Option 1: Input molecule and optional protein¶

-in <filename>

Molecular input file name containing 3D coordinates in any format supported by OEChem. In order to comply with the previous SZYBKI releases, an alias -i can be used instead of -in. This option might be used as a keyless parameter without the -in or -i keys but only when used last or next to last when the last option on the command line is -out. Warning: File name: szybki_out.oeb is used as the default output file name, so should be avoided as an input file name.

-i <filename>

An alias to -in.

-ligands <filename>

An alias to -in.

-protein <filename>

Protein input file with 3D coordinates in any format supported by OEChem. Works in combination with the -in option. No removal of pre-existing ligands in file filename will be done as in the case of the -complex option. This option can be aliased with -p for compatibility with previous SZYBKI releases.

-p <filename>

An alias to -protein.

#### Input Files Option 2: Ligand-protein complex¶

-complex <filename>

Input file with 3D coordinates of a protein-ligand(s) complex in any format listed in the table above. All ligands contained in the molecule file filename will be optimized one by one in the presence of all others.

#### Molecule preparation¶

-strip_water

Causes removal of water molecules from the input protein when options -protein or -complex are used.

-largest_part

Calculations are performed only for the largest fragment of the noncovalent input complex. For example when the input file contains coordinates for the salt, $$[Large\_cation^+]\cdot[Cl^-]$$, the use of the -largest_part will cause the $$Cl^-$$ anion to be ignored. By default calculations are done for the entire complex.

### Output¶

#### Screen Options¶

-silent

By default the names of the processed molecules are displayed. The use of the option -silent will suppress this output.

-verbose

An extensive general output will be generated containing initial and final energy and gradient data, as well as an optimization report.

#### Output File Options¶

-prefix <pn>

Replaces szybki prefix in .log, .rpt (report), .status , .param and _out.oeb files, with the input string pn.

-report

An output file in tabular form of final energy components for every molecule/conformer will be generated. The name of the file is fixed as: prefix.rpt where prefix is szybki by default or determined by the -prefix option.

-sdtag <string>

In the case when the output file is in the SD format, an energy tag can be added. For single molecules by default all energy terms (MMFF, solvation and constraint) plus Total energy are added, so the option has no effect. “Total energy” always contains the constraint terms. In the case of protein-ligand systems only the relevant terms (Ligand-Protein Energy and all of its components) are attached as SD tags by default. The option can be used therefore to add one of the remaining terms from the list below. If string is set to all, all energy terms are tagged as SD tags.

1. MMFF VdW
2. MMFF Coulomb
3. MMFF Bond
4. MMFF Bend
5. MMFF StretchBend
6. MMFF Torsion
7. MMFF Improper Torsion
8. L MMFF VdW
9. L MMFF Coulomb
10. L MMFF Bond
11. L MMFF Bend
12. L MMFF StretchBend
13. L MMFF Torsion
14. L MMFF Improper Torsion
15. P MMFF VdW
16. P MMFF Coulomb
17. P MMFF Bond
18. P MMFF Bend
19. P MMFF StretchBend
20. P MMFF Torsion
21. P MMFF Improper Torsion
22. PL MMFF VdW
23. PL MMFF Coulomb
24. Ligand-Protein Energy
25. Sheffield Solvation
26. Constraint Potential
27. PB Solvent
28. Area Solvent
29. __VdW
30. __Coulomb
31. __Protein desolv
32. __Ligand desolv
33. __Solvent screening
34. __Grid Coulomb
35. __Exact Coulomb
36. P/L energy
37. __AMBER VdW
38. __AMBER Coulomb,
39. Torsion Constraint

If parameter string does not correspond to one of the available energy terms, no tag will be written. Tags starting with L, P, PL and __ are:

• L: ligand intra-molecular terms only
• P: protein intra-molecular terms only
• PL: protein-ligand inter-molecular terms only
• __: protein-ligand interaction terms
-keepFailures

By default molecules which failed during processing are store in the file prefix.FAIL.input_format where input_format is that of the input molecule used in option -ligands. If the flag is set to true, those failed molecules are written to molecule output file.

-heavy_rms

By default all atoms RMSD after optimization is reported in the log file. This option replaces the default with the heavy atoms RMSD.

#### Explicitly specify individual file names¶

-l <filename>

An alias to -log.

-log <prefix>

Prefix of the log file name, prefix.log. If omitted, the prefix szybki is used by default. This option is aliased to -l.

-o <filename>

An alias to -out.

-out <filename>

Output file name, in any format supported by OEChem, for an optimized ligand. If not specified, szybki_out.oeb or prefix_out.oeb (when -prefix is used) will be generated. Alias -o can be used instead of -out. Can be used as a keyless parameter without the -out key when it is last on the command line just after the -in (or -i, or -ligands) option.

-out_protein <filename>

Partially optimized protein will be saved in a file named filename.

-out_complex <filename>

Protein-ligand complex for partially optimized protein will be saved in a file named filename

-reportFile <filename>

Sets user selected file name, filename, for the tab file. Supersedes the -report option described above.

### Potential Function Options¶

-ff

Force field to be used. Valid options are: MMFF94, MMFF94S, Amber_MMFF94, Amber_MMFF94S, IEFF_MMFF94 and IEFF_MMFF94S This option overwrites deprecated -MMFF94S when both are used simultaneously. The combined potential Amber-MMFF94 (or Amber-MMFF94S) can be applied only for protein-ligand interaction using a rigid protein model. All intramolecular ligand interactions are described by the MMFF94 (or MMFF94S) force field while intermolecular protein-ligand interactions are handled by the Amber force field.IEFF_MMFF94 and IEFF_MMFF94S potentials can be used only for intermolecular interactions provided that electrostatic multipoles are assigned to the molecules.

#### Charging¶

-am1bcc

AM1BCC charges ([Jakalian-2002]) for are calculated for every conformation and used for the PB or Sheffield solvation energy. Note that AM1BCC charges are conventionally calculated for just one conformer or a few conformers of a multiconformer molecule, those conformer(s) with Electrostatically Least-interacting Functional (ELF) groups, so calculating AM1BCC charges for every conformer with this option is unconventional. For conventional behaviour, apply AM1BCC ELF charges to the molecule separately and then use that as the input structure with the -current_charges option.

-current_charges

During optimization of molecules in solution with the use of the Poisson-Boltzmann solvation model, free energy of solvation and solvent forces can be calculated with the use of atomic partial charges other than MMFF. Option -current_charges allows the use of partial charges read from the input molecular file. In the case where the ligand is optimized inside a protein, protein-ligand electrostatic interactions (Coulomb and/or Poisson-Boltzmann) could be calculated based on ligand and protein partial charges read from the molecular input file(s). When input partial charges are not found, the MMFF94 partial charges will be used.

#### Force Field Modifications¶

-exact_vdw

By default the VdW protein-ligand interaction is calculated with the use of lookup table in order to speedup the calculations. This option allows using the exact analytical VdW potential for the optimization of a ligand in the protein binding site.

-mod_vdw

Regular MMFF Van der Waals interactions equation will be replaced with:

$$V_{vdw} = \left\{ \begin{array}{cc} \epsilon_{ij} \left( \frac{1.07R_{ij}}{r_{ij}+0.07R_{ij}} \right) ^7 \left( \frac{1.12R_{ij}^7}{r_{ij}^7 +0.12R_{ij}} -2 \right) & \mbox{for r_{ij} < R_{ij}} \\ -\epsilon_{ij} & \mbox{for r_{ij} \geq R_{ij}} \end{array} \right.$$

in which no attractive VdW forces are present. This type of VdW potential prevents so called “hydrophobic collapse”.

-neglect_frozen

After an optimization with frozen terms, the default behavior calls for a full single-point calculation of the whole system. The -neglect_frozen option will skip this final calculation of the whole system, thus yielding results for only the non-frozen pieces. When there is a large frozen section, this can drastically reduce memory and cpu usage.

-noCoulomb

Electrostatic terms defined in the Electrostatic interactions section will be excluded from the force field potential. This option might be useful to prevent generation of folded structures.

-protein_vdw <r>

Calculation of VdW protein-ligand interaction energy will be limited to a sphere of radius r. The default value is 18.0 Å. In many applications a value as small as 10 Å can be used with essentially no effect on the final optimized ligand geometry. Legal range is 5-500 Å.

-strict

Enforces strict atom typing. This is a default behavior. When the value of the flag is selected to be false, this enforcement is removed.

-vdw_cutoff

Sets distance for intramolecular VdW interactions. By default all atom pairs in the molecules contribute to the molecule’s VdW energy. The legal range is 5 - 500 Å.

#### Ligand solvent flags (used when -protein or -complex are not specified)¶

-inner_dielectric <d>

The default value for the protein or ligand dielectric constant of 1.0 can be changed to a user selected value d. This option is aliased as -protein_dielectric. The upper allowed limit is 20.0.

-shefA <a>

Parameter a in the Sheffield solvation potential given in option -sheffield. If the option is not used a value of 1.553149 is assigned ([Grant-2007]).

-shefB <b>

parameter b in the Sheffield solvation potential given in option -sheffield. If the option is not used a value of 0.735694 is assigned ([Grant-2007]).

-sheffield

Usage of this option will result in adding additional electrostatic term in order to mimic the solution environment according to [Grant-2007]. This term is of the form: $$f_\epsilon/(8\pi\epsilon_0)\sum_{i,j}q_iq_j/\sqrt{(ar^2 + bR_iR_j)}$$ where $$f_\epsilon=(1/\epsilon_{in}-1/\epsilon_{out})$$, $$q_i$$, $$q_j$$ are partial charges and $$R_i$$, $$R_j$$ are Van der Waals radii of atoms i and j. While less accurate than Poisson-Boltzmann, this method is very fast and offers analytic first and second derivatives. Because of the latter it is the only solvation option for entropy calculations.

-solv_dielectric <d>

Allows to change the default value for solvent dielectric constant used for Poisson-Boltzmann and Sheffield solvation energy calculations. Allowed range is 1 to 100. Default value is 80.

-solventCA <s>

This option can only be used in combination with -solventPB or -sheffield. It causes inclusion of a molecular surface solvation term (sometimes called cavity solvation term) in the total energy. The value of parameter s (microscopic surface tension coefficient) is in the range 0.005 - 0.030 kcal/(mol Å$$^2$$). This option is aliased as -solventMA, for compatibility with previous releases.

-solventPB

For optimization of small molecules in solution, the electrostatic part of molecule-solvent interactions will be calculated using Poisson-Boltzmann model.

#### Complex solvent flags (used when -protein or -complex are specified)¶

-protein_elec <m>

This option provides 4 choices for calculating protein-ligand electrostatic interaction energies: m = None, ExactCoulomb, GridCoulomb and PB . Option m = None eliminates electrostatic interactions. Values of m set at ExactCoulomb and GridCoulomb result in the usage of Coulomb exact potential and digitized on the grid respectively. Option m = PB provides a more realistic potential which accounts for solvent forces, calculated according to the Poisson-Boltzmann (PB) model at every iteration step. This option requires substantially higher CPU time, particularly for large proteins. By default m = ExactCoulomb.

#### PB parameters¶

Determines types of atomic radii used for PB calculations (options: -solventPB and -protein_elec). By default parameter type is set to Bondi. Two other choices are ZAP9 and ZAP7 ([Nicholls-2010]).

-salt <c>

Allows to all PB calculations to be performed at specified salt concentration in M, up to 0.08M. By default salt concentration is zero.

In the case when the electrostatic component of the protein-ligand interaction energy has been pre-calculated on a grid, this option forces SZYBKI to read the grid potential from the file filename, and use it for ligand optimization inside the protein. This option is available only when -protein_elec is set with GridCoulomb.

-savePG <filename>

Saves potential grid in the file named filename. Allows a significant saving in CPU time for runs when Coulomb grids are used to optimize a number of ligands inside the same protein. This option is available only when -protein_elec is set with GridCoulomb.

### Optimization¶

-optDOF

An alias to -optGeometry.

-optdof

An alias to -optGeometry.

-optGeometry <dof>

Optimization in specified degrees of freedom will be done. The possible choice for parameter dof are: cart for Cartesian coordinates, tor or torsions for torsions optimization, solid for rigid ligand optimization inside a protein receptor, none or sp for single point calculation, Honly for hydrogen atoms only optimization, and calculationDependent. The value calculationDependent which is selected by default when option is not used, sets the defaults type of degrees of freedom: Free ligands by default are optimized in Cartesian coordinates, while protein bound ligands in translational-rotational coordinates. The option is aliased with -optDOF and -optdof.

#### Convergence Criteria¶

Optimization is terminated when the rms gradient reaches the input value c unless is finished earlier because of other reasons. When omitted the default convergence criteria is 0.1 on gradient vector norm: $$\sqrt{\sum_i {g_ig_i} }$$, where $$g_i$$ is i-th gradient component.

-max_iter <m>

Optimization will be terminated when the number of iteration cycles reaches input number m. The default value is 1000.

-optMethod <type>

Selects optimization type. The value of type replaces the default BFGS optimizer for small molecules and conjugate gradient for systems with 500 or more degrees of freedom. Allowed values of type are: BFGS, bfgs for BFGS optimization, CG, cg, conj for conjugate gradient, SD, sd, steepest for steepest descent optimization, sd_bfgs for pre-optimization with 5 steps of steepest descent followed by BFGS optimization, sd_cg for pre-optimization with 5 steps of steepest descent followed by conjugate gradient optimization, and “newton”, “NEWTON” for Newton-Raphson optimization if analytic second derivatives are available. Entropy estimation requires BFGS or Newton-Raphson type of optimization, so all values of parameter type which represent different optimization methods are ignored for entropy runs. In addition, when quasi-Newton method of entropy estimation is requested (see -entropy), the usage of BFGS is enforced.

#### Fixing Ligand Atoms (fixed atoms won’t move during optimization)¶

-fix_file <filename>

Text file filename should contain a list of molecule names followed by atom numbers to be fixed. An example of the command line for SZYBKI run with the use of this option is given below in the subsection Examples.

-fix_smarts <file_name>

All atoms which belong to SMARTS pattern specified in a single line of the text file file_name will be fixed. For example, if the input string is [!#1] all heavy atoms will be fixed and SZYBKI run will optimize all hydrogen atom positions.

-flex_file <filename>

Text file filename should contain a list of molecule names followed by atom numbers to be optimized. All non-listed atoms will be fixed. Numbering atoms convention used is from 0 to n-1, where n is the molecule’s total number of atoms.

#### Constraining Ligand Atoms (can move but have harmonic constraint)¶

-harm_constr1 <k>

Constrained potential of the form: $$kr^2$$ will be imposed on all heavy atoms, where $$k$$ is the force constant. By default no constraint is applied ($$k = 0$$), and the upper allowed limit is 1000 kcal/(mol Å$$^2$$).

-harm_constr2 <d>

Constrained potential of the form: $$V = k(r-d)^2,r>d$$ and $$V=0, r \leq d$$ will be imposed on all heavy atoms, where d is the constraining distance in angstroms. Can be used only together with -harm_constr1. Default value is 0, while the upper allowed value is 5 Å.

-harm_smarts <file.txt>

All atoms which belong to SMARTS pattern read from file file.txt will be constrained. For example when the file file.txt contains a single line cO, input option -harm_smarts file.txt will result in constraining all aromatic carbon atoms and oxygen atoms bonded to them. Must be used in conjunction with -harm_constr1.

#### Constraining Torsion Angles¶

-tor_constr <fn>

File name containing a single line which determines the torsion to be constrained, reference torsion angle and a force constant. The constraining potential is in the form: $$V=k_c(cos(phi) - cos(phi0))^2$$, where $$k_c$$ is the user specified force constant and $$phi0$$ is the reference torsion dihedral angle. The input data in the file fn should be in the following order: SMARTS $$phi0$$ $$k_c$$. SMARTS might be replaced with atom indices: $$i_1$$ $$i_2$$ $$i_3$$ $$i_4$$ $$phi0$$ $$k_c$$.

Examples of valid inputs are: [C:1][N:2][c:3][s:4]  0.0  2.0 5 0 10 25 0.0 2.0 Notice that atom indices are numbered from 0 to $$N-1$$, where $$N$$ is the number of atoms in the molecule.

#### Protein Flexibility¶

-flexdist <d>

Specifies distance d from the ligand which determines flexible residues a protein receptor in the optimization of a ligand inside the protein binding side. Has to be used together with -flextype

-flexlist <fn>

Similar to flexdist but instead of using distance from the ligand as partial protein flexibility criteria, provides a file name fn, containing flexible residues. Every line of this file should contain a pdb residue name, residue number and chain id. For example a 2 line file:

ILE 78 A
Phe 114 B

informs SZYBKI that Ile78 of chain A and Phe114 from chain B will be flexible during ligand optimization. Has to be used together with -flextype.

-flextype <type>

Allows to specify the type of partial protein flexibility. Possible values of parameter type is a string polarH, sideC or residue, and their respective meanings are: polar hydrogens, side chains and complete residues. Has to be used together with flexdist or flexlist. The molecular potential used for optimization consists of three components:

1. Ligand intra-molecular MMFF potential

2. Protein-ligand potential

2.1. MMFF terms for interaction between polar protein hydrogens and the ligand

2.2. Interaction between fixed protein atoms and the ligand

3. Protein intra-molecular potential

3.1. MMFF terms involving polar protein hydrogens and atoms up to three bonds apart.

3.2. Interaction between the rest of the protein and flexible polar hydrogens

The sum of components 2.2 and 3.2 might be called “protein-pseudoligand” interaction and is evaluated according to the model selected with the -protein_elec option above.

### Entropy calculation¶

-entropy <type>

Estimation of the entropy of a ligand will be done. Parameter type can take three values “None” (default), “QN” (Quasi-Newton Hessian) and “AN” (analytical). Input ligand is assumed to be in the form of a multiconformation molecule. The environment is determined by the option -sheffield (solution), -protein (in the binding site) or none of those two indicating a gas-phase.

-t

Sets the temperature (in C) of the system for entropy estimation. Default temperature is 25C (298K).

### Deprecated¶

-MMFF94S

Modified set of MMFF94 parameters developed in 1999 by Halgren ([Halgren-1999-1], [Halgren-1999-2]) will be used. This option is overwritten by -ff when both are used simultaneously.

-polarH <r>

This option allows partial flexibility of a protein receptor in the optimization of a ligand inside the protein binding side. All polar protein hydrogens distanced by r from the ligand will adjust their position upon energy relaxation. Main chain amide NH hydrogens of peptide groups are not included. This option is deprecated. Please use flextype with polarH as parameter, together with flexdist.

-residue <r>

Similar to polarH. This option allows for flexibility of complete residues having at least one atom within distance r of any atom of the ligand. There is no upper value of the r parameter, however using a large value will cause even distant residues to be flexible which could lead to poor results. The intention of this option is to allow only residue atoms interacting directly with the ligand to rearrange upon optimization. This option is deprecated. Please use flextype with residue as parameter together with flexdist.

-sideC <r>

Similar to polarH. This option allows for flexibility of all side chains having at least one atom within distance r of any atom of the ligand. Same comment as in the -residue above applies with respect to side chains atoms. This option is deprecated. Please use flextype with sideC as parameter together with flexdist.

-rws

Removes explicit water molecules from solvent accessible surface calculation during the evaluation of the bound ligand entropy. By default explicit water molecules present in the protein binding site are treated as part of the protein receptor.

-sfp

Change the value of macroscopic surface tension coefficient from its default value of 6.0 cal/(mol Å$$^2$$). This quantity is used for the calculation of the bound ligand entropy.

## Examples¶

This section has several examples of typical SZYBKI command-line executions.

### Protein preparation¶

X-ray protein structures rarely contain all hydrogen atoms coordinates. It is necessary therefore to prepare the protein structure before any protein-bound ligand calculations are done.

prompt> szybki -optGeometry Honly -prefix ex1 -neglect_frozen protein.pdb proteinH.mol2


In this command the input protein file protein.pdb is assumed to contain all heavy atoms. Option -optGeometry followed by the string Honly fixes all heavy atoms at their positions given in the input file. For larger proteins it is recommended to increase the total number of optimizer iterations beyond the default value of 1000 using the option -max_iter. It might be particularly important if a tight convergence of 1e-6 or smaller is required (see -grad_conv). The output protein file is proteinH.mol2. Note: All ionizable residues are assumed to be in their standard ionization states. If other than standard ionization states are needed, the input file has to be edited accordingly before the SZYBKI run is done.

### Ligand optimization in solution¶

The equilibrium geometry of a molecule in solution is sometimes drastically different than in vacuum. This is because of solvent forces. The most efficient way to optimize a large set of ligands in solution is the usage of the -sheffield option.

prompt> szybki -sheffield -am1bcc -prefix ex2 ligands.sdf ligands_optimized.sdf


In the run above, a set of molecules from the input file ligands.sdf is optimized with the MMFF94 force field in solution. AM1BCC charges are optionally used for the Sheffield solvation model. Skipping the -am1bcc would result in the usage of MMFF94 atomic charges. Optimized structures are written to the ligands_optimized.sdf file. One can also optimize the ligand in solution using a physically more rigorous model by solving the Poisson-Boltzmann equation at every step. The command line below is equivalent to using the widely known PBSA model:

prompt> szybki -solventPB -solventCA 0.005 -prefix ex3 ligands.sdf ligands_optimizedPB.sdf


### Optimization with fixed atoms¶

This example illustrates the usage of the option -fix_file.

prompt> szybki -fix_file f.txt -prefix ex4 mymolecule.mol2 molecule_opt.sdf


Text file f.txt contains molecules names followed by atom numbers in the (0,1...N-1) numbering convention. For example the f.txt file contains the following 6 lines:

Molecule1
0
15
Molecule2
5
9

informs SZYBKI that atom 0 and 15 for molecule Molecule1 and atoms 5 and 21 for molecule Molecule2 should be fixed during optimization. This flag is useful when the user wants to fix atoms which are not directly bonded, for example they belong to different functional groups in the optimized molecule. In the case a common substructure has to be fixed, an option -fix_smarts is more convenient to use:

prompt> szybki -fix_smarts fix_smarts.txt -prefix ex4a mymolecule.mol2 molecule_opt1.sdf


where file fix_smarts.txt contains a single line with the SMARTS pattern c1ccccc1 which defines the substructure to be fixed.

### Ligand optimization in the active site¶

We need to distinguish between two cases. One is screening of a large amount of docked ligands into a protein receptor, and the second is lead optimization. In the first case instead of cpu expensive optimization with the use of PB solvent forces, we recommend a two step SZYBKI calculation:

prompt> szybki -p p.mol2 -in lig.oeb -out optlig.oeb -optGeometry cart -prefix ex5
prompt> szybki -p p.mol2 -in optlig.oeb -optGeometry sp -protein_elec PB -prefix ex6 -log lig


where p.mol2 is the protein file, lig.oeb and optlig.oeb are input and optimized output ligand files respectively, possibly with many conformations and poses. In the first run the input docked ligands in lig.oeb are minimized in the (Coulomb + VdW) MMFF94 potential in full Cartesian coordinates. In the second run, a single point calculation (option -optGeometry) is done with PB protein-ligand electrostatics, which calculates protein-ligand interaction energy including solvent effects. The result of the calculation is stored in the lig.log file. The fragment of the log file showing protein-ligand interaction results is shown below. The entry marked Lig-Protein Interaction Energy is the sum of all lines starting from double underscore. All values are in kcal/mol.

Overall Ligand-Protein Interaction terms:
__VdW                              16.46494
__Coulomb diel=1.0               -429.64957
__Protein desolv    (PB)           46.85247
__Ligand desolv     (PB)           67.42229
__Solvent screening (PB)          276.92111
----------
Overall Lig-Prot Interaction      -21.98875


All term names but “__Solvent screening (PB)” are self-explanatory. Solvent screening is the reduction in Coulomb interaction between charged atoms due to the presence of water polarization effect which partially nullifies each charge. Its sign is always opposite to the Coulomb interaction which is screened: it makes attractive interaction less favorable and reduces repulsive interaction. In the example above, the presence of water decreases Coulomb interaction of -429.6 kcal/mol to -152.7 kcal/mol (-429.6 + 276.9 = -152.7 kcal/mol).

In the case of lead optimization, a complete optimization which includes solvent forces and possibly partial relaxation of the protein residues in the direct proximity to the ligand is recommended:

prompt> szybki -p p.mol2 -in lig.oeb -out optligPB.oeb -flextype residue -flexdist 2 -prefix ex7
-protein_elec PB -optGeometry cart -log ligPB


where 2 is the distance from the ligand (in Angstroms).

### Ligand entropy calculation in solution¶

Assuming that the molecule conformations are given in the molecule.oeb file, the command line execution:

prompt> szybki -entropy AN -sheffield -prefix ex8  molecule.oeb


will result in the standard molar entropy estimation of the input molecule in solution. The meaning of the parameter value AN is explained in the description of the option -entropy. Option -sheffield informs SZYBKI that the calculation will be done in solution with the use of the Sheffield Solvation Model. The entropy results will be written in the ex8.log file. If the input file molecule.oeb does not contain the complete ensemble of conformations, the resultant entropy value may not be accurate. We recommend therefore that the input conformations are generated with the OpenEye tool OMEGA, with the -rms option set at 0.1 value, and -maxconfs set to at least 1000.

### Protein-bound Ligand entropy calculation¶

Run:

prompt> szybki -p protein.mol2 -entropy AN -prefix ex9 ligand.mol2


will generate the estimation of ligand entropy in the active site of the protein. Input file ligand.mol2 might be a docked structure or preferentially the X-ray structure.

### Binding entropy calculation¶

A workflow for estimating binding entropy is shown in the figure below:

Binding Entropy Estimation

A basic workflow for binding entropy estimation.

As Figure 4.1 shows, estimating binding entropy, $$\Delta S_b$$, requires two separate SZYBKI runs: one for the estimation of ligand entropy in solution ($$S_{sol}$$) and another one which calculates protein-bound ligand entropy ($$S_{bnd}$$). Starting from the SZYBKI 1.7.0 release it is possible to calculate binding entropy ($$\Delta S_b$$) in a single SZYBKI run:

prompt> szybki -complex complex.oeb -entropy AN  -prefix ex10 -sheffield -in ligand.oeb


where complex.oeb is a molecular file of the protein-ligand complex and ligand.oeb contains an ensemble of conformations of the ligand in solution. Note: The docked ligand in the complex.oeb file must represent a single pose and must be the same molecule as in the ligand.oeb input file. In the case where two or more docked poses are used, calculation of binding entropy requires two separate runs as shown on Figure 4.1.