Automated Ligand Building

 

 

The ligand building procedure within ARP/wARP Version 7.0 proceeds in three steps: first it locates the binding site in the difference density map, then it builds a number of putative ligand models there and, finally, it selects the best model, which is geometrised and real-space fitted into the density. The binding region is selected automatically by matching the ligand's shape-related properties to the regions of high density. The chosen region is parameterised by a sparse set of putative positions (grid nodes) for the ligand atoms. The stereochemical information and van der Waals repulsions, in combination with the electron density, allow one to obtain a suitable estimate of the position, orientation and conformation of the ligand. Two algorithms are used to construct the ligand in this sparse set. One, called label swap, exploits the combinatorial assignment of the ligand atom identities to the grid nodes (Zwart, P.H. & Lamzin, V.S., 2004). The other maximises the overlap between the sparse set and the ligand model by a random search in conformational space. The output from both algorithms undergoes a last stage of real-space refinement before the final model is selected.

 

The accuracy of ligand building depends mainly on the ligand size and the resolution of the X-ray data. As a rough guide, about 75% of well-ordered ligands of up to 20 non-hydrogen atoms should be built within an r.m.s.d. of 1.0 Å of their correct location. For larger ligands the success rate decreases to about 50%. With an r.m.s.d. of 1.0 Å or less the constructed models should be accurate enough for REFMAC5 to refine the protein-ligand complex straightforwardly. The procedure can be iterated to locate additional ligands, if any are present.
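For reference, the r.m.s.d. quoted above can be written out as follows. This is the conventional definition; the assumptions here (not stated in the text) are that it is taken over the N non-hydrogen ligand atoms, pairing each built atom with the same atom in the correct model, and that no superposition is applied first.

```latex
\mathrm{r.m.s.d.} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{r}_i^{\mathrm{built}} - \mathbf{r}_i^{\mathrm{ref}} \right\rVert^{2}}
```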

 

The ARP/wARP ligand building module requires the X-ray data (in MTZ format), the built protein without ligands (in PDB format) and a template model of the ligand to build (in PDB format). Options include the possibility to specify the binding site and the number of starting grids, the ability to compare the run result to some reference ligand(s), and the possibility to build a ligand taken from a list of candidates ('cocktail'). In the latter case the coordinates of the ligand candidates should be concatenated into a single PDB file. The different ligands must be distinguished by their residue name (columns 18-20). ARP/wARP will automatically choose the best-matching ligand candidate and will attempt to build it at the binding site, either determined automatically or supplied by the user. Since this feature is new, specifying the binding site (see below) is recommended when using this option.

 

 

o               MTZ in: X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

o               Fobs Sigma: If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose the appropriate ones.

o               Protein model without ligand: Provide the PDB file with coordinates of the protein only. If the file contains solvent atoms, free atoms or fragments of other ligands, please make sure that they do not overlap with the expected location of the ligand.

o               Ligand molecule coordinates: Stereochemical information about the ligand to be built is read in the form of a PDB file. This file should contain the ligand molecule only. The molecule can be in any conformation. However, the interatomic distances, bond angles and chirality (if present) should correspond to the target stereochemistry of the ligand to be built. Please also check that the whole target ligand molecule is connected through bonds (i.e. that it does not accidentally consist of several unconnected clusters of atoms).
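Some of the checks on the ligand template can be scripted before a job is started. The sketch below is a hypothetical helper, not part of ARP/wARP. It assumes the standard fixed-column PDB layout (residue name in columns 18-20, element symbol in columns 77-78) and that the element column is filled in. It does not test bond connectivity, which still needs to be inspected separately.

```shell
# Sanity checks for a ligand template PDB file (hypothetical helper functions).

# Print the distinct residue names found in ATOM/HETATM records (columns 18-20).
# A file holding a single ligand should give exactly one name.
ligand_resnames() {
    grep -E '^(ATOM  |HETATM)' "$1" | cut -c18-20 | sort -u
}

# Count non-hydrogen atoms, reading the element symbol from columns 77-78.
# Records with an empty element column are counted as non-hydrogen.
count_heavy_atoms() {
    grep -E '^(ATOM  |HETATM)' "$1" | cut -c77-78 | tr -d ' ' \
        | grep -vcix -e 'H' -e 'D'
}

# Example with a placeholder file name:
# ligand_resnames ligand.pdb
# count_heavy_atoms ligand.pdb
```

The atom count gives a quick idea of where the ligand falls relative to the size-dependent success rates and CPU times quoted in this section.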

 

 

There are a number of options that can be set either in the main GUI panel (scrolling bar Build the ligand) or under the Parameters section. You should not normally need to change these, unless you want the ligand to be built at a known location or would like to screen a list of candidates (ligand cocktail). A brief description is given below.

 

o               Build the ligand (Binding site location)

o               Refmac5: By default the fast protocol is used (1 cycle of refinement). If your PDB file needs considerable pre-refinement with Refmac before the difference electron density map can be computed, you can choose the slow protocol (3 cycles of refinement).

o               Free R Flag: The default is not to use R-free for ligand building. You can choose to use R-free; this will cause additional options to appear within the section Refmac parameters.

o               Ligand building cycles: Defines the number of grid parameterisations of the binding region. The default value is 2. There is one run of each competing ligand building algorithm for each starting grid, so the CPU time required for building is proportional to this number of cycles. If this matters for large ligands, you can set the number of ligand building cycles to 1.

 

o               Cycles of refinement in each Refmac run: Refmac is invoked to refine the protein part of your structure before the difference density map is computed. The default is 1 cycle for the fast protocol and 3 cycles for the slow protocol, see above.

o               Matrix weight for Xray / Geometry: The default is automatic weighting. Since the aim of the ligand building module is not to deliver a well-geometrised protein structure, there is no need to change this parameter.

o               Input a user-defined library file: If your input protein is already a protein-ligand complex, Refmac will have to refine both entities together in order to obtain a difference electron density map. If you already have a Refmac-style cif library for the ligand that is already present, you can input it here. Otherwise, Refmac will use its own library if it knows the ligand. If it does not, it will generate a cif file for the ligand and proceed.

 

o               Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content are derived automatically from the MTZ and the PDB files, displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.

o               Resolution: By default all reflections present in the MTZ file will be used. You can check the box (Use reflections between) and then narrow the range if you are aware of certain deficiencies of your data.

 

o               Compare with an already fitted ligand: If you have the final model of the ligand in the correct orientation and would like to check the installation and the performance of the software, you can check this box. You will then have to provide a PDB file that will be used for comparison.

 

o               Refinement with refmac: The R factor (and R free if requested) are printed after refinement of the protein part only with Refmac. Check that the value of the R factor is reasonable. A value higher than about 30% may indicate that the computed difference map is too noisy for the ligand to be located. A failure may indicate invalid atom nomenclature in your PDB file.

o               The ligandbuild program: The mapping of the difference density synthesis, parameterised with grid points, onto the ligand atoms (ligandbuild and M_ligandbuild) is run as many times as defined by the number of ligand building cycles. A failure may indicate incorrect identification of the binding site. This can be amended by defining the binding site manually prior to the run (see above).

o               Real space fit: Up to 108 of the best constructed ligand models undergo a real-space refinement with respect to the difference density map. The best solution is output. If the test and comparison option is selected, the r.m.s.d. to the reference PDB file (XYZREF) is also printed. A warning is given if the stereochemistry of the constructed ligand is poor. A warning is also given if the constructed ligand molecule has severe steric clashes, which may be a sign of incorrect ligand building. You may want to inspect the ligand and the density and, if a part of the ligand is clearly disordered, try removing it from the ligand target PDB file and re-running the job.

o               Job termination: The statement Task completed successfully indicates that the job finished without error. An error statement of the form

QUITTING ARP/wARP module stopped with an error message: name_of_the_programme

indicates that one of the modules of the task has terminated with an error. Please refer to the specified log file.

o               CPU requirements: The table below serves as a rough guide to the expected CPU time required for a run (subject to your machine architecture):

 

Number of atoms in the ligand    CPU time

Less than 15                     About a minute

20                               A few minutes

30                               About 5 minutes

40                               About 15 minutes

 

 

Running ligand building from the command line: auto_ligand.sh

The script file auto_ligand.sh in the $warpbin directory allows you to run the ligand building as a single-line command without the use of the GUI. The use of auto_ligand.sh is fairly simple. The script prints out help information if it is invoked without arguments.

Required keywords are: datafile (followed by the mtz-file name with the full path), protein (followed by the pdb-file name of the protein model without the ligand with the full path) and ligand (followed by the pdb-file containing the ligand(s) description with the full path).

Optional keywords include: workdir (followed by the full path to the working directory), fp (followed by the fp label), sigfp (followed by the sigfp label). The defaults are FP and SIGFP, respectively. Alternatively, if the mtz file contains only one column for structure factor amplitudes and only one column for their standard deviations, these will be taken. The number of ligand building cycles (default is 2) can be changed with the keyword nligandcycles. The approximate location of the binding site can be supplied by the user either by providing the pdb-file(s) of a ligand (or just a list of atoms) located at the binding site (search_model), or by specifying the (XYZ) coordinates of a point defining the binding region using search_position and search_radius (the default value for the latter is 5 Å). For test purposes, the constructed ligand can be compared to known reference models (hand- or pre-fitted). The required keyword is reflist (followed by the full-path name of a text file containing a list of pdb-files with the reference ligands and their absolute paths). A user-defined ligand library can be input using the keyword extralibrary.
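The text file expected by reflist can be written by hand or generated. A minimal sketch of the latter follows, assuming that the file holds one absolute path per line (the exact layout is not specified above). The function name write_reflist and the file names are hypothetical.

```shell
# Write a reference-list file for the reflist keyword (hypothetical helper).
# Usage: write_reflist OUTPUT.txt REF1.pdb [REF2.pdb ...]
write_reflist() {
    out=$1
    shift
    : > "$out"                      # start from an empty file
    for f in "$@"; do
        # Resolve the directory part so that every entry is an absolute path.
        dir=$(cd "$(dirname "$f")" && pwd) || return 1
        printf '%s/%s\n' "$dir" "$(basename "$f")" >> "$out"
    done
}

# Example with placeholder file names:
# write_reflist references.txt fitted_ligand_1.pdb fitted_ligand_2.pdb
```

The resulting file would then be passed with its own full path after the reflist keyword.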

To build the ligand from a list of candidates ('cocktail'), the coordinates of the ligand candidates should be concatenated into one file, specified with the keyword ligand described above. The different ligands must be distinguished by their residue name (columns 18-20) in the concatenated pdb file. ARP/wARP will automatically choose the best-matching ligand candidate and will attempt to build it at the binding site, either determined automatically or supplied by the user. Since this feature is new, supplying the binding site using the search_model or search_position keywords is recommended.
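Preparing the cocktail file can be sketched as below. The function make_cocktail is a hypothetical helper, not part of ARP/wARP. It assumes one candidate ligand per input file and the standard PDB layout, in which the residue name occupies columns 18-20. It returns a non-zero status if two candidates share a residue name, because they could then not be told apart.

```shell
# Concatenate candidate ligands into one cocktail PDB file (hypothetical helper).
# Usage: make_cocktail OUTPUT.pdb CANDIDATE1.pdb CANDIDATE2.pdb ...
make_cocktail() {
    out=$1
    shift
    cat "$@" > "$out" || return 1
    # Count the distinct residue names among the coordinate records.
    n_names=$(grep -E '^(ATOM  |HETATM)' "$out" | cut -c18-20 | sort -u \
        | wc -l | tr -d ' ')
    if [ "$n_names" -ne "$#" ]; then
        echo "warning: $# input files but $n_names distinct residue names" >&2
        return 1
    fi
}

# Example with placeholder file names:
# make_cocktail cocktail.pdb glycerol.pdb atp.pdb citrate.pdb
```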

Example call (assumed to be started from workdir where test data should reside):

$warpbin/auto_ligand.sh                                                 \
  datafile {FULLPATH_mtzfile}                                           \
  protein {FULLPATH_starting_PDB_file_without_ligand}                   \
  ligand {FULLPATH_PDB_file_with_ligand_to_fit}                         \
  [workdir {FULLPATH_WORKING_DIRECTORY}]                                \
  [fp {fp label} sigfp {sigfp label}]                                   \
  [nligandcycles {number_of_ligandbuild_cycles (default is 2)}]         \
  [search_model {FULLPATH_PDB_file_with_model_at_expected_ligand_site}] \
  [search_position {X Y Z}]                                             \
  [search_radius {radius_in_angstroms}]                                 \
  [reflist {FULLPATH_textfile_with_FULLPATHnames_of_fitted_ligands_for_comparison}] \
  [extralibrary {user_defined_library_for_Refmac5}]

 

The script then creates a directory in the workdir, prints its name and writes a parameter file there. The log files, additional output files and the building results can all be found in this directory.
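As an illustration of how the keywords combine, the sketch below assembles a call for a run with a user-supplied binding-site position and prints it without executing it. The helper build_ligand_cmd is hypothetical. Passing X, Y and Z to search_position as three separate tokens is an assumption based on the template above, so it is worth checking against the help text that the script prints when invoked without arguments.

```shell
# Print (dry run) an auto_ligand.sh call with a known binding-site position.
# Hypothetical helper; arguments: MTZ PROTEIN_PDB LIGAND_PDB X Y Z RADIUS
build_ligand_cmd() {
    # Stop with a message if warpbin is not set in the environment.
    : "${warpbin:?warpbin must point to the ARP/wARP binaries}"
    # Assumption: search_position takes X, Y and Z as three separate tokens.
    set -- "$warpbin/auto_ligand.sh" \
        datafile "$1" protein "$2" ligand "$3" \
        search_position "$4" "$5" "$6" search_radius "$7"
    # Join the words with single spaces and print the resulting command line.
    printf '%s\n' "$*"
}

# Example with placeholder paths and coordinates:
# build_ligand_cmd /data/native.mtz /data/protein.pdb /data/ligand.pdb 12.5 -3.0 40.2 5
```

Printing the command first makes it easy to verify the full paths before a run is started.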