The ligand building procedure within
ARP/wARP Version 7.0 proceeds in three steps: first it locates the binding site
in the difference density map, then it builds a number of putative ligand
models there and, finally, it selects the best model, which is geometrised and
real-space fitted into the density. The binding region is selected automatically
by matching the ligand's shape-related properties to regions of high density.
The chosen region is parameterised by a sparse set of putative positions (grid
nodes) for the ligand atoms. The stereochemical information and van der Waals
repulsions, in combination with the electron density, allow a suitable estimate
of the position, orientation and conformation of the ligand to be obtained.
Two algorithms are used to build the ligand into this sparse set. The first,
label swap (Zwart, P.H. & Lamzin, V.S., 2004), exploits the combinatorial
assignment of ligand atom identities to the grid nodes. The second maximises
the overlap between the sparse set and the ligand model by a random search in
conformational space. The output of both algorithms undergoes a last stage of
real-space refinement before the best model is selected.
The accuracy of ligand building depends mainly on
ligand size and on the resolution of the X-ray data. As a rough guide, about 75%
of well-ordered ligands with up to 20 non-hydrogen atoms should be built within
an r.m.s.d. of 1.0 Å from their correct location. For larger ligands this
success rate drops to about 50%. With an r.m.s.d. of 1.0 Å or less, the
constructed models should be accurate enough for REFMAC5 to refine the
protein-ligand complex straightforwardly. The procedure can be iterated to
locate additional ligands, if any are present.
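To check a constructed model against a known ligand position, the r.m.s.d. quoted above can also be computed outside ARP/wARP. The following is a minimal POSIX shell/awk sketch, not ARP/wARP's own comparison routine: the helper name ligand_rmsd is hypothetical, atoms are paired purely by atom name (PDB columns 13-16), and chemically equivalent atoms or crystallographic symmetry are not taken into account, so the value can be larger than a symmetry-aware comparison would give.

```shell
# ligand_rmsd REFERENCE.pdb MODEL.pdb   (hypothetical helper, not part of ARP/wARP)
# Prints "RMSD N": the r.m.s.d. in Angstrom over the N atoms whose names occur
# in both files. Atom names must be unique within each file.
ligand_rmsd() {
    awk '
        # Only coordinate records are considered.
        substr($0, 1, 6) != "ATOM  " && substr($0, 1, 6) != "HETATM" { next }
        {
            name = substr($0, 13, 4); gsub(/ /, "", name)
            x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
            if (FILENAME == ARGV[1]) {          # reference coordinates
                rx[name] = x; ry[name] = y; rz[name] = z
            } else if (name in rx) {            # model atom with a reference partner
                dx = x - rx[name]; dy = y - ry[name]; dz = z - rz[name]
                sum += dx * dx + dy * dy + dz * dz; n++
            }
        }
        END {
            if (n == 0) { print "no atoms in common" > "/dev/stderr"; exit 1 }
            printf "%.3f %d\n", sqrt(sum / n), n
        }
    ' "$1" "$2"
}
```

For example, `ligand_rmsd reference.pdb built.pdb` prints the r.m.s.d. in Å followed by the number of atoms that were paired.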
The ARP/wARP ligand building module requires the X-ray
data (in MTZ format), the built protein model without ligands (in PDB format)
and a template model of the ligand to build (in PDB format). Options include
specifying the binding site and the number of starting grids, comparing the
result with one or more reference ligands, and building a ligand taken from a
list of candidates ('cocktail'). In the latter case the coordinates of the
ligand candidates should be concatenated into a single PDB file. The different
ligands must be distinguished by their residue name (columns 18-20). ARP/wARP
will automatically choose the best-matching ligand candidate and will attempt
to build it at the binding site, either determined automatically or supplied by
the user. However, since this feature is new, specifying the binding site (see
below) is recommended for that option.
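Preparing a cocktail file by hand is error-prone, so here is a minimal POSIX shell sketch of one way to do it. The helper name make_cocktail is hypothetical; it checks that every input file uses exactly one residue name (read from the standard PDB residue-name columns 18-20) and that no two inputs share a name, and then concatenates the files. Dropping the intermediate END records is an assumption made so that a PDB reader does not stop after the first candidate; it is not a documented ARP/wARP requirement.

```shell
# make_cocktail OUTPUT.pdb CANDIDATE1.pdb [CANDIDATE2.pdb ...]
# Hypothetical helper, not part of ARP/wARP.
make_cocktail() {
    out=$1; shift
    seen=""
    for f in "$@"; do
        # Distinct residue names used by this file's ATOM/HETATM records.
        names=$(awk 'substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM" {
                         r = substr($0, 18, 3); gsub(/ /, "", r); print r
                     }' "$f" | sort -u)
        if [ "$(printf '%s\n' "$names" | grep -c .)" -ne 1 ]; then
            echo "$f: expected exactly one residue name" >&2; return 1
        fi
        case " $seen " in
            *" $names "*) echo "$f: residue name $names already used" >&2; return 1 ;;
        esac
        seen="$seen $names"
    done
    # Concatenate all candidates, dropping END records (assumed helpful, see lead-in).
    awk '!/^END/' "$@" > "$out"
}
```

The resulting file is the one to supply as the ligand template, in the GUI or with the ligand keyword of auto_ligand.sh.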
o
MTZ in: X-ray data in the MTZ format containing structure factor amplitudes and
their standard deviations.
o
Fobs, Sigma: If the MTZ column labels for structure factor amplitudes and their
standard deviations have obvious names, they will be recognised automatically.
Otherwise, use the scrolling button, navigate to List All Labels and choose the
appropriate ones.
o
Protein model without ligand: Provide the PDB file with the coordinates of the
protein only. If the file contains solvent atoms, free atoms or fragments of
other ligands, make sure that they do not overlap with the expected location of
the ligand.
o
Ligand molecule coordinates: Stereochemical information about the ligand to be
built is read from a PDB file. This file should contain the ligand molecule
only. The molecule can be in any conformation; however, the interatomic
distances, bond angles and chirality (if present) should correspond to the
target stereochemistry of the ligand to be built. Please also check that all
atoms of the target ligand are connected by bonds (i.e. that the file does not
accidentally contain several unconnected clusters of atoms).
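Two of the input checks asked for above can be partly automated. The POSIX shell/awk sketch below defines two hypothetical helpers that are not part of ARP/wARP. count_fragments reports how many unconnected clusters a ligand template contains, using a simple distance cutoff (1.9 Å by default, an assumption rather than ARP/wARP's own bonding rule) to decide what counts as a bond. atoms_near lists the HETATM records of the protein file that lie within a given radius of the expected ligand site; solvent or free atoms written as ATOM rather than HETATM records are not examined.

```shell
# count_fragments LIGAND.pdb [CUTOFF]   (hypothetical helper)
# Counts connected clusters; atoms closer than CUTOFF Angstrom count as bonded.
count_fragments() {
    awk -v cut="${2:-1.9}" '
        function find(i) {                    # union-find root, with path halving
            while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i] }
            return i
        }
        substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM" {
            n++
            x[n] = substr($0, 31, 8); y[n] = substr($0, 39, 8); z[n] = substr($0, 47, 8)
            parent[n] = n
        }
        END {
            c2 = cut * cut
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++) {
                    dx = x[i] - x[j]; dy = y[i] - y[j]; dz = z[i] - z[j]
                    if (dx * dx + dy * dy + dz * dz < c2) {
                        a = find(i); b = find(j)
                        if (a != b) parent[a] = b    # merge the two clusters
                    }
                }
            for (i = 1; i <= n; i++) if (find(i) == i) frags++
            print frags + 0
        }
    ' "$1"
}

# atoms_near PROTEIN.pdb X Y Z RADIUS   (hypothetical helper)
# Prints HETATM records (waters, ions, other ligands) within RADIUS Angstrom of X Y Z.
atoms_near() {
    awk -v cx="$2" -v cy="$3" -v cz="$4" -v r="$5" '
        substr($0, 1, 6) == "HETATM" {
            dx = substr($0, 31, 8) - cx
            dy = substr($0, 39, 8) - cy
            dz = substr($0, 47, 8) - cz
            if (dx * dx + dy * dy + dz * dz <= r * r) print
        }
    ' "$1"
}
```

A complete ligand template should make count_fragments print 1; a larger number points to a gap in the molecule or a stray atom. Any output from atoms_near marks atoms worth inspecting before the run.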
A number of options can be set either in the main GUI panel (scrolling bar
Build the ligand) or in the Parameters section. Normally you should not need to
change these, unless you want the ligand to be built around a known location or
you would like to screen a list of candidates (ligand cocktail). A brief
description is given below.
o
Build the ligand (Binding site location)
o
Refmac5: By default the fast protocol is used (1 cycle of refinement). If your
PDB file needs considerable pre-refinement with Refmac before the difference
electron density map can be computed, you can choose the slow protocol (3 cycles
of refinement).
o
Free R Flag: By default, R-free is not used for ligand building. If you choose
to use R-free, additional options will appear in the Refmac parameters section.
o
Ligand building cycles: defines the number of grid parameterisations of the
binding region. The default value is 2. Each competing ligand building algorithm
is run once for each starting grid, so the CPU time required for building is
proportional to the number of cycles. If run time matters, for example with
large ligands, you can set the number of ligand building cycles to 1.
o
Cycles of refinement in each Refmac run: Refmac is invoked to refine the protein
part of the structure before the difference density map is computed. The
default is 1 cycle for the fast protocol and 3 cycles for the slow protocol (see
above).
o
Matrix weight for Xray / Geometry: The default is automatic weighting. Since the
aim of the ligand building module is not to deliver a well-geometrised protein
structure, there is no need to change this parameter.
o
Input a user-defined library file: If your input model is already a
protein-ligand complex, Refmac will have to refine both entities together to
obtain a difference electron density map. If you already have a Refmac-style
CIF library for the ligand already present, you can input it here. Otherwise,
Refmac will use its own library if it knows the ligand; if it does not, it will
generate a CIF file for the ligand and proceed.
o
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent
content: these are derived automatically from the MTZ and PDB files, displayed
for information only and cannot be changed. However, you may want to check that
their values match your expectations.
o
Resolution: By default, all reflections present in the MTZ file are used. If
you are aware of deficiencies in your data, you can check the box Use
reflections between and then narrow the resolution range.
o
Compare with an already fitted ligand: If you have the final model of the ligand
in the correct orientation and would like to check the installation and the
performance of the software, you can check this box. You will then have to
provide a PDB file of that ligand, which will be used for comparison.
o
Refinement with Refmac: The R factor (and R-free, if requested) is printed after
Refmac refinement of the protein part only. Check that the R factor is
reasonable: a value higher than about 30% may indicate that the computed
difference map is too noisy to locate the ligand. A Refmac failure may indicate
invalid atom nomenclature in your PDB file.
o
The ligandbuild program: The mapping of the grid-parameterised difference
density onto the ligand atoms (ligandbuild and M_ligandbuild) is run as many
times as specified by the number of ligand building cycles. A failure may
indicate that the binding site was identified incorrectly. This can be remedied
by defining the binding site manually before the run (see above).
o
Real space fit: Up to 108 of the top constructed ligand models undergo
real-space refinement against the difference density map, and the best solution
is output. If the test and comparison option is selected, the r.m.s.d. to the
reference PDB file (XYZREF) is also printed. A warning is given if the
stereochemistry of the constructed ligand is poor, and also if the constructed
ligand molecule has severe steric clashes, which may be a sign that the ligand
was built incorrectly. You may want to inspect the ligand and the density and,
if part of the ligand is clearly disordered, remove that part from the ligand
target PDB file and re-run the job.
o
Job termination: The statement Task completed successfully indicates that the
job finished without errors. The error statement
QUITTING ARP/wARP module stopped with an error message: name_of_the_programme
indicates that one of the modules of the task has terminated with an error.
Please refer to the specified log file.
o
CPU requirements: The table below serves as a rough guide to the expected CPU
time for a run (actual times depend on your machine architecture):
| Number of atoms in the ligand | CPU time |
| Less than 15 | About a minute |
| 20 | A few minutes |
| 30 | About 5 minutes |
| 40 | About 15 minutes |
The script auto_ligand.sh in the $warpbin directory lets you run ligand building
as a single command line without the GUI. Its use is simple: when invoked
without arguments, the script prints help information.
The required keywords are datafile (followed by the full path of the MTZ file),
protein (followed by the full path of the PDB file of the protein model without
the ligand) and ligand (followed by the full path of the PDB file describing the
ligand(s)).
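Because the script expects full paths, it can help to let the shell resolve them. The following POSIX shell sketch (abspath and ligand_cmd are hypothetical helper names, not part of ARP/wARP) prints, without running it, an auto_ligand.sh command line with the three required keywords; any further arguments, such as nligandcycles 1, are appended unchanged.

```shell
# abspath FILE   (hypothetical helper): print the absolute path of an existing file.
abspath() {
    [ -e "$1" ] || { echo "missing: $1" >&2; return 1; }
    # cd into the file's directory in a subshell and ask pwd for its full path.
    ( cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")" )
}

# ligand_cmd MTZ PROTEIN_PDB LIGAND_PDB [KEYWORD VALUE ...]   (hypothetical helper)
# Prints the command only; paths containing spaces are not quoted in the output.
ligand_cmd() {
    mtz=$(abspath "$1") && pdb=$(abspath "$2") && lig=$(abspath "$3") || return 1
    shift 3
    printf '%s/auto_ligand.sh datafile %s protein %s ligand %s' '$warpbin' "$mtz" "$pdb" "$lig"
    [ $# -gt 0 ] && printf ' %s' "$@"    # append optional keywords verbatim
    printf '\n'
}
```

Printing rather than executing keeps the sketch safe to try; the printed line can be checked and then pasted into the shell.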
Optional keywords include workdir (followed by the full path of the working
directory), fp (followed by the label of the structure factor amplitude column)
and sigfp (followed by the label of the column of their standard deviations).
The defaults are FP and SIGFP, respectively; alternatively, if the MTZ file
contains only one column of structure factor amplitudes and only one column of
their standard deviations, these are taken. The number of ligand building cycles
(default 2) can be changed with the keyword nligandcycles. The approximate
location of the binding site can be supplied either by providing the PDB file(s)
of a ligand (or just a list of atoms) located at the binding site
(search_model), or by specifying the X, Y, Z coordinates of a point defining the
binding region with search_position, together with search_radius (default
5 Å). For test purposes, the constructed ligand can be compared with known
reference models (hand- or pre-fitted) using the keyword reflist, followed by
the full path of a text file that lists the PDB files of the reference ligands
with their absolute paths. A user-defined ligand library can be input with the
keyword extralibrary.
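The reflist file can be generated from a set of reference PDB files. The exact layout ARP/wARP expects is not described here; the sketch below assumes one absolute path per line, and its helper name make_reflist is hypothetical.

```shell
# make_reflist OUTPUT.txt REFERENCE1.pdb [REFERENCE2.pdb ...]
# Hypothetical helper: writes one absolute path per line (assumed layout).
make_reflist() {
    out=$1; shift
    : > "$out"                         # start from an empty list
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
        # Resolve the directory in a subshell so the caller's cwd is unchanged.
        ( cd "$(dirname "$f")" && printf '%s/%s\n' "$(pwd)" "$(basename "$f")" ) >> "$out"
    done
}
```

The full path of the resulting text file is what follows the reflist keyword.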
To build the ligand from a list of candidates
('cocktail'), the coordinates of the ligand candidates should be concatenated
into one file, specified with the above-mentioned keyword ligand. The different
ligands must be distinguished by their residue name (columns 18-20) in the
concatenated PDB file. ARP/wARP will automatically choose the best-matching
ligand candidate and will attempt to build it at the binding site, either
determined automatically or supplied by the user. However, since this feature
is new, supplying the binding site with the search_model or search_position
keywords is recommended.
Example call (assumed to be started from the working directory, workdir, where
the test data should reside):
$warpbin/auto_ligand.sh \
    datafile {FULLPATH_mtzfile} \
    protein {FULLPATH_starting_PDB_file_without_ligand} \
    ligand {FULLPATH_PDB_file_with_ligand_to_fit} \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [fp {fp label} sigfp {sigfp label}] \
    [nligandcycles {number_of_ligandbuild_cycles (default is 2)}] \
    [search_model {FULLPATH_PDB_file_with_model_at_expected_ligand_site}] \
    [search_position {X Y Z}] \
    [search_radius {radius_in_angstroms}] \
    [reflist {FULLPATH_textfile_with_FULLPATHnames_of_fitted_ligands_for_comparison}] \
    [extralibrary {user_defined_library_for_Refmac5}]
The script then creates a directory in the working directory, prints its name
and writes a parameter file there. The log files, additional output files and
the building results can all be found in this directory.