The procedure for building secondary
structural elements in ARP/wARP Version 7.0 is based on the use of discriminant
analysis in a successive filtering scheme taking into account the geometry of
alpha-helical and beta-stranded main-chain fragments. The electron density map
is first analysed and a suitable threshold is selected. In the next step
stereochemical information on the helix and strand geometry is used; sets of
overlapping fragments are constructed and filtered based on their geometric
likelihood. All fragments that overlap at a particular location of a helix or
strand in turn undergo an ensemble averaging process to arrive at the best
estimate of CA positions. The output fragments are then regularised and the
chain direction is chosen on the basis of their fit to the density. Finally the
fragments are refined in real space.
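The ensemble-averaging step can be pictured with a small shell sketch. It assumes a plain, unweighted arithmetic mean of all CA positions that overlapping fragments propose for the same residue slot, and a hypothetical input format of one "slot x y z" line per proposal. The averaging ARP/wARP actually performs may be weighted or otherwise different.

```sh
# Minimal sketch of ensemble averaging (assumption: unweighted mean).
# Several overlapping fragments each propose a CA position for the same
# residue slot; the estimate for that slot is the mean of all proposals.
#
# Input (stdin):  one line per proposed CA position, "slot x y z"
# Output (stdout): one line per slot, "slot <mean x> <mean y> <mean z> <count>"
average_ca_positions() {
  awk '
    {
      n[$1]++; sx[$1] += $2; sy[$1] += $3; sz[$1] += $4
    }
    END {
      for (s in n)
        printf "%d %.3f %.3f %.3f %d\n", s, sx[s]/n[s], sy[s]/n[s], sz[s]/n[s], n[s]
    }' | sort -n
}

# Three overlapping fragments covering slots 1-3 (hypothetical coordinates).
printf '%s\n' \
  "1 0.0 0.0 0.0" "2 3.8 0.0 0.0" \
  "1 0.2 0.0 0.0" "2 3.6 0.2 0.0" "3 7.6 0.0 0.0" \
  "2 3.8 -0.2 0.0" "3 7.4 0.0 0.2" |
  average_ca_positions
```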
The accuracy of the resulting model depends
on many parameters. The module should be able to build helices and strands at
resolutions as low as 4.5 Å. However, the result may not include the complete
helical and stranded structure, and parts of it may be misinterpreted. On
average, the module is expected to locate 90% of the helices and 50% of the
strands correctly. The procedure is relatively fast and takes about 1 to 3
minutes for proteins of moderate size (up to 5,000 atoms).
The secondary structure recognition module
can be used with data at any resolution, provided that the data extend to 4.5 Å
or better. However, the module is optimised for lower-resolution data and for
difficult cases where, for example, the standard model-building protocol
(Classic or flex-wARP) has not succeeded. For data extending beyond 2.6 Å, the
module automatically trims the resolution and the Wilson B-factor of the data
to approach its design conditions.
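A hypothetical helper, classify_resolution, maps the high-resolution limit of a data set onto the ranges quoted above. It only encodes the 4.5 Å and 2.6 Å limits stated in this section; it does not reproduce how the module actually trims the data.

```sh
# Hypothetical helper: report how the documented resolution limits apply
# to a data set with high-resolution limit d_min (in Angstrom).
#   d_min > 4.5          -> "unsupported": outside the module's range
#   2.6 <= d_min <= 4.5  -> "as-is": within the design range
#   d_min < 2.6          -> "trimmed": module trims resolution and Wilson B
classify_resolution() {
  awk -v d="$1" 'BEGIN {
    if (d + 0 > 4.5)      print "unsupported"
    else if (d + 0 < 2.6) print "trimmed"
    else                  print "as-is"
  }'
}

classify_resolution 1.9
```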
o MTZ in: X-ray data in MTZ format containing structure factor amplitudes and
their standard deviations, phases and figures of merit (FOMs).
o Fobs, Sigma, Phib, FOM: If the MTZ column labels for structure factor
amplitudes, their standard deviations, phases and figures of merit have obvious
names, they are recognised automatically. Otherwise, use the scrolling button,
navigate to List All Labels and choose the appropriate ones.
o Output PDB file: Provide the name of the PDB file to which the constructed
secondary structure fragments will be written.
o Number of residues: Provide the expected number of residues in the
asymmetric unit. This should at least be a good estimate, within 20% of the
true number. If the number is too low, the model will be less complete; if it
is much too high, this may result in incorrect tracing and excessive CPU time.
o Do NOT build beta-strands: If you have real doubts that your structure has a
fold with a significant beta-strand content, you can deactivate strand
construction and thus speed up the procedure by a factor of 2.
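For the Number of residues field, a rough estimate can be derived from the sequence. The sketch below assumes a FASTA file holding the sequence of one copy of the molecule and a known number of copies in the asymmetric unit; estimate_residues is a hypothetical name. It prints the estimate together with the 20% window below and above it.

```sh
# Hypothetical helper: estimate the number of residues in the asymmetric unit
# from a FASTA file (sequence of one copy) and the number of copies (default 1).
# Prints: <estimate> <estimate - 20%> <estimate + 20%>
estimate_residues() {
  local fasta=$1 copies=${2:-1} per_chain total
  # Count residue letters, skipping '>' header lines and all whitespace.
  per_chain=$(grep -v '^>' "$fasta" | tr -cd 'A-Za-z' | wc -c | tr -d ' ')
  total=$((per_chain * copies))
  # Integer arithmetic; both bounds are rounded down.
  echo "$total $((total * 80 / 100)) $((total * 120 / 100))"
}
```

For example, `estimate_residues seq.fasta 2` prints three numbers for a dimer in the asymmetric unit; the first one is the value to enter.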
There are a number of additional parameters
that you normally need not worry about. A brief description is given below.
o Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent
content: These are derived automatically from the MTZ and PDB files, are
displayed for information only and cannot be changed. However, you may want to
check whether their values conform to your expectations.
o Resolution: By default, all reflections present in the MTZ file are used.
You can check the box (Use reflections between) and then narrow the range if
you are aware of specific deficiencies in your data.
o Compare with an already deposited protein for validation or testing: If you
have the final model and would like to check the installation and the
performance of the software, check this box. You will then have to provide a
PDB file that will be used for comparison.
o Helix and strand recognition: The important numbers, highlighted in
red/bold in the short log file, give the number of residues built and the
number of fragments into which these residues are arranged. The higher the
Connectivity index and the Tracing score, the more complete and reliable the
resulting model.
o Further extension of the model: You may try to feed the PDB output of the
module into Classic or flex-wARP. However, depending on the resolution of the
data, this may not provide a sufficient seed for subsequent automatic tracing
of the full chain.
o Job termination: The statement "Task completed successfully" indicates that
the job has finished without errors. The error statement "QUITTING ARP/wARP
module stopped with an error message: name_of_the_programme" indicates that one
of the modules of the task has terminated with an error. Please refer to the
specified log file.
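If you prefer to check job termination from a script rather than in the GUI, a minimal sketch can search a log for the two statements described above. It assumes these messages are written to the log file you pass in; job_status is a hypothetical helper name.

```sh
# Hypothetical helper: classify a job from its log file.
# Assumption: the documented termination messages appear in this log.
job_status() {
  if grep -q 'QUITTING' "$1"; then
    echo "error"      # a module stopped with an error; inspect its log
  elif grep -q 'Task completed successfully' "$1"; then
    echo "finished"   # job ended without errors
  else
    echo "unknown"    # still running, or the log is incomplete
  fi
}
```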
Building secondary structure from the command line, auto_albe.sh
The script file auto_albe.sh in the $warpbin directory allows you to run the
secondary structure building as a single-line command without the GUI. Using
auto_albe.sh is straightforward: the script prints help information if it is
invoked without arguments.
Required keywords are: datafile (followed by the full path of the MTZ file),
protein (followed by the full path of the PDB file containing the protein model
without the ligand) and ligand (followed by the full path of the PDB file
describing the ligand(s)).
Optional keywords include: workdir (followed by the full path to the working
directory), fp (followed by the fp label) and sigfp (followed by the sigfp
label). The defaults are FP and SIGFP, respectively. Alternatively, if the MTZ
file contains only one column for structure factor amplitudes and only one
column for their standard deviations, these columns are used. The number of
ligand-building cycles (default 2) can be changed with the keyword
nligandcycles. The user can supply the approximate location of the binding site
either by providing the PDB file(s) of a ligand (or just a list of atoms)
located at the binding site (search_model), or by specifying the (XYZ)
coordinates of a point defining the binding region using search_position and
search_radius (the default value for the latter is 5 Å). For testing purposes,
the constructed ligand can be compared with known reference models (hand- or
pre-fitted). This requires the keyword reflist, followed by the full path of a
text file listing the PDB files of the reference ligands with their absolute
paths. A user-defined ligand library can be supplied with the keyword
extralibrary.
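The text file for reflist can be generated with a small helper. make_reflist is a hypothetical name; the sketch assumes the expected layout is one reference ligand PDB file per line, each written with its absolute path, and that the directories of the listed files exist.

```sh
# Hypothetical helper: write the text file passed to the reflist keyword,
# one reference ligand PDB file per line, each with its absolute path.
make_reflist() {
  local out=$1 f dir
  shift
  : > "$out" || return 1
  for f in "$@"; do
    # Resolve the directory part in a subshell; fail if it does not exist.
    dir=$(cd "$(dirname "$f")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$f")" >> "$out"
  done
}
```

Usage: `make_reflist reflist.txt ref_ligand1.pdb refs/ref_ligand2.pdb`, then pass the full path of reflist.txt after the reflist keyword.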
To build the ligand from a list of candidates
('cocktail'), concatenate the coordinates of the ligand candidates into the
single file specified with the keyword ligand. The different ligands must be
distinguished by their residue name (columns 18-20 of the ATOM/HETATM records)
in the concatenated PDB file. ARP/wARP automatically chooses the best-matching
ligand candidate and attempts to build it at the binding site, either
determined automatically or supplied by the user. However, since this feature is
new, supplying the binding site with the search_model or search_position
keyword is recommended.
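A cocktail file could be assembled with a short shell function. make_cocktail is a hypothetical helper that assumes one ligand (one residue name) per candidate file and keeps only ATOM/HETATM records. Following the PDB format, where the residue name occupies columns 18-20 of these records, it checks that every candidate carries a distinct residue name.

```sh
# Hypothetical helper: concatenate ligand candidate files into one
# 'cocktail' PDB file and verify that the candidates are distinguishable.
# PDB format: residue name = columns 18-20 of ATOM/HETATM records.
# Assumption: each input file holds exactly one ligand (one residue name).
make_cocktail() {
  local out=$1
  shift
  # Keep only coordinate records; per-file END/TER/header lines are dropped.
  grep -h -E '^(ATOM  |HETATM)' "$@" > "$out" || return 1
  echo "END" >> "$out"
  local n_files=$# n_names
  n_names=$(grep -E '^(ATOM  |HETATM)' "$out" | cut -c18-20 | sort -u | wc -l | tr -d ' ')
  if [ "$n_names" -ne "$n_files" ]; then
    echo "make_cocktail: $n_files files but $n_names distinct residue names" >&2
    return 1
  fi
}
```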
Example call (assumed to be started from the workdir, where the test data
should reside):
$warpbin/auto_ligand.sh \
    datafile {FULLPATH_mtzfile} \
    protein {FULLPATH_starting_PDB_file_without_ligand} \
    ligand {FULLPATH_PDB_file_with_ligand_to_fit} \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [fp {fp label} sigfp {sigfp label}] \
    [nligandcycles {number_of_ligandbuild_cycles (default is 2)}] \
    [search_model {FULLPATH_PDB_file_with_model_at_expected_ligand_site}] \
    [search_position {X Y Z}] \
    [search_radius {radius_in_angstroms}] \
    [reflist {FULLPATH_textfile_with_FULLPATHnames_of_fitted_ligands_for_comparison}] \
    [extralibrary {user_defined_library_for_Refmac5}]
The script then creates a directory in the workdir, prints its name and writes
a parameter file there. The log files, additional output files and the building
results can all be found in this directory created by auto_ligand.sh.
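Because every path argument must be a full path, it can help to wrap the call in a small shell function that resolves relative paths first. run_auto_ligand, abspath, DRY_RUN and SEARCH_MODEL are hypothetical names. The sketch assumes that $warpbin is set and that each keyword and its value are passed to auto_ligand.sh as separate arguments, as in the example call; setting DRY_RUN prints the command instead of running it.

```sh
# Hypothetical helper: resolve a (possibly relative) path to an absolute one.
# The directory part must exist.
abspath() {
  (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")")
}

# Hypothetical wrapper around the documented auto_ligand.sh call.
run_auto_ligand() {
  local mtz protein ligand site
  mtz=$(abspath "$1") || return 1
  protein=$(abspath "$2") || return 1
  ligand=$(abspath "$3") || return 1
  set -- datafile "$mtz" protein "$protein" ligand "$ligand"
  # Optional binding-site hint via the documented search_model keyword.
  if [ -n "${SEARCH_MODEL:-}" ]; then
    site=$(abspath "$SEARCH_MODEL") || return 1
    set -- "$@" search_model "$site"
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command line that would be executed.
    echo "${warpbin:?warpbin is not set}/auto_ligand.sh $*"
  else
    "${warpbin:?warpbin is not set}/auto_ligand.sh" "$@"
  fi
}
```

For example, `DRY_RUN=1 run_auto_ligand data.mtz protein.pdb ligand.pdb` shows the full command with absolute paths before anything is run.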