This is the main module of ARP/wARP, which provides execution of the
following tasks:
(a) automated model building starting from experimental phases
(b) automated model building starting from an existing model
(c) improvement of maps by atom update and refinement
(d) building of solvent atoms
Applications (a) and (b) (the so-called warpNtrace protocol) start from input
experimental or density-modified phases, or from an available (preliminarily
refined or partially autotraced) model, and aim to deliver an essentially
complete model and an improved map. The software used by these applications
has advanced considerably since the previous 6.0 version: the task now
converges faster, may be applicable to lower-resolution X-ray data and may
tolerate poorer starting phases. As a rule of thumb, the resolution of the
data should be 2.7 Å or better.
The warpNtrace protocol uses the idea of a hybrid model in which protein and
free atoms can co-exist. warpNtrace keeps whatever was recognised as protein
(in the form of polypeptide fragments), treats the rest as free atoms and
refines this hybrid model during a 'big' cycle consisting of several
(typically 5) ARP/REFMAC update/refinement cycles. At the end of each big
cycle the map is interpreted anew, and this is expected to provide a better
interpretation (more residues in fewer fragments). The whole procedure is
iterated (typically 10 times).
The output of warpNtrace is a set of refined polypeptide fragments. If the
sequence is available, the traced fragments will be docked into the sequence
and side chains will be built during the iterative refinement procedure. After
the last building cycle the fragments will be arranged to form a globular
structure (or, in the case of NCS, several NCS-related structures). The
remainder of the structure (cis-prolines, poorly ordered loops and terminal
residues of each fragment) will have to be completed by the user. Since the
output model is refined, its accuracy is expected to be comparable to that of
the final refined structure. Mis-tracing (incorrect tracing of polypeptide
fragments) is not impossible but should not normally exceed 1% of the whole
structure (this depends strongly on the resolution and quality of the data,
the quality of the starting phases and the level of convergence of the
warpNtrace task).
Application (c) has not changed since the previous 6.0 release. It can be used
if warpNtrace was unsuccessful and may provide an improvement in the density
map. The map is first interpreted as a pseudo-protein model consisting of
unconnected free atoms (similar to the map interpretation in application (a)).
This model is then refined and updated with iterative cycles of ARP/REFMAC.
However, no autotracing (interpretation of the map in terms of polypeptide
fragments, as in warpNtrace) is carried out.
Application (d), for building a solvent structure into a model where the
protein part is complete, has also not changed since the previous 6.0 release.
Within this task restrained reciprocal space refinement is carried out with
REFMAC while ARP/wARP performs automatic adjustment of the solvent structure.
The resolution of the data should be 2.5 Å or better. The output is the
protein model with the solvent molecules transformed by symmetry operations to
lie around the protein.
Application (a) is described in detail below; the input to applications (b),
(c) and (d) is very similar and should be self-explanatory.
o
Run ARP/wARP for
Choose applications (a) to (d) as described above.
o
Dock the autotraced chains to sequence The
default is to dock the fragments starting from building cycle 0. The cycle
number can be changed, although this should not be advantageous. Should the
sequence not be available, the docking can be disabled by clicking on the check
box on the left.
o
MTZ in X-ray data in the MTZ format containing structure factor amplitudes,
their standard deviations, phases and figures of merit. If pre-weighted
structure factor amplitudes (e.g. from SHARP) are to be used to construct the
initial map, please check the corresponding box in the ARP/wARP flow
parameters (see below).
o
Fobs Sigma PHIB FOM If the MTZ column labels for structure factor amplitudes,
their standard deviations, phases and figures of merit have obvious names,
they will be recognised automatically. Otherwise please use the scrolling
button, navigate to List All Labels and choose the appropriate ones.
o
Sequence file in
Provide the sequence file in the following format (pir):
The first line should start with >
The second line should be blank
The sequence (1 letter code) starts from the third
line. The spaces hereafter are ignored.
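The three layout rules above can be followed mechanically. The snippet below is a minimal sketch and not part of ARP/wARP: the function name `format_sequence_file`, the `title` argument and the 60-character line width are illustrative assumptions; only the layout (a first line starting with >, a blank second line, the one-letter sequence from the third line) comes from the text.

```python
import textwrap


def format_sequence_file(title: str, sequence: str, width: int = 60) -> str:
    """Return sequence-file text in the layout described above (hypothetical helper)."""
    # Spaces in the sequence are ignored by the reader, so strip all whitespace here.
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must consist of one-letter residue codes")
    lines = [">" + title, ""]                # line 1 starts with '>', line 2 is blank
    lines += textwrap.wrap(residues, width)  # the sequence starts on line 3
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_sequence_file("my_protein", "MKT AYIAK QR"), end="")
```

The returned text can be written to a file and given as the sequence file.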
o
Total residues in the AU / number of molecules For monomers provide the total
number of residues in the asymmetric unit; the number of molecules is then 1.
In the case of NCS, please also provide the total (!) number of residues in
the asymmetric unit together with the number of NCS-related molecules (e.g. if
you have 2 molecules in the AU with 200 residues each, enter 400 for the
number of residues). If you have a heteromer, e.g. a 3a/3b structure, the NCS
order is 3, but please make sure that the sequence file contains both
sequences separated by about 20 alanines:
SEQUENCE_OF_a_SUBUNIT_AAAAAAAAAAAAAAAAAAAA_SEQUENCE_OF_b_SUBUNIT
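As an illustration of the two conventions above, the following sketch joins subunit sequences with a poly-alanine linker and computes the total number of residues in the asymmetric unit. Both function names are hypothetical. The sketch also assumes that the linker alanines are not counted as residues, which the text does not state.

```python
def heteromer_sequence(subunits, linker_length=20):
    """Join subunit sequences with a run of alanines (about 20, as described above)."""
    linker = "A" * linker_length
    return linker.join(s.strip().upper() for s in subunits)


def total_residues_in_au(chain_lengths, ncs_copies):
    """Total residues in the AU: residues in one NCS copy times the number of copies.

    Assumption: linker alanines are not counted.
    """
    return sum(chain_lengths) * ncs_copies


if __name__ == "__main__":
    # Example from the text: 2 molecules in the AU with 200 residues each.
    print(total_residues_in_au([200], 2))
    print(heteromer_sequence(["MKT", "GSH"]))
```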
o
Cycles of autobuilding / total cycles The default is 10 big building cycles
separated by 5 ARP/REFMAC cycles (thus making 50 cycles in total). With good
starting phases the autobuilding may converge faster; with poorer phases more
cycles may be required. You can always submit warpNtrace for further cycles
using the output of the previous tracing (application (b), automated model
building starting from an existing model).
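The arithmetic behind the total cycle count is simple enough to write down. This is only a sketch with a hypothetical function name; the defaults (10 building cycles, 5 ARP/REFMAC cycles between them) are the ones quoted above.

```python
def total_cycles(building_cycles=10, refinement_cycles_between=5):
    """Total number of ARP/REFMAC cycles for a warpNtrace run."""
    return building_cycles * refinement_cycles_between


if __name__ == "__main__":
    print(total_cycles())        # default protocol
    print(total_cycles(10, 10))  # 10 refinement cycles between autobuilding steps
```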
o
Protocol for REFMAC5 / Rfree The fast and slow protocols differ in the number
of internal Refmac cycles and the damping factors. The type of protocol is set
automatically, judging from the resolution of the X-ray data. Usually there is
no need to change it. For the warpNtrace task the default is not to use Rfree,
since the number of traced residues serves as an excellent indicator of the
success of the job. You can turn the use of Rfree on, but the authors have
seen marginal cases (low resolution and hence low observation-to-parameter
ratio) where this adversely affected the tracing.
There are a number of additional parameters that you normally need not worry
about. A brief description is given below.
o
Pre-weighted Fobs for initial map calculation (e.g. from SHARP). Checking this
box will bring up a pull-down menu asking for the FBEST label.
o
Number of ARP/REFMAC refinement cycles between
autobuilding
The default is 5 cycles. In cases of poor convergence you can try to increase
this number to 10.
o
Skip the autobuilding for the first cycles Checking this box will disable the
autotracing for the provided number of cycles. This was sometimes advantageous
with the previous version 6.0 when the initial phases were poor. The default
is to start autotracing from cycle 0.
o
Randomisation of atomic positions This was sometimes advantageous with the
previous version 6.0 when the initial bias was high. The default is not to
randomise.
o
Truncate excessive shifts This is a leftover from an earlier version; ignore
this parameter.
o
Removal of protein atoms of traced model During the ARP/REFMAC cycles in
between the tracing, the hybrid model is updated. If you would like to keep
track of which parts of the traced fragments have been removed during the
update, check the box. This option is provided primarily for developers.
o
Iterate the tracing Each main chain tracing is carried out in several
iterations. The module will decide on its own how many iterations are needed.
The default maximum number is 5 and it is NOT recommended to change this
value.
o
Density thresholds for atom removal and addition These parameters are defined
automatically on the basis of the resolution of the X-ray data. In cases of
poor convergence, particularly when the numbers of both added and removed
atoms are considerably lower than the number requested (as can be seen from
the log file), the threshold for atom removal can be slightly increased. This
option is provided primarily for developers.
o
Increase in the number of atoms to be added and removed as compared to the
automatically set values The default is 1 (no increase) and it is not
recommended to change this parameter. This option is provided primarily for
developers.
o
Cycles of refinement in each Refmac run Refmac is invoked to refine the hybrid
model before the density maps are computed. The default is 1 cycle for the
fast protocol and 3 cycles for the slow protocol (see above). There is usually
no need to change this parameter.
o
Damp shifts The default is 0.99. There is usually no need to change this
parameter.
o
Matrix weight for Xray / Geometry The default is automatic weighting. This proved to work
well and, probably, there is no need to change this parameter.
o
Scaling model The default is to use solvent correction for scaling the
low-angle part of the X-ray data. You can turn this off (choose simple solvent
correct) if your low-angle data are missing (e.g. your data have a
low-resolution cutoff of about 8 Å) or suffer from missing overloaded
reflections.
o
Scaling B factor The default is to use an anisotropic B factor for scaling the
X-ray data. You can turn this off (choose isotropic scaling B factor) if your
data are systematically incomplete (e.g. a cone is missing in reciprocal
space).
o
Data with free R label This parameter appears if the free
R flag is chosen for refinement of the protein part of the model. Here you can
provide a column label for the free R flag.
o
Use of free R reflections This parameter appears if the free R flag is chosen
for refinement of the protein part of the model. The scaling and the σA
coefficients that Refmac uses for map calculation can be computed on the basis
of the free reflections (this is the default) or using all reflections.
o
Solvent mask correction The default is to use solvent mask
correction within Refmac.
o
TLS refinement The default is not to do a TLS
refinement of a hybrid model.
o
Space group This is derived automatically from
the MTZ file, is displayed for information only and cannot be changed.
o
Cell This is derived automatically from the MTZ file, is
displayed for information only and cannot be changed.
o
Wilson B factor This is derived automatically from
the MTZ file, is displayed for information only and cannot be changed.
o
Solvent content This is derived automatically from the MTZ file, is displayed
for information only and cannot be changed. However, you may want to check
whether this number conforms to your expectations.
o
Resolution By default all reflections present in the MTZ
file will be used. You can check the box and then narrow the range if you are
aware of certain deficiencies of your data.
o
Checking this button will activate remote submission, which is described below
in a separate chapter of this document.
o
Had to go as low as XXX sigma to complete atoms search The initial free-atom
model is built into the starting density map, with the density threshold being
successively reduced. A typical value that you can see in the log file is
between 0.3 and 0.6 sigma. A lower value may indicate an over-flattened map or
an overestimation of the number of residues in the asymmetric unit. If you
suspect the latter, please check the derived solvent content in the GUI
window.
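If you process many log files, the check above can be scripted. The sketch below is hypothetical: it assumes the log line reads exactly as in the heading above, with XXX replaced by a plain decimal number, and the function names are illustrative. The text gives no guidance for values above 0.6 sigma, so that case is only labelled "high".

```python
import re

# Assumed log wording, taken from the heading above; the number format is a guess.
_THRESHOLD = re.compile(r"Had to go as low as\s+([0-9]*\.?[0-9]+)\s+sigma")


def atom_search_threshold(log_text):
    """Return the reported sigma threshold as a float, or None if not found."""
    match = _THRESHOLD.search(log_text)
    return float(match.group(1)) if match else None


def threshold_comment(sigma, typical=(0.3, 0.6)):
    """Classify the threshold against the typical 0.3 to 0.6 sigma range."""
    low, high = typical
    if sigma < low:
        # Possible over-flattened map or overestimated number of residues.
        return "low"
    if sigma > high:
        return "high"
    return "typical"


if __name__ == "__main__":
    line = "Had to go as low as 0.45 sigma to complete atoms search"
    value = atom_search_threshold(line)
    print(value, threshold_comment(value))
```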
o
Building cycle zero Normally one should expect a considerable part of the
structure to be built already at the starting building cycle zero. If this is
not the case, observe the situation for a few further building cycles. If,
however, there is essentially nothing autotraced for 10 building cycles,
please inspect whether the initial phases are sufficiently good.
o
Rounds within building cycle As mentioned above, each cycle of the main chain
tracing is carried out in several rounds. Normally each successive round
should result in more residues and in fewer fragments. The maximum length of
the traced fragment is also printed for information.
o
Chains, residues and connectivity index The output from the best tracing round
is processed further. Terminal residues are removed and fragments of 3
peptides or shorter are converted back to free atoms. The rest is kept and
used to provide restraints for subsequent ARP/REFMAC cycles. The value of the
connectivity index should increase steadily if the tracing is successful. A
value below 0.6 is not very promising. A value around 0.8 indicates good
progress. A value above 0.95 indicates an essentially complete tracing.
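The rule of thumb for the connectivity index can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and since the text gives just three reference points (below 0.6, around 0.8, above 0.95), everything from 0.6 to 0.95 is grouped into a single "in progress" band here.

```python
def describe_connectivity(index):
    """Interpret the connectivity index using the reference values quoted above."""
    if index > 0.95:
        return "essentially complete tracing"
    if index < 0.6:
        return "not very promising"
    # The text only says that a value around 0.8 indicates good progress.
    return "in progress"


if __name__ == "__main__":
    for value in (0.5, 0.8, 0.97):
        print(value, describe_connectivity(value))
```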
o
Residues docked into sequence If the sequence was provided, the autotraced
fragments are docked into it and the side chains are built and refined in real
space. The results of this are printed out.
o
R factor from Refmac The value of the R factor typically oscillates. It goes
up after each tracing cycle (because the model is entirely rebuilt) and then
decreases during the ARP/REFMAC refinement cycles. At the end of the procedure
it should reach a value typical for a restrained refinement.
o
Sequence coverage This is defined as the ratio between the number of docked
residues (if the sequence is provided) and the total number of traced
residues. A value higher than 0.8 is deemed good convergence: all free (dummy)
atoms are then removed from the file and the task moves into a few cycles of
restrained refinement with solvent search. If, however, the sequence coverage
is lower than 0.8, the free atoms are left in the file. You can inspect the
density maps, start changing the model on the graphics or, alternatively,
submit another warpNtrace task using the output of this job.
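The definition above amounts to a single division and a comparison. The sketch below uses hypothetical function names; the text does not say what happens at a coverage of exactly 0.8, so the comparison here is strict by assumption.

```python
def sequence_coverage(docked_residues, traced_residues):
    """Ratio of docked residues to the total number of traced residues."""
    if traced_residues <= 0:
        raise ValueError("no traced residues")
    return docked_residues / traced_residues


def free_atoms_removed(coverage, threshold=0.8):
    """True if the coverage counts as good convergence (strictly above 0.8)."""
    return coverage > threshold


if __name__ == "__main__":
    coverage = sequence_coverage(180, 200)
    print(coverage, free_atoms_removed(coverage))
```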
o
CPU requirements Execution of the autotracing task
is time consuming. Using a standard protocol of 10 building cycles interspaced
with 5 ARP/REFMAC cycles, one should expect a job for a structure of 200
residues to be completed within 1 hour (subject to the power of the computer
you are using).
o
Job termination The statement Task completed successfully indicates that the
job has finished with no error. An error statement
QUITTING
PROGRAM TO BLAME: name_of_the_programme
indicates that one of the modules of the task has terminated with an error
message. You will also be referred to the specific log file.
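When jobs are run in batch, the two statements above can be looked for automatically. This is a minimal sketch with a hypothetical function name; it assumes that the statements appear in the log text verbatim as quoted above and makes no other assumptions about the log layout.

```python
def job_status(log_text):
    """Return ('ok', None), ('error', program_or_None) or ('unknown', None)."""
    marker = "PROGRAM TO BLAME:"
    for line in log_text.splitlines():
        if marker in line:
            # Whatever follows the marker is taken as the name of the failed module.
            program = line.split(marker, 1)[1].strip()
            return ("error", program or None)
    if "QUITTING" in log_text:
        return ("error", None)
    if "Task completed successfully" in log_text:
        return ("ok", None)
    return ("unknown", None)


if __name__ == "__main__":
    print(job_status("QUITTING\nPROGRAM TO BLAME: name_of_the_programme\n"))
```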
The script file auto_tracing.sh in the $warpbin directory allows one to run the
automated model building from a command line without the use of the GUI. The
use of auto_tracing.sh is fairly simple. If invoked without arguments the script will print
help information.
Required keywords are: datafile (followed by the mtz-file name with the
absolute path) and residues (followed by the number of residues).
Optional keywords include: workdir (followed by the absolute path to the
working directory), fp (followed by the fp label), sigfp (followed by the
sigfp label), freelabin (followed by the Rfree label), fbest (followed by the
label for the fom-weighted structure factor amplitudes to be used for initial
map calculation), phibest (followed by the best phi label), fom (followed by
the figure of merit label), modelin (followed by a starting pdb-file with the
absolute path), seqin (followed by a sequence-file name with the absolute
path), cgr (followed by the number of NCS-related copies), cycles (followed by
the total number of cycles) and albe (followed by 1 if building of secondary
structural elements is to be invoked before every model building cycle).
Example call (assumed to be started from workdir where test data should
reside):
    auto_tracing.sh \
        datafile {mtzfile} \
        residues {number_of_residues_in_AU} \
        [workdir {FULLPATH_WORKING_DIRECTORY}] \
        [fp {fp_label}] [sigfp {sigfp_label}] [freelabin {freer_label}] \
        [fbest {weighted_amplitude_label}] [phib {phib_label}] [fom {fom_label}] \
        [modelin {input_PDB_file_to_use_as_initial_model}] \
        [seqin {sequence_file_for_one_NCS_copy}] \
        [cgr {number_of_NCS_copies (if seqin is provided, default is 1)}] \
        [cycles {the_total_number_of_cycles (default is 50)}] \
        [albe {1 to_always_invoke_albe (default is 0 for resol < 2.7A, else 1)}] \
        [parfile {parfilename_if_only_parfile_is_to_be_created}]
The script will then create a directory in
the workdir whose name will be printed and where a parameter file will be created.
The log files and additional output files as well as the building results can
be found in the directory created by auto_tracing.sh.
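For scripted use, the keyword/value argument list can be assembled programmatically. The following is a sketch, not an official interface: the function name `build_auto_tracing_command` and the file paths in the example are hypothetical, and the only knowledge of auto_tracing.sh it relies on is the keyword/value convention and the two required keywords described above. It does not check that the optional keywords are valid.

```python
REQUIRED = ("datafile", "residues")


def build_auto_tracing_command(script="auto_tracing.sh", **keywords):
    """Return an argument list: script name followed by keyword/value pairs."""
    missing = [key for key in REQUIRED if key not in keywords]
    if missing:
        raise ValueError("missing required keywords: " + ", ".join(missing))
    args = [script]
    for key in REQUIRED:  # required keywords first
        args += [key, str(keywords[key])]
    for key, value in keywords.items():  # then the optional ones, in the order given
        if key not in REQUIRED:
            args += [key, str(value)]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_auto_tracing_command(
        datafile="/data/test.mtz", residues=400, seqin="/data/test.pir", cycles=50
    )
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, provided that the $warpbin directory is on your PATH.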
Remote job submission offers you the following possibilities:
a) Your task will run using external computational
facilities, where the CPU performance may be superior to your local installation.
b) You can be assured that the most recent working
executables will be used should you have a problem with your local
installation.
c) Should the task crash, an automatic notification will
be forwarded to the ARP/wARP developers who can then promptly help you.
d) You can share the results of the completed task
with other software developers
Clicking on the button "Submit the job for remote execution at the Hamburg
cluster" within the main ARP/wARP GUI panel allows one to execute an
autotracing task remotely. The panel will expand and ask for an email address
to be provided. Then choose one of the options from the drop-down menu to
indicate how you would like your data to be handled. The options are:
a) the data must be kept confidential
and deleted after the job has finished
b) the data can be made available to
ARP/wARP developers
c) the data can be archived and made
available to SPINE and BIOXHIT partners
d) the data can be archived and made
available to any software developer that requests them
Needless to say, users will make an important contribution towards future
software development if they decide to share their data and the results of the
autotracing job. Option (b) shares the data only with the ARP/wARP development
team. Option (c) extends the sharing to SPINE and BIOXHIT partners, and option
(d) to any software developer who requests the data.
Once the job has been submitted for remote execution (but not yet launched!),
the GUI window will indicate that the job has finished. Please inspect the log
file from the drop-down menu option "View files from job" for further
instructions. An email will be sent to the address that you entered in the GUI
window. Please follow the instructions in the email (http link, login and
password) to actually launch the job at the Hamburg cluster. You can then
monitor the log file in your browser window. As soon as the job is finished,
you will be provided with a link to the results, which you can then download.
Keep in mind that once the job is
finished, your data will be kept for only a week. Make sure that you download
your data within that time.
Remote job submission relies on the curl software being installed at your
site. The availability of curl is checked while installing ARP/wARP, and a
warning (with an http link) is given if curl is not available.
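You can also check for curl yourself before attempting a remote submission. The sketch below only looks for an executable named curl on the PATH using the Python standard library; it is not the check performed by the ARP/wARP installer, and the function name is hypothetical.

```python
import shutil


def curl_available():
    """Return True if an executable called 'curl' is found on the PATH."""
    return shutil.which("curl") is not None


if __name__ == "__main__":
    print("curl found" if curl_available() else "curl not found")
```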