Loopy is a program which tries to find likely loops to connect
fragments of a partial protein structure based on the expected
structure and the density map.
Building loops using structural and density information
Loopy builds the loops in three phases. First a tree of possible CAs
between the fragments is built, next the unlikely ones are removed
and the rest of the main chain atoms are determined, and finally the
best loops are selected. The tree can be
built either towards the C-terminus or the N-terminus of the
protein, or both.
- First it tries to find likely
candidates for only the CAs of the residues. To find these likely
CAs, it takes a tetrapeptide and generates a large number of
possible positions in a shell at CA-CA distance to produce a
pentapeptide. Next it uses a likelihood table for the angles in
the resulting pentapeptides, and the density at the generated
positions, to determine the set of best CAs. By iterating over
the number of missing residues, this procedure builds a tree
of possible CA-CA paths which would connect the fragments.
- The tree of possible paths is then pruned to remove unlikely
paths and keep the most likely ones. This is done in the
following steps:
- Since no restriction was placed on the end position of
the loop, the first pruning is done on the distance
between the loop and the connecting fragment. Loops are
kept if the distance
between the end CA of the loop and the connecting CA of
the fragment is approximately equal to the CA-CA distance.
- (obsolete) Depending on the direction in which the
loop was built, the N or the C of the connecting fragment
is known. We use this information to check the
CA_fragment-CA_loop_N angle or the CA_fragment-CA_loop_C angle,
respectively.
- Though the structural likelihood is used in the
direction of loop building, no information was used on the
structural likelihood of the loop and the connecting
fragment. In this step the most likely loops according to
the structure are kept.
- If you want, you can prune the tree even further by
keeping only those loops with a high average density at
the suggested CA positions.
- Next the peptide planes are determined. We use the fact
that the atoms
between CA and CA lie in a plane, and that the relative
positions of the N, C, and O atoms are known. By rotating around
the CA-CA bond, the plane with the best density
correlation for the main
chain atoms (and worst density correlation outside the plane) is
determined. For non-GLY residues, the density correlation
at the CB is used as well.
- Finally, loops that do not comply with the Ramachandran
plot are removed. (We used the table as given by K. Kelly, from which we removed those (φ,ψ) combinations with a value of 0.)
- When all the loops are built (so, if chosen, in both
directions), the loops are ordered (in descending order)
according to the density
correlation at the main chain atoms (including CB if present),
the correlation of the side chains, or a combination of
both. If the number of loops exceeds the chosen number, only
the best are saved to file.
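The peptide plane step above amounts to a one-dimensional search over the rotation angle about the CA-CA axis. The sketch below illustrates that idea only and is not Loopy's code: `best_plane_direction`, the `score` callback and the default of 72 steps are hypothetical, and `score` stands in for the main chain density correlation, which this text does not specify further. The returned direction is the in-plane vector perpendicular to the CA-CA axis; the N, C, and O atoms would then be placed from the known peptide geometry.

```python
import math

def unit(v):
    """Scale a 3-vector to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_axis(v, axis, angle):
    """Rotate v about a unit axis by angle (radians), Rodrigues' formula."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * x for a, x in zip(axis, v))
    axv = cross(axis, v)
    return tuple(x * c + w * s + a * dot * (1.0 - c)
                 for x, w, a in zip(v, axv, axis))

def best_plane_direction(ca_prev, ca_next, score, steps=72):
    """Try evenly spaced rotations about the CA-CA axis.

    score(direction) is a hypothetical stand-in for the density
    correlation of the main chain atoms placed in that plane.
    Returns (angle, direction) for the highest score.
    """
    axis = unit(tuple(b - a for a, b in zip(ca_prev, ca_next)))
    # Any vector not parallel to the axis gives a zero-angle reference.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    reference = unit(cross(axis, helper))
    best = None
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        direction = rotate_about_axis(reference, axis, angle)
        value = score(direction)
        if best is None or value > best[0]:
            best = (value, angle, direction)
    return best[1], best[2]
```

A coarse scan like this could be followed by a finer scan around the best angle; whether Loopy does so is not stated here.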
- Job title
- Title for the current experiment
- Experimental data
- Select whether to use a map or an mtz file. In the case of
an mtz file, the program will use fft to compute the
corresponding map.
- Input map
- Input map to use
- MTZ
- Mtz file to use. F and PHI are used to compute the
corresponding map using fft. We need to save this file, since
we need to reread the map more than once.
- Input pdb
- Input pdb for your protein. Please remove the residues
you would like to rebuild from this file. This frontend of
Loopy will not rebuild residues that are still present in it.
- Name for first loop pdb
- The name of this file is used as a format to determine the
names of the other loops to save
- Number of loops
- Select the number of loops you'd like the program to
save. The number of loops left
after pruning may well be smaller than this number, in which case
fewer loops are saved than you asked for. If no
loops are found at all, adjust the parameters,
specifically those in the folder "Selecting best CAs".
The spacegroup name and cell dimensions are extracted from the
map/mtz file.
In this folder you select which loop to build.
- N-term anchor
- Anchor residue of a fragment on the N terminus side of the
protein. Note that if you want to rebuild some
residues, you need to remove them from the pdb file
- C-term anchor
- Anchor residue of a fragment on the C terminus side of the
protein. Note that if you want to rebuild some
residues, you need to remove them from the pdb file
- Loop length
- Number of residues in the loop including the two anchor points
- Loop sequence
- Sequence of amino acids (one letter code) of the residues in the loop
including the two anchor points.
- Build both ways
- If selected (default), trees of possible loops are generated
starting from both the N terminus anchor and the C terminus
anchor. The best loops to save are selected from the combined
set. Since the tetrapeptides at the two anchors
will in general differ, as will the local map, the choice of
starting anchor influences the loops generated.
- Build towards C-terminus
- If you didn't select to build both ways, you can indicate
whether you want to build the tree towards the C terminus of
the protein, or towards the N terminus
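Because both the loop length and the loop sequence include the two anchor residues, the two values have to agree. The sketch below shows one way to check a request before running the program. It is not part of Loopy: the function names are hypothetical, and the restriction to the 20 standard one-letter codes and the requirement of at least one residue between the anchors are assumptions.

```python
# Assumption: only the 20 standard amino acids are accepted.
ONE_LETTER_CODES = set("ACDEFGHIKLMNPQRSTVWY")

def residues_to_build(loop_length):
    """Number of missing residues: the loop length minus the two anchors."""
    return loop_length - 2

def check_loop_request(loop_length, sequence):
    """Return a list of problems with a length/sequence pair (empty if none)."""
    problems = []
    seq = sequence.strip().upper()
    # Both values count the two anchors, so they must be equal.
    if len(seq) != loop_length:
        problems.append("sequence has %d residues, loop length is %d"
                        % (len(seq), loop_length))
    unknown = sorted(set(seq) - ONE_LETTER_CODES)
    if unknown:
        problems.append("unknown one-letter codes: " + "".join(unknown))
    # Assumption: at least one residue must lie between the anchors.
    if residues_to_build(loop_length) < 1:
        problems.append("no residues to build between the anchors")
    return problems
```

For example, a loop of length 5 with sequence GASTK has anchors G and K and three residues to build.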
In this folder you can set the thresholds used to prune incorrect
loops from the tree and the weights used to select the best loops.
- Deviation distance loop connection
- The distance between the end CA of the loop and the
connecting CA of the structure should be approximately equal
to CA-CA distance. Set the allowed error in the distance.
- Threshold density correlation CAs
- After pruning on the distance, the next step is to select
the best loops based on the density correlation of the
CAs. This value sets how many of the best loops are kept based on
the density correlation of the CAs only.
- Structural threshold
- You can prune on the structure of the end CA of the loop
and the connecting tetrapeptide. Set the
minimum log likelihood required for this structure.
- Minimum for this stage
- Set this value if you want to keep at least a
certain number of loops after pruning on the
structure, overriding the structural threshold if
necessary.
- Maximum for this stage
- Set this value if you want to ensure that the number of
loops does not exceed a certain amount after structural
pruning, keeping only those with the highest structural
likelihood.
- Main chain density correlation
- After pruning on the structure, the peptide planes for all
residues in the selected loops are determined. The loops are
sorted by the density correlation of the main chain atoms
(including CB for non-GLY). This threshold sets the number of
best loops kept.
- Weight main chain
- Finally the best loops are selected by determining the density
correlation of the main chain atoms (including CB if present)
and the correlation of the side chains. You can use this
weight to give the main chain correlation more or less impact.
- Weight side chain
- Finally the best loops are selected by determining the density
correlation of the main chain atoms (including CB if present)
and the correlation of the side chains. You can use this
weight to give the side chain correlation more or less impact.
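The pruning settings above can be read as a short chain of filters. The sketch below is one interpretation, not Loopy's implementation: all function names are hypothetical, the default CA-CA distance of 3.8 Å is the typical value for a trans peptide rather than a value taken from this text, and combining the two correlations as a weighted sum is an assumption.

```python
import math

def closes_gap(end_ca, anchor_ca, deviation, ca_ca_distance=3.8):
    """True if the loop end lies about one CA-CA distance from the anchor CA."""
    return abs(math.dist(end_ca, anchor_ca) - ca_ca_distance) <= deviation

def prune_on_structure(loops, threshold, minimum=0, maximum=None):
    """loops is a list of (name, log_likelihood) pairs.

    Keeps loops at or above the threshold, tops the result up to
    `minimum` with the next best loops, then caps it at `maximum`.
    """
    ranked = sorted(loops, key=lambda item: item[1], reverse=True)
    kept = [item for item in ranked if item[1] >= threshold]
    if len(kept) < minimum:
        kept = ranked[:minimum]      # overrides the threshold
    if maximum is not None:
        kept = kept[:maximum]        # highest likelihoods only
    return kept

def rank_loops(loops, weight_main, weight_side, n_save):
    """loops is a list of (name, main_chain_cc, side_chain_cc) triples.

    Assumption: the final score is a weighted sum of both correlations.
    """
    ranked = sorted(loops,
                    key=lambda l: weight_main * l[1] + weight_side * l[2],
                    reverse=True)
    return ranked[:n_save]
```

Setting the side chain weight to 0 in this sketch ranks the loops on main chain correlation alone, and vice versa.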
During the building of the tree of possible paths, shells of
CAs are generated (see top). In this folder
you can set the thresholds and weights which determine how the
best CAs are selected from all the CAs in one such shell. Note:
generated CAs with a negative density correlation are removed immediately.
- Likelihood threshold
- This is the threshold for the log likelihood of a CA to
represent the fifth CA of a pentapeptide, based on density
correlation, CA-CA distance, and structure.
- Weight distance
- Weight for the distance likelihood
- Weight density
- Weight for the likelihood of the density correlation
- Weight structure
- Weight for the structural likelihood
- Structure table to C
- Filename for the probability table for the angles and
dihedral angles of a pentapeptide in the direction of the C terminus
- Structure table to N
- Filename for the probability table for the angles and
dihedral angles of a pentapeptide in the direction of the N terminus
- Minimum distance CA
- Minimal distance allowed between CAs from the same
shell. Of CAs closer together than this, the one with the best likelihood is kept.
- Maximum number of CAs
- Maximum number of CAs from each shell to keep. Note:
The CAs kept will all be used as a new suggestion for the
current residue in the loop, and thus as a new node in the
tree. The number of possible loops grows rapidly with this
number: with k CAs kept per shell and n missing residues, the
tree can contain up to k^n paths.
- Force minimum number of CAs
- Force a minimum number of CAs in a shell to be kept, even
if the likelihood is less than the threshold set. This makes
the loop building a bit more flexible in low density areas, or
for pentapeptide structures which occur less often.
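The settings in this folder interact, so a small sketch of one possible selection order may help. It is not Loopy's code: `select_cas` and the dictionary keys are hypothetical, the combined score is assumed to be a weighted sum of the three log likelihoods, and the order in which the checks are applied is a guess.

```python
import math

def select_cas(candidates, threshold, w_dist=1.0, w_dens=1.0, w_struct=1.0,
               min_separation=1.0, max_keep=5, force_min=1):
    """Pick the best CAs from one shell.

    Each candidate is a dict with hypothetical keys: 'pos' (x, y, z),
    'density_cc', and the log likelihoods 'll_dist', 'll_dens', 'll_struct'.
    Returns a list of (score, candidate) pairs, best first.
    """
    # CAs with a negative density correlation are removed immediately.
    pool = [c for c in candidates if c["density_cc"] >= 0.0]
    # Assumption: the likelihood is a weighted sum of the three terms.
    scored = sorted(
        ((w_dist * c["ll_dist"] + w_dens * c["ll_dens"]
          + w_struct * c["ll_struct"], c) for c in pool),
        key=lambda pair: pair[0], reverse=True)
    kept = []
    for score, cand in scored:
        if len(kept) >= max_keep:
            break
        # Skip a CA that sits too close to a better one already kept.
        if any(math.dist(cand["pos"], k["pos"]) < min_separation
               for _, k in kept):
            continue
        # Keep it if it passes the threshold, or to reach the forced minimum.
        if score >= threshold or len(kept) < force_min:
            kept.append((score, cand))
    return kept
```

With a forced minimum of 2, for example, the second-best well-separated CA is kept even when its score falls below the threshold.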
This folder describes how the shells of CAs are generated.
- Select generation CA shell
- By default, a shell with a uniform and regular distribution of
CAs at exactly the CA-CA distance is generated. You can also choose a uniform random
distribution of the CAs. In that case the shell is generated
with a given thickness.
- Number of CAs
- Number of CAs generated within a shell. In the case of a
regular distribution this number is rounded down to the
nearest Fibonacci number.
- CA-CA distance
- Distance to use between successive CAs.
- Shell thickness
- (random shell only) Thickness of the generated
shell of CAs.
- SD CA-CA distance
- (random shell only) We assume that the
probability for the CA-CA distance is described by a
Gaussian. With this value you can set the standard deviation
of the Gaussian function.
- Keep CAs with negative density halfway
- Due to the structure of a peptide, we expect the density
correlation halfway between successive CAs to be positive. A
quick first selection of CAs from the shell is thus (apart
from the density correlation at the generated CA) based on
the density correlation at the midpoint. Select this option to keep
CAs whose midpoint density correlation is negative. The default is false.
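The rounding to a Fibonacci number suggests that the regular shell is a Fibonacci lattice on a sphere, although this text does not say so. The sketch below shows the rounding and one common construction of such a lattice. It is an assumption about the method, not Loopy's code, and the function names are hypothetical.

```python
import math

def round_down_to_fibonacci(n):
    """Return (f, f_prev): the largest Fibonacci number <= n and its predecessor."""
    if n < 1:
        raise ValueError("need at least one point")
    prev, cur = 1, 1
    while prev + cur <= n:
        prev, cur = cur, prev + cur
    return cur, prev

def fibonacci_shell(center, distance, requested):
    """Spread points regularly over a sphere of radius `distance`.

    Assumption: a spherical Fibonacci lattice, in which point i gets
    height z = 1 - (2i + 1)/n and longitude 2*pi*(i*f_prev mod n)/n.
    """
    n, f_prev = round_down_to_fibonacci(requested)
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # equal-area spacing in z
        r = math.sqrt(1.0 - z * z)
        phi = 2.0 * math.pi * ((i * f_prev) % n) / n
        points.append((center[0] + distance * r * math.cos(phi),
                       center[1] + distance * r * math.sin(phi),
                       center[2] + distance * z))
    return points
```

Requesting 100 CAs in this sketch yields 89 points, all exactly one CA-CA distance from the centre.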
In this folder you can set the details of the map handling.
- Interpolation method
- Choose the interpolation method used to determine the
density correlation from the map. The quick but less accurate
option is Cubic Interpolation. More accurate, but
significantly slower, is Best, which is based on a Gaussian
density correlation function to simulate the shape of an
atom. Note: the density correlation is determined many
times, so the difference in speed between the options can be
huge. The side chain correlation at the end is always determined
using the Gaussian version.
- Atom radius
- Radius used to determine the density correlation
- B factor
- At the moment the B-factor values in the pdb are
ignored. The value set here is used for all atoms.
- Remove atoms by factor
- To avoid overlaps between the generated loop and the
protein structure in the pdb, atoms in the pdb (apart from
dummies, or residues in chains consisting only of main chain
atoms) are removed from the map. This is done by flipping the
density in the map to negative values at the position of these
atoms. This value sets the factor by which the
density is changed.
- Density threshold residues
- Threshold for the density correlation of residues after
loop building. This is used to check overlap between the loop
and possible fragments of main chain atoms in the pdb.
- Density threshold dummies
- Threshold for the density correlation of dummies after
loop building. This is used to check overlap between the loop
and possible dummy atoms in the pdb.
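To get a feeling for the single B factor used for all atoms, it helps to convert it to a positional spread with the standard relation B = 8π²⟨u²⟩. The sketch below does this and evaluates an isotropic Gaussian profile of that width. The exact atom model that Loopy uses for its Gaussian density correlation is not described here, so the profile is an illustration only and both function names are hypothetical.

```python
import math

def rms_displacement(b_factor):
    """Root mean square displacement u (in Å) from B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def gaussian_atom_profile(r, b_factor):
    """Unnormalised isotropic Gaussian at distance r from the atom centre.

    Illustration only: a point atom smeared by the displacement implied
    by the B factor, scaled to 1.0 at the centre.
    """
    u2 = b_factor / (8.0 * math.pi ** 2)
    return math.exp(-r * r / (2.0 * u2))
```

A B factor of 20 Å² corresponds to an rms displacement of about 0.5 Å, so larger values smear each atom over a noticeably wider region of the map.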
Loopy writes its own logs to file. The extent of the messages depends
on the levels you set in this folder.
- Message level
- Level of the messages to be written to file. (Value
from 0 to 9)
- Abort level
- If a message of this level is encountered, terminate the
program. Standard values are 7 or 8.
- Message file
- Name for the message file (plain text) of Loopy.
- XML output file
- Name for the XML message file (xml format) of Loopy.
Krista Joosten
Last modified: Tue Aug 15 13:34:54 CEST 2006