Hingefind -
a novel algorithm to investigate domain motions in proteins.
Version 6-22-95
This software is copyrighted, (c) 1995, by Willy Wriggers under the
terms of the legal statement in the distribution.
Available by anonymous ftp to lisboa.ks.uiuc.edu in the directory
pub/wriggers/hingefind.
---
Documentation:
1. General remarks
2. Files and shellscripts
3. A brief description of the algorithm
4. Output files
5. Accuracy check and other useful info
6. Correspondence
---
1. General remarks
'Hingefind' is an algorithm for the identification of domain movements
and their characterization and visualization by hinge points and
rotation axes. The method is implemented in X-PLOR 3.1 script language
(Axel T. Brunger, 1992). The output psf and pdb files can be
visualized with standard graphics packages, e.g. the graphics program
VMD of the Theoretical Biophysics Group, Beckman Institute, UIUC,
(available from ftp.ks.uiuc.edu in pub/mdscope/vmd, or
http://www.ks.uiuc.edu:1250/Research/vmd) or with Quanta (MSI). It
compares two known structures (e.g. two different crystal structures
of a protein or the results of molecular dynamics simulations) and
partitions the protein into preserved subdomains at a prespecified
resolution. It then determines effective hingeaxes which
characterize the domain movements with respect to the reference
domain. Both parts of the algorithm can be used alone, i.e. one can
assign domains manually and let the algorithm determine the
effective hingeregions between the domains, or one can use the algorithm to
partition a protein into preserved subdomains. The method does not
require any previous knowledge about functionally relevant domains or
hinge motions; however, a critical analysis of the results is
recommended. The output files provide information about the accuracy
of the found hinge-rotation. The variety of options and resolutions
allows the user to find an optimal partitioning. The user can assess the
validity of the proposed movement and change the script if
necessary.
Warning: In some cases the rotational fit may be inaccurate or there may
be no uniform domain motions.
2. Files and shellscripts
There should be several files and scripts to set up the algorithm:
hingefind
A unix shell script that runs the X-PLOR job and writes three
X-PLOR stream files which contain commands from which X-PLOR
can compute filenames and the resolution of the algorithm.
partition.str
A stream file which contains necessary X-PLOR commands to set
up the structure. It may contain a pointer to a psf file. It
is recommended to use segid "AP0" for the protein, otherwise
hingefind.inp has to be modified. Note that the coordinates
in the two compared pdb files must be both compatible with the
structure. The pdb files may contain additional atoms which do
not have to be specified in partition.str if not used in the
partitioning.
dum.top
An X-PLOR topology file with the residues of dummy molecules
used in the algorithm for visualization of hingepoints and
axes.
prexplor.dim
The X-PLOR file which contains array sizes for compilation
(35,000 atom version). It will probably be necessary to
compile X-PLOR with the larger BUFMAX parameter for the loops.
This executable is named "xl" in the hingefind script.
hingefind.inp
The X-PLOR script with the algorithm. There are a variety of
variables and paths the user has to specify in the head of the
file:
$ndomains:The number of domains to be found.
Recommended: 2 - 5, depending on the resolution.
$maxccounter:The number of maximum cycles of the
"converge" loop. In case the algorithm does not converge
within the specified number of cycles (this was very rarely
observed to occur in the "fas" partitioning mode at extreme
resolutions), a warning message is written in the log file.
Recommended: 10 - 20.
$assign: This variable determines the mode of the partitioning part
of the script: "man" specifies manual assignment of domains,
no partitioning. Up to 9 domains can be assigned below and
$ndomains must be smaller than 10. "fas" codes for the fast
version of the automatic partitioning algorithm, in which the
connectivity of the residues in the found domains is NOT
maintained. "slo" specifies the slow partitioning algorithm
with maintained connectivity of the domains.
$nndist: The variable determines at which max distance two residues
are considered next neighbors in the "slo" mode partitioning.
store1...9: The selection attributes which allow the assignment of
up to 9 domains by hand in the "man" mode.
$case1COO: String that specifies input file for the coordinates in
pdb format or pointer to pdb file. The path has to be
specified. X-PLOR can compute the filename from the variable
$case1 defined in the streamfile casefile1 written by the
shellscript. The coordinates are written to the main coordinate set.
$case2COO: String for 2nd pdb file (comparison coordinate set).
Analogous to $case1COO.
$oname: Output pdb file with assigned domains, hingepoints, axes.
The filename can be computed using the $fname variable which
contains the resolution as defined by the shellscript. The
path has to be specified.
$uname: Output psf file, analogous to $oname.
$dname: Output log file with information about the proposed
hinge rotations, residues, accuracies. The filename can be
computed using the $fname variable. The path has to be
specified.
It is recommended the user tests the script with the domains of
interest assigned manually beforehand, then tries automatic "fas"
partitioning with the resolution in the shellscript set between 50 and
100 (%). Finally the partitioning should be repeated in "slo" mode for
selected cases and resolutions.
3. A brief description of the algorithm
The method will be published in the near future; please inquire about
a preprint or reference at the e-mail address given in section 6. The
algorithm is separated into two parts: the "partitioning" and the
"rotational fit"
section. The "partitioning" part determines domains with preserved
structure in the two compared coordinate sets, depending on a
prespecified resolution. The method uses the least-squares fit method
(W. Kabsch, 1976) as implemented in X-PLOR. A domain is found in an
iterative procedure, in which poorly matching residues are excluded
from the domain and good matches are included. In the "slo" mode only
the heaviest connected set is considered, maintaining the connectivity
of the changing domain.
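The connectivity step of the "slo" mode can be illustrated outside of
X-PLOR with a minimal Python sketch. It is not the hingefind.inp
implementation. It assumes that each residue is represented by a single
point (for example its C-alpha position), that two residues are
neighbors when their distance does not exceed a cutoff playing the role
of $nndist, and that "heaviest" means the set with the most residues.
The function name largest_connected_set is hypothetical.

```python
from collections import deque
from math import dist


def largest_connected_set(coords, nndist):
    """Return the largest group of residues linked by neighbor contacts.

    coords : dict mapping a residue id to an (x, y, z) tuple
             (assumption: one representative point per residue).
    nndist : maximum distance at which two residues count as neighbors.
    """
    unvisited = set(coords)
    best = set()
    while unvisited:
        # Start a new connected component from any unvisited residue.
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            # Collect all still-unvisited residues within the cutoff.
            near = {r for r in unvisited
                    if dist(coords[current], coords[r]) <= nndist}
            unvisited -= near
            component |= near
            queue.extend(near)
        # Assumption: "heaviest" is taken as "most residues".
        if len(component) > len(best):
            best = component
    return best
```

In an iterative domain search, such a function would be applied to the
current candidate domain after each cycle, so that residues which match
well but are spatially detached from the main body are dropped.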
The "rotational fit" method attempts to locate a hingepoint and a
rotation axis which characterize the transformation of the domain
between main and comparison coordinate set as a hingerotation. The
hingepoint could be anywhere on the axis, but is determined here, by
construction, as the point closest to the center of mass of the domain.
The rotation about an axis without translation, in general, will not
yield the closest fit of the Kabsch least-squares method, so the
problem is to find the least-squares solution with the constraint that
translations are not allowed, only rotations about an unknown
hingepoint. It turns out that this constrained problem is not easy to
solve and the exact solution may be too expensive to compute, so an
approximation is used in the algorithm. The accuracy of the
approximation can be assessed by comparing the least-squares fit with
the proposed rotational fit. Note that the (rmsd) error of the fit
may be due to the error of the approximation OR to the constraint of
not allowing translations.
The construction works as follows: A Kabsch least-squares fit yields
a translation vector v of the COM and a rotation axis r with angle
alpha. The rotation axis is then projected on the bisecting plane of v
which yields a new rotation axis r' and a "projection angle" beta,
defined by r and r', from which one can compute the new rotation angle
alpha' = alpha * cos(beta). Using this projected rotation, one can
construct a hingepoint on the bisecting plane. A rotation with r' and
angle alpha' about the hingepoint then transforms the COM of the main
set onto the COM of the comparison set. So the projection maintains (relative
to the least-squares method) the removal of the COM difference between
the sets, but approximates the rotation. The idea is that in
hingebending motions there should be a relatively large COM separation
|v|, and the rotation r should be almost parallel to the bisecting
plane of v. Thus, in addition to the rmsd error of the fit, the
validity of the approximation can be assessed by checking the angle
beta, which should be small. One finds that the method works
best for larger domains comprising several secondary structure elements.
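The construction described above can be sketched in a few lines of
Python. This is an illustration of the geometry only, not the X-PLOR
script: it assumes that the two centers of mass, the least-squares
rotation axis r and the angle alpha (in radians) are already known, and
the names effective_hinge and rotate_about are hypothetical. The
hingepoint is placed in the bisecting plane of v, at the distance
|v| / (2 tan(alpha'/2)) from the midpoint of the two centers of mass.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def effective_hinge(com_main, com_comp, axis, alpha):
    """Project the least-squares rotation onto the bisecting plane of v.

    Returns (hingepoint, r', alpha', beta). Not defined if the axis is
    parallel to v or if alpha' is zero (division by zero).
    """
    v = sub(com_comp, com_main)            # COM translation vector v
    v_hat = unit(v)
    r = unit(axis)
    # Remove the component of r along v: r' lies in the bisecting plane.
    r_new = unit(sub(r, scale(v_hat, dot(r, v_hat))))
    cos_beta = min(1.0, dot(r, r_new))     # projection angle beta
    beta = math.acos(cos_beta)
    alpha_new = alpha * cos_beta           # alpha' = alpha * cos(beta)
    # Hingepoint: in the bisecting plane, perpendicular to r' and v.
    mid = scale(add(com_main, com_comp), 0.5)
    d = math.sqrt(dot(v, v)) / (2.0 * math.tan(alpha_new / 2.0))
    hinge = add(mid, scale(cross(r_new, v_hat), d))
    return hinge, r_new, alpha_new, beta


def rotate_about(point, origin, axis, angle):
    """Right-handed rotation of point about a unit axis through origin
    (Rodrigues' rotation formula)."""
    p = sub(point, origin)
    c, s = math.cos(angle), math.sin(angle)
    rotated = add(add(scale(p, c), scale(cross(axis, p), s)),
                  scale(axis, dot(axis, p) * (1.0 - c)))
    return add(rotated, origin)
```

Rotating com_main with rotate_about, using the returned hingepoint, r'
and alpha', reproduces com_comp, which is the property stated in the
text. The returned beta is the quantity to inspect when judging the
validity of the approximation.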
4. Output files
There are three output files specified by the variables $oname $uname
and $dname: pdb and psf files of the labeled structure, and the log
file of the run. The pdb and psf files can be used to visualize the
results of the algorithm. The data is labeled by segids:
"AP0" is the unconverged rest of the protein,
"AP1" is the reference domain of the protein,
"AP2", "AP3", etc, are additional domains,
"DUM2", "DUM3", etc, are the dummy molecules which
visualize the hinge-rotation of the domains.
The dummy molecules show an arrow along the rotation axis with its
orientation representing the right-handed rotation about the axis. The
hingepoint in the middle of the arrow is connected to the COM of the
main and comparison coordinate set of the domain to illustrate the
rotation angle. The rotation angle and other useful information about
the run, the domains, and the accuracy of the rotational fitting can
be found in the self-explanatory log file.
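For a quick check of the labeling without a graphics program, the
segids in the output pdb file can be tallied with a short Python
sketch. It assumes that the file follows the usual X-PLOR pdb layout
with the segid in columns 73-76 of each ATOM or HETATM record. The
function name count_atoms_by_segid and the file name in the usage note
are hypothetical.

```python
from collections import Counter


def count_atoms_by_segid(pdb_lines):
    """Count atom records per segment identifier.

    pdb_lines : iterable of text lines from a pdb file.
    Assumption: the segid occupies columns 73-76 (X-PLOR convention).
    """
    counts = Counter()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            segid = line[72:76].strip()
            counts[segid] += 1
    return counts
```

A possible use is `count_atoms_by_segid(open("result.pdb"))`, where
result.pdb stands for the file named by $oname. A large count for "AP0"
relative to "AP1", "AP2", etc. would indicate that much of the protein
was not assigned to any converged domain at the chosen resolution.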
NOTE: The X-PLOR logfiles would contain several MBytes of data for
each run, so the standard output is piped to /dev/null. The standard
output should only be used for debugging of modified or augmented
scripts.
5. Accuracy check and other useful info
The log file contains information about the accuracy of the fitting as
outlined above in 3. The relative error (in percent) is computed as
[ RMSD (proj) / RMSD (least sq.) ] - 1.
It is recommended to run the cases within a range of resolutions
between 50 and 100%. Running a particular system with a range of
resolutions in "fas" mode, it was found that there exist one or more
windows of optimum resolution where the relative errors were very
small. Therefore it is recommended to try a range of resolutions
first with the "fas" mode, find the window(s) of small error and then
calculate selected resolutions in the window(s) in "slo" mode with a
higher number of domains. The error of the domain fitting was found
to decrease by a factor of 5 with "slo" partitioning due to the
connectivity of the domains.
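The error measure and the search for windows of small error can be
sketched in Python as follows. This is only an illustration: it assumes
that "in percent" means the ratio minus one multiplied by 100, that the
errors have already been read by hand from the log files of a series of
"fas" runs, and that a window is a run of consecutive scanned
resolutions at or below a user-chosen threshold. The function names and
the threshold are hypothetical and are not part of the hingefind
scripts.

```python
def relative_error_percent(rmsd_proj, rmsd_lsq):
    """Relative error of the projected (hinge) fit with respect to the
    least-squares fit, [RMSD(proj) / RMSD(least sq.)] - 1, in percent."""
    return (rmsd_proj / rmsd_lsq - 1.0) * 100.0


def low_error_windows(errors_by_resolution, threshold):
    """Group consecutive scanned resolutions with small relative error.

    errors_by_resolution : dict mapping resolution (%) to relative
                           error (%) taken from the log files.
    threshold            : largest error still counted as "small"
                           (assumption: chosen by the user).
    Returns a list of (first_resolution, last_resolution) tuples.
    """
    windows = []
    current = []
    for resolution in sorted(errors_by_resolution):
        if errors_by_resolution[resolution] <= threshold:
            current.append(resolution)
        elif current:
            # The run of small errors ended: close the window.
            windows.append((current[0], current[-1]))
            current = []
    if current:
        windows.append((current[0], current[-1]))
    return windows
```

The windows returned by such a scan are the candidates for the more
expensive "slo" runs recommended above.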
Recommended reading about classification of domain movements:
Gerstein et al., Biochemistry 33 (1994), 6739-6749.
How to find hingeregions: For shear-type motions (see Gerstein et
al.) the found effective rotation axis will intersect the boundary
between the domains in most cases and the effective hinge region can
be found at the intersection. In hinge-type motions, the axis will be
parallel to the interface. To find 'real' hingeresidues, it is useful
to investigate the protein backbone at the segment interface. A
hingeresidue should be close to the proposed rotation axis. For this
type of movement, the proposed hingepoint may be useful to find the
hinge, but it should be clear that a hinge"point" can be anywhere on
the axis.
6. Correspondence
Updates and changes of the method may be necessary once in a while.
To stay informed about changes, send e-mail to the author. The known
users will also receive a preprint of the upcoming paper.
The author would appreciate 'bug reports' and any comments regarding
the usefulness of the algorithm and strategies of usage. Please send
your correspondence to wriggers@uiuc.edu (NeXT-mail OK).
_______________________________________________
Willy R. Wriggers
Theoretical Biophysics Group
Beckman Institute
University of Illinois at Urbana-Champaign
405 North Mathews Avenue Urbana, IL 61801, USA
_______________________________________________