CCL: How can I do cluster analysis or similarity analysis



 Sent to CCL by: Thomas Cheatham [tec3-#-utah.edu]
 >  I did a molecular dynamics simulation with AMBER, and I saved a
 > thousand conformations during this run. I wish to do some cluster
 > analysis, or similarity analysis on the conformations I saved. I think
 > all the conformations can be grouped into two main groups, because I see
 > the structure changes from one conformation to the other in the MD
 > simulation.
 AMBER has an active mailing list and an archive at http://ambermd.org
 which would be a good place to search/ask about MD simulations with AMBER.
 The freely available AmberTools suite of programs includes trajectory
 analysis tools that can do clustering.  I am most familiar with ptraj,
 which can cluster based on RMSd, distance matrices, dihedrals, etc.
 Routinely we cluster based on RMSd.  There are many options, but a basic
 ptraj script would be something like:
   trajin traj.strip
   cluster out clusters/c10 all none representative pdb average pdb \
     averagelinkage sieve 250 clusters 10 rms
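
 As a rough illustration of what average-linkage clustering down to a
 fixed number of clusters does, and of how a representative frame can
 be picked, here is a minimal Python sketch.  It is not ptraj's
 implementation: the function names, the toy one-dimensional "frames",
 and the choice of representative (the member with the smallest summed
 distance to the rest of its cluster) are all assumptions made for the
 example.

```python
# Illustrative only: naive average-linkage agglomerative clustering on a
# precomputed distance matrix. This is NOT how ptraj is implemented.


def average_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    dist is a symmetric N x N list of lists of pairwise distances
    (for example RMSd values). Returns a list of clusters, each a
    sorted list of frame indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average over all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


def representative(dist, members):
    """Return the member with the smallest summed distance to the others."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical toy data: one number per frame, forming two obvious groups.
values = [0.0, 0.2, 0.1, 5.0, 5.3, 5.1]
dist = [[abs(a - b) for b in values] for a in values]

clusters = average_linkage(dist, 2)
reps = [representative(dist, c) for c in clusters]
print(clusters, reps)
```

 The search for the closest pair is repeated from scratch at every
 merge, which is fine for a toy case but far too slow for long
 trajectories.
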
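 In your case, with only a thousand frames, you can cluster all of them
 (sieve 1) rather than a subset.  To decide on the number of clusters, I
 would visualize a 2D RMSd plot (the RMSd of every frame against every
 other frame), since 1D RMSd plots can be deceptive with respect to
 cluster count.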
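
 To make the idea of a 2D RMSd plot concrete, the following Python
 sketch builds the all-against-all RMSd matrix for a handful of frames
 and renders it as a crude text map.  It is only a sketch under stated
 assumptions: the frames are taken to be already superimposed on a
 common reference (no fitting is done here), and the function names
 and the two-atom toy coordinates are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSd between two frames, each a list of (x, y, z) tuples.

    Assumption: the frames are already superimposed; no fit is done.
    """
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def rmsd_matrix(frames):
    """Symmetric matrix of RMSd values, every frame against every other."""
    n = len(frames)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rmsd(frames[i], frames[j])
    return m


def ascii_plot(m, levels=" .:*#"):
    """Render the matrix as text; darker characters mean larger RMSd."""
    top = max(max(row) for row in m) or 1.0
    last = len(levels) - 1
    return "\n".join(
        "".join(levels[min(int(v / top * len(levels)), last)] for v in row)
        for row in m
    )


# Hypothetical toy trajectory: three frames near one state, then three
# frames near another.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.1)],
]
matrix = rmsd_matrix(frames)
plot = ascii_plot(matrix)
print(plot)
```

 Frames that belong to the same conformational family show up as
 low-RMSd square blocks along the diagonal, so the number of blocks
 gives a first guess at the number of clusters.
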
 See the manuals at http://ambermd.org for more information.  A paper
 describing that implementation is Shao et al. (Cheatham), JCTC ~2007.
 Alternatives include MMTSB, the tools distributed with GROMACS and
 GROMOS, and likely built-in features of NAMD and CHARMM.
 If you get stuck, e-mail me off-list.
 --tec3