CCL Home Page
Up Directory CCL moil.rtf
{\rtf1\mac\deff2 {\fonttbl{\f0\fswiss Chicago;}{\f3\fswiss Geneva;}{\f4\fmodern Monaco;}{\f20\froman Times;}{\f21\fswiss Helvetica;}{\f22\fmodern Courier;}{\f23\ftech Symbol;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;}{\stylesheet{\s242 \f22 \sbasedon0\snext0 page number;}{\f22 \sbasedon222\snext0 Normal;}}{\info}
\margl1701\margr1701\margt-851\margb-567\widowctrl\ftnbj \sectd \sbknone\pgnrestart\pgnx10900\pgny1342\linemod0\linex0\cols1\endnhere \pard\plain \qc \f22 \par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\pard {\b MOIL: A molecular dynamics program with emphasis on conformational searches and reaction path calculations\par 
}\pard \qc \par 
\par 
\par 
\pard \qj Ron Elber{\up6 1}, Adrian Roitberg,  Carlos Simmerling, Robert Goldstein, Gennady Verkhivker, Haiying Li and Alex Ulitsky\par 
\pard \qj \par 
\pard \qj University of Illinois at Chicago, Department of Chemistry, M/C 111 Science and Engineering South Bldg., 845 West Taylor St. Rm. 4500, Chicago IL 60607-7061\par 
\pard \qj and\par 
\pard \qj {\up6 1}Department of Physical Chemistry, The Fritz Haber Research Center and the Institute of Life Sciences, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, ISRAEL\par 
\pard \qj \par 
\par 
\pard \par 
{\b INTRODUCTION\par 
}\par 
\pard \qj The field of computational biology has expanded considerably in the last few years. Insight to the dynamics of biomolecules, the design of new drugs and the interactions that lead 
to stability of macromolecules has been obtained. Crucial in bringing these changes was the introduction of "user friendly" computer programs so that the number of potential users expanded considerably. It is now possible to visualize complex molecules and
 to study their structure and thermodynamics properties. The strength of this approach and what makes it so attractive is the possibility of studying the behavior of a variety of molecules using essentially the same set of tools. Constructing a large numbe
r of molecules was made possible by the use of a data base of molecular pieces: Different molecules are described using common fragments. For example, all proteins are constructed from the same monomers - amino acids. Another example of repeating fragments
 is found in the base-pairs of DNA. The fragment solution is chemically intuitive, however, it is approximate. In general the intramolecular interactions in an amino acid are influenced by its neighborhood. For example the charge distribution is determined
 not only by the identity of the amino acid (as is usually assumed) but also by the solvent, the nearby amino acids and the specific conformation of the fragment. Nevertheless, this approximate approach has a number of successes that are documented in the 
literature{\up6 1} and therefore not covered in this manuscript. \par 
\pard \qj \par 
\pard \qj \tab There are (at least) three questions that come to mind when such a tool is proposed. The first is the one mentioned earlier. It is related to the assumed transferability of "fragments" from molec
ule to molecule, or more generally to the way in which the interaction potential between all the particles in the system is constructed. This question was discussed in the past{\up6 1}
 and it is not the focus of the research pursued by the authors. Improvement on existing representations and the limitations of this approach are still under investigations. This question is not discussed in this chapter and we refer the interested reader 
to literature on development of empirical force fields{\up6 1}. \par 
\pard \qj \par 
\pard \qj \tab A second question i
s associated with the design of such a tool, and how the above idea of generating an arbitrary molecule from a set of fragments is translated into a computer program. We shall discuss in detail the principles of one computer program - MOIL. MOIL (MOlecular
 dynamics at ILlinois) was developed by the authors. It is of course only one example of many programs for molecular modeling and dynamics. However, the underling structure of MOIL is similar to other computer programs, such as AMBER{\up6 2}, CHARMM{\up6 
3} and GROMOS{\up6 4}. Therefore understanding the general features of one of these programs is usually sufficient. MOIL (to the best of our knowledge), is the only molecular dynamics program with the full source code in the public domain ({\b Appendix 1}
 includes instructions on how to get the code and conditions). This makes MOIL a useful starting point for developing new code, in addition to the direct applications of the program. In this chapter we provide more detailed information than is given in con
cise scientific publications
 on the structure of the code. This (we hope) will enable the readers to better appreciate the pitfalls in the present approach and will provide (for those who would like to try) a reasonable basis for writing a program of this type from scratch. \par 
\pard \qj \par 
\pard \qj \tab The third question is related to the possible applications of this tool.  Here we consider in depth two features available only in MOIL: The calculation of reaction paths in large molecular systems{\up6 5}
 and the use of the LES (Locally Enhanced Sampling) method{\up6 6} in conformational searches. This is in addition to the more "common" uses of such a program, such as energy calculation, energy minimization, molecular dynamics and free energy estimates.
\par 
\pard \qj \par 
\pard \qj \tab 
This chapter is organized as follows: In the next section we describe the basic ingredients of the program: The typical structure of the data base, how to minimize the space requirements and how to make the code faster. In the third section we consider se
veral simple applications that include the construction of a molecule, energy calculation, energy minimization and dynamics. In section four the LES approach and the technique to calculate reaction paths will be introduced.\par 
\pard \qj \par 
\pard \par 
\pard {\b THE INTERNAL STRUCTURE OF A MOLECULAR DYNAMICS CODE: THE PROGRAM MOIL}\par 
\pard \par 
\pard \qj The corner stone of any molecular dynamics program is the energy function. In MOIL the combination of the AMBER force field (covalent part){\up6 2}, OPLS parameters for non-bonded interactions{\up6 7}
, and the CHARMM force field (improper torsions) is used{\up6 3}. The functional form of the potential is "standard" and it is given by the following formula\par 
\pard \par 
\pard U({\b R}) ={\f23  S} K{\dn4 b} (b - b{\dn4 o}){\up6 2  }+ {\f23 S }K{\f23\dn4 Q}{\f23  (Q - Q}{\f23\dn4 o}){\f23\up6 2  }{\f23 + S A}{\f23\dn4  f }cos (n{\f23 f + d})   (1)\par 
\pard \tab \par 
\pard       + {\f23 S K}{\f23\dn4 f}{\dn4 i}{\f23\dn4  }({\f23 f}{\dn4 i}{\f23  - f}{\dn4 i}{\f23\dn4 o}){\up6 2} + {\f23 S }(A{\dn4 ij}/r{\dn4 ij}{\up6 12 } - B{\dn4 ij}/r{\dn4 ij}{\up6 6 } + q{\dn4 i}q{\dn4 j}/{\f23 e}r{\dn4 i }
                                         \par 
\pard \tab {\fs20\up6 \par 
}\pard {\f23             + S }v14(A{\dn4 ij}/r{\dn4 ij}{\up6 12 } - B{\dn4 ij}/r{\dn4 ij}{\up6 6})+ el14(q{\dn4 i}q{\dn4 j}/{\f23 e}r{\dn4 ij})\par 
\pard       {\fs18\up6 1-4\par 
}\par 
\pard \qj where {\b R }is the coordinate vector of all the atoms in the system, b denotes bonds, {\f23 Q} angles,  {\f23 f}{\dn4 i} the improper torsions, {\f23 f} the proper torsions and r{\dn4 ij}
 the distance between the i-th and the j-th atom (see figure 1
 for the definition of the internal coordinates). The different energy terms can be separated into covalent and non-bonded interactions. The covalent part can be further divided to two body, three body and four body terms. The two body terms include the bo
nds - covalent links between pairs of atoms. The bond energy depends only on the distance between the two atoms of interest - r{\dn4 ij} -  which is denoted in the formula by b. The three body terms are the angles: {\f23 Q}
-s. They are defined as follows: Let {\b e}{\dn4 ij} be a unit vector along the bond between the i-th and the j-th atom and let {\b e}{\dn4 jk} be a unit vector along the bond between the j-th and the k-th atoms. The angle {\f23 Q}{\dn4 ijk}
 between two bonds that share an atom is defined by the formula: cos({\f23 Q}{\dn4 ijk})={\b e}{\dn4 ij}{\b e}{\dn4 jk}, with {\f23 Q}
 in the range between 0 and 180 degrees. Note that since we do not expect large deviation from ideal values for the bonds or the angles, harmonic energy terms are usually employed. Thus, common empirical potentials that are used in biological molecules are
 not designed to describe chemical reactivity but rather conformational energy. Some developments in the chemical direction are included in MOIL in which electronic curve crossing, modeled via the Landau Zener formula{\up6 8}
, was already used to investigate the geminate recombination of nitric oxide to myoglobin{\up6 8}
. The last two terms in the covalent energy (the proper and improper torsions) are four body interactions. The difference between proper and improper torsions is in the ordering of the three bonds (figure 1).\par 
\pard \qj \par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\pard \qj {\b\fs20 Fig.1 Definition of internal coordinates for two body (bonds), three body (angles) and four body (proper and improper torsions) interactions.\par 
}\pard \qj \par 
\pard \qj  Proper torsions are defined for three sequential bonds between the four atoms i,j,k,l. Let {\b e}{\dn4 ij}, {\b e}{\dn4 jk} and {\b e}{\dn4 kl} be unit vectors along the bonds ij,jk and kl. The proper torsion - {\f23 f}
 - is the angle between two planes: the plane spanned by the two unit vectors {\b e}{\dn4 ij}, {\b e}{\dn4 jk} and the plane spanned by {\b e}{\dn4 jk} and {\b e}{\dn4 kl}. sin({\f23 f}) is given by {\b e}{\dn4 ij}({\b e}{\dn4 jk} {\f21 x} {\b e}{\dn4 kl}
), the cosine of {\f23 f} can be calculated from the normals to the planes:\par 
\pard \qj \par 
\pard \qj cos({\f23 f})=[{\b e}{\dn4 ij} {\f21 x }{\b e}{\dn4 jk}] [{\b e}{\dn4 jk} {\f21 x} {\b e}{\dn4 kl}]                       (2)\par 
\pard \qj \par 
\pard \qj \tab 
An improper torsion is also defined for four atoms linked by three bonds. However, the arrangement of the bonds is not sequential and there is one atom (say i) that is in the center and has three bonds to atoms j,k and l. For example consider atoms with s
p{\up6 2} type orbitals, (e.g., CH{\dn4 2}O) in which the carbon in the center is bonded to three different atoms. These two types of torsions are topologically distinct.
 Nevertheless, the improper torsion is computed in an identical way to the proper torsion: It is the angle between the plane defined by the three atoms i,j,k and the plane defined by j,k and l. \par 
\pard \qj \tab 
The improper torsions were introduced to preserve the planarity or the chirality of different fragments while the regular torsions aim at reproducing experimental barriers for rotations along a bond. We expect improper torsions to have only one minimum, i
n contrast to regular torsions that typically have two or thr
ee minima (rotations along bonds). It is therefore quite common to use different functional forms for the torsion energies: periodic (cosine) functions for proper torsions and harmonic terms for improper torsions. The harmonic term is advantageous for impr
oper torsions since it has a single minimum. Therefore if the molecule is at an angle far from equilibrium the corresponding energy will be high and energy minimization is likely to force the molecule in the right direction. The cosine energy term has mult
iple energy minima and it is therefore possible that wrong configurations (for improper torsions) have low energies and pass undetected. We use harmonic terms for the improper torsions in MOIL. \par 
\pard \qj \tab 
To enable convenient calculations of the covalent energy we make extensive use of pointers. The pointers make it possible to formulate the potential in a sufficiently general way such that many different molecules can be described. As an example we consid
er the bond energy. The identity of the particles linked by b
onds is stored in two integer arrays - ib1 and ib2. The number of elements of the two arrays is the number of bonds. The indices of the particles in the molecular system which are connected via the i-th bond are stored as ib1(i) and ib2(i). The coordinates
 of the first particle (for example) will be given as x(ib1(i)) y(ib1(i)) and z(ib1(i)) where the x,y and z are double precision arrays of the Cartesian components of the coordinates. To calculate the bond energy, it is also required to have easy access to
 the value of the force constant and equilibrium position. The simplest and fastest solution to this problem is to store the force constants and the equilibrium positions in arrays parallel to ib1 and ib2, e.g. kbond is a double precision array that stores
 the value of the force constants such that kbond(i) is the force constant of the i-th bond. If memory size is a problem, a more economical choice will be to store the bond parameters according to type, e.g. all N-H bonds in the protein have the same param
eters and these parameters can be stored only once. This will require however an additional pointer for each of the bonds - a pointer to the bond type. It is an almost general rule that saving in memory size by adding more pointers and making data extracti
on less direct reduces the speed of the calculation. Of course one should be careful not to overuse the memory. The situation becomes worse if the arrays become so large that the program requires the use of disk swap space. Overloading the memory is rarely
 the case with the covalent energy terms but becomes more likely when calculating the nonbonded interactions (see below). Further subtle effects may be found when secondary memory (Cache) is heavily used. This however, may be machine dependent.\par 
\pard \qj \tab 
In MOIL the arrays ib1, ib2, kbond and req have the same running index - the number of bonds - (req is the equilibrium distance). A similar approach is taken for the other covalent energies: There are three integer pointers to particle locations for the a
ngle energi
es and four such pointers for the torsions. The different energy parameters are stored according to the index of the internal degree of freedom in the same way as kbond and req. The data is stored internally in the program in a FORTRAN common block that is
 shared by a large number of programs (the modular structure of the program is discussed later) and is shown in Appendix 2.\par 
\pard \qj \tab The remaining energies correspond to electrostatic and van der Waals forces between atoms that {\b are not} covalently linked. Covalentl
y linked atoms are separated by (at most) two bonds along the polymer chain. The identity of the covalently linked atoms (which is fixed in most applications) is stored in the exclusion list. The list consists of two integer arrays, one of length of the nu
mber of atoms and one of length of the number of exclusions. The first array (exc1) is used to "point" to the last excluded particle in the second array (exc2). Thus, the particles that are "excluded" to the i-th particle are on the second array: exc2(exc1
(i-1)+1) to exc2(exc1(i)). Whenever the list of atoms that {\b interact}
 via van der Waals and electrostatic interactions is constructed we check each pair to verify that is not in the exclusion list. We also check that it is not in the 1-4 list. The 1-4 list includes the set of atoms separated by three sequential bonds. The n
on-bonded interactions of the 1-4 type are reduced by constant scaling factors (v14 and el14 in equation (1)) as compared to the regular interactions. The reduction is connected with the em
pirical observation that if the regular values for the non-bonded interactions are used for the 1-4 pairs then the barriers for torsional transitions become too high compared to the experiment. The 1-4 list is therefore used to reduce the barrier heights f
or rotations around individual bonds below the experimental value. No precise value is required since the final adjustment will be made by the torsion energy. This is also why a uniform scaling for 1-4 interactions of different atoms can be employed. 
\par 
\pard \qj \tab The
 exclusion list is (of course) a guiding tool to compute the major list - the list of atom pairs for which the non-bonded interactions are calculated. The list is generated according to a cutoff criterion and in order to save time it is updated only after 
a few sequential energy evaluations. For example, after several steps of minimization or several steps of dynamics (typical values vary between 10 and 50 steps) an update is performed. It is done in a hierarchical way: first a list of monomers with their c
enter of mass closer than a cutoff distance (see below) is constructed. Based on the monomer list an atom list is calculated. The distances between atoms of monomers that are in the list are computed. Pairs with distances shorter than a cutoff (typical cut
off distances vary between 8-15\'81
) are further checked to verify that they are not in the 1-4 or the exclusion lists. The remaining pairs are stored in a similar way to the exclusion list: in two integer arrays. the first array pointx(i) points to the last 
neighbor of the i-th atom which is stored in the second integer array - listx(pointx(i)). Thus the neighbors of the i-th atoms are stored in listx(pointx(i-1)+1) to listx(pointx(i)). Double counting of pairs is of course avoided, and the list is organized 
such that the neighbors always have a higher index number. It is therefore not simple to extract all the neighbors of a particle from this list, since this requires a search for neighbors with a lower index.\par 
\pard \qj \tab  A significant reduction in computational effor
t is achieved using the cutoff distance. The basic idea is the following: It is assumed that beyond a certain distance the interaction can be neglected. The computational effort (CE) is proportional to the number of atom pairs whose interaction energies ne
ed to be calculated. Without cutoff the Computational Effort is CE=A N{\up6 2}
 where A is a proportionality constant that does not depend on N. For a given cutoff distance the number of neighbors to a particle is approximately constant and independent of the tota
l number of particles in the system. Thus, in the cutoff scheme the computational effort grows linearly with the number of particles. The disadvantages of using cutoff are significant when the system under investigation includes substantial (long range) el
ectrostatic interactions. This is a topic of an ongoing research. Obviously the ideal approach would be no cutoff at all. However, one must bear in mind that some empirical potentials (including OPLS - the potential that is used in MOIL) passed final tunin
g using cutoff. It is therefore difficult to separate the model potential from the cutoff system that was employed in developing it. \par 
\pard \qj \tab 
The above discussion of the nonbonded list is a correct but simplified description of what is going on in MOIL. In fact the "x" in pointx and listx is varied from 1 to 3 and we are using three lists instead of one to calculate the non-bonded interactions.
 We emphasize that since 60-90 percent of the computation time is spend in the non-bonded calculations, it is worth invest
ing the time to tune this part of the code in the spirit described below. There are two points that we should like to use to speed up the computation. The first is that not all particles are charged. It is obviously unnecessary to compute variables require
d for electrostatic calculations such as distances between particles and finally to multiply the result by zero - the particle charge. The second point is that the range of forces for electrostatic and van der Waals interactions are very different. The Len
nard Jones forces are of considerably shorter range compared to the electrostatic. (The van der Waals potential decreases like 1/R{\up6 6}
 while the electrostatic decreases like 1/R, where R is the distance between two particles). Therefore it makes sense to use two different cutoffs for the two energy terms. Typical values that we use for van der Waals cutoff is 7-8\'81
 and for electrostatic is 10-15\'81. The above two points could have been addressed by "if" statements during the energy calculations. Thus, "if" can be
 used to check if the particle is charged or if the distance is such that the van der Waals energy is negligible compared to the electrostatic. "if" statements, however, can be costly and may prevent vectorization. This is why we use three lists. The first
 list is for charged pairs (both particles) that are within the short (van der Waals) cutoff. For these pairs electrostatic interactions and van der Waals energies are calculated. The second list is for pairs within the van der Waals cutoff but for which a
t least one of the particles is not charged. Therefore there is no need to calculate electrostatic energies. The third list includes charged pairs with distances larger than the van der Waals but smaller than the electrostatic cutoff. Only electrostatic in
teractions are calculated for these pairs.\par 
\pard \qj \tab The size of the arrays of the non-bonded lists can be very large and in many cases they are a major factor in determining the program size and the memory requirements. Therefore some estimates for memory size are
 provided below. The lists of non-bonded neighbors for a small protein (around 1000 particles) using cutoffs of the order of 10\'81
 consist of approximately 300,000 neighbors. The identity of the neighbors is stored as integers and the list size is therefore:
 4 bytes times 300,000 neighbors = 1.2MB. Since (good) MD programs rarely exceed a few megabytes in memory size, this list can be, and in many cases is, size determining. It is possible to avoid the use of the list by examining all pairs of particles and t
esting them against the exclusion list, the 1-4 list and the cutoff. However, the additional tests and the fact that all pairs need to be examined will slow the program considerably. It is therefore recommended (unless you simply do not have the space to s
tore the nonbonded list) to use stored neighbors and to avoid as much as possible the above mentioned tests. \par 
\pard \qj \tab In some code people attempt to save space by using short integers for the non-bonded lists and mixing double precision and single precision numbe
rs. This is not recommended on modern architecture which in many cases is slower for short integers. Furthermore, mixing single precision and double precision numbers requires function calls (a major slow down) and may also lead to significant loss of accu
racy. \par 
\pard \qj \tab If memory is {\i not}
 an issue, it is possible to speed-up the computation further by pursuing some of the calculations on the level of list preparation before the energy is called. One should be quite careful however in deciding that "memory is not an 
issue", since cache memory is still small and loading from memory to the CPU can be costly too. The only real help we can provide at present is to describe general principles and recommend some trials. An example for use of more memory to reduce actual com
putation is the following: During the energy calculations we compute the variables A{\dn4 ij} (equation 1) by multiplying the square roots of A{\dn4 ii }and A{\dn4 jj }{\up6 7}
. For N particles the multiplication requires storage of order N (storing all the A{\dn4 ii} and A{\dn4 jj}). Storing the A{\dn4 ij}
 requires much more space - the array required is of the same length as that of the neighbor list. If space is available such that one can store two more arrays of A{\dn4 ij }and B{\dn4 ij}
, then the multiplication is required only when the non-bonded list is updated. Furthermore, these long arrays should lead to a better vectorization and cache use. This is since these vectors will have a direct reference in the neighbor loop and the use of
 pointers will be avoided.\par 
\pard \qj \tab The potential parameters are the major difference between the potential energies of different computer programs. The variations between most of the functional forms employed are rather minor. The parameters include the force constants k{
\dn4 b}, k{\f23\dn4 Q} and k{\dn4 imp}, the equilibrium positions for the bonds, angles and improper torsions: b{\dn4 o}, {\f23 Q}{\dn4 o} and {\f23 f}{\dn4 io}, the amplitude, the periodicity and the phase factors for the torsions - A{\dn4 i}, n{\dn4 i}
, and {\f23 d}{\dn4 i}. Other parameters are for the non-bonded interactions and include A{\dn4 ij}, B{\dn4 ij} for the van der Waals and q{\dn4 i}, q{\dn4 j} and {\f23 e} for the electrostatic 
interactions. v14 and el14 are the 1-4 scaling parameters for van der Waals and electrostatic interactions respectively. In MOIL these factors are 0.5 and 0.125 (adopted from OPLS{\up6 7}
). They are read however as external parameters and therefore can be easily modified.\par 
\pard \qj \tab We comment (as already mentioned above) that the van der Waals (or the Lennard Jones) parameters are stored as single particle properties. The A{\dn4 ij }and the B{\dn4 ij } are given in terms of the single particle properties - {\f23\fs28 
s}{\dn4 i} and {\f23\fs28 e}{\fs28\dn4 i}{\fs28  :\par 
}\pard \qj {\fs28 \par 
}\pard \qj A{\dn4 ij }= 2 [{\f23\fs28 s}{\dn4 i}{\fs18\up6 12}{\dn4  }{\f23\fs28 s}{\dn4 j}{\fs18\up6 12}{\dn4  }{\f23\fs28 e}{\dn4 i }{\f23\fs28 e}{\dn4 j}]{\fs18\up6 1/2  };   B{\dn4 ij }= 2 [{\f23\fs28 s}{\dn4 i}{\fs18\up6 6}{\dn4  }{\f23\fs28 s}{\dn4 
j}{\fs18\up6 6}{\dn4  }{\f23\fs28 e}{\dn4 i }{\f23\fs28 e}{\dn4 j}]{\fs18\up6 1/2    }(3)\par 
\pard \qj {\fs18\up6 \par 
}\pard \qj where {\f23\fs28 s}{\dn4 i/j }is the van der Waals radius for the i/j-th particles and similarly {\f23\fs28 e}{\dn4 i/j} is the well depth. Another common and potentially confusing way of defining {\f23\fs28 s}
 is as half of the van der Waals radius. In MOIL we use equation (3).\par 
\pard \qj \tab All of the parameters used in MOIL were adopted from other published force fields (covalent structure from AMBER{\up6 2}, improper torsions from CHARMM{\up6 3} and non-bonded interactions from OPLS{\up6 7}
). The potential energy parameters are stored in an external data-base (external to the {\fs26 FORTRAN}
 code). The data-base is written in a format close to free, which makes it relatively easy to modify or to add to its content. They are kept in a "property file" and are used only when a new molecule is created. Since this file is quite large we did not in
clude it in the present manuscript. However it is also available via anonymous ftp (see Appendix 1). Similarly to the energy function the properties are organized in a hierarchical structure. There are prop
erties of individual particles, of pairs of particles, and of three and four particles. We prefer to use the name "particle" rather than an atom since some of the "particles" that are commonly used are groups of atoms. For example, CH{\dn4 n}
 is a single particle in the model potential that we use. Each particle has a mass, a van der Waals radius, a van der Waals energy and a charge. More single particle parameters (such as polarizability) can be added (in principle). At present MOIL includes 
only the properties mentioned above. Assuming transferability of parameters, approximately one hundred different types of particles are required to describe a protein. For example, all the C{\f23\dn4 a}'s{\dn4  }that {\b are not}
 at the beginning of the chain (N terminal) or at the end of the chain (C terminal) have the same "single particle" properties.\par 
\pard \qj \tab Properties of the covalent structure are listed in the property file afterwards. Thus, a list of all possible bonds, angles, torsions and improper torsions for different types of particles is provid
ed. The equilibrium positions and other parameters that are required for energy calculations are listed.\par 
\pard \qj \tab 
The information provided in the property file is in principle complete. It is possible to construct now any molecule of interest provided that the chemical connectivity of the molecule is defined by the user and all types of needed parameters appear in th
e property file. At present the MOIL data files support proteins but not DNA or other macromolecules. Generating large molecular systems based on infor
mation for one and up to four particles is inconvenient. Therefore an intermediate data structure that includes larger structural segments is employed. In MOIL it is called the monomer file. Similarly to the property file, the monomer file is too large to 
be included in this manuscript, however, it is available via an anonymous ftp - {\b Appendix 1}
. We make use of the repeating chemical units which are very common in biology, e.g., the amino acids for proteins. The particles in each monomer are assigned a uniqu
e name within that monomer and an identification for the look-up table of "single particle properties" - i.e. the property file. The new information in the monomer file is the covalent structure (listing all bonds) of the particles in a monomer. This saves
 the user a considerable amount of work in defining the connectivity of "standard" pieces. However since the chemical connectivity is usually known, the monomers file is aimed at convenience, not necessity. The database further lists links necessary for po
lymerization, i.e., "virtual" particles in the PREVious or NEXT monomer that will be bonded to the present monomer. The "virtual" particles become real once the previous and next monomers are specified by the users.\par 
\pard \qj \tab With the help of the monomer file, it is possible to generate the energy function for all macromolecules for which the monomers are defined. This is done by simply listing the sequence of the monomers. The program {\b con}
 (the first program in the MOIL package) reads the file of properties, the file of monomers and the file of "polymerization" (the list of monomers that together consist the molecule of interest). For each monomer in the monomer files {\b con}
 identifies "virtual particles" - particles that exist in the NEXT or PREVious monomers and are used for the polymerization process. Based on the "virtual particles" and the polymerization list {\b con}
 builds the specific molecule of interest. It is worth emphasizing that the short polymerization file is the {\i only} information that differs from molecule to 
molecule, since the data on properties and monomers is a general purpose fixed database that was already constructed. Of course, if you have an unusual molecule with monomers that do not appear in the database, adding these monomers to the property and mon
omer files will be required.  {\b con} further builds the rest of the covalent data that was not defined in the monomer file: (angles, torsions, improper torsions, exclusion and 1-4 lists). This is done automatically, based on the bond structure, in a {
\b maximal way}. For example, the CH{\dn4 3}-CH{\dn4 3  }
molecule has nine torsions of the type H-C-C-H. This is important to remember since care must be used when transferring data between different force fields. For example, some programs use only a single torsion along the C-C bond. The barrier for a single t
orsion may differ by a factor of nine if a wrong interpretation is given to the data. The output of the con program is a file that defines the energy function for the whole molecule and in MOIL is called the connectivity file.
 Once it is generated for a given molecule, other more interesting applications can be pursued that use this file. Examples will include energy calculations, energy minimization or dynamics. Each one of the above is a separate program. The programs interac
t with each other via the common connectivity file and via coordinate files describing the position of the particles in space. From the programmer's point of view the different programs share a large number of common tools, such as interpretation of the co
mmand line, routines to read coordinates, connectivity files, and a large set of common databases and FORTRAN common blocks. Furthermore, the same routines for the calculations of the energies and the forces are used by all programs attempting to calculate
 dynamic or thermodynamic properties.\par 
\pard \qj \tab Splitting of the applications between many separate programs has the advantage that the executing files are more compact. They require relatively small space and memory and are likely to be more efficient. The disadva
ntage compared to "one program does it all", is that the bookkeeping is more difficult. From the user's point of view, keeping track of many programs can be confusing and time consuming. This problem can be solved however on the shell level. Since we are f
amiliar with the multiple programs, we did not bother to "shell" the different execution files of MOIL. The fact that the executing files remain lean makes it possible to address larger problems than can be investigated using "the one program does it all" 
approach. The assumption made in MOIL is that of an "educated user", that should be able to find his/her way through the different modules. MOIL is not designed (or aimed) to replace a commercial, mouse driven code.\par 
\pard \qj \tab 
In the next section we discuss the basic steps that are required to pursue a molecular dynamics simulation using MOIL. The starting point will be the availability of a pdb file. More advanced topics (LES and reaction paths) will be described in the follow
 up section.\par 
\pard \qj \par 
\par 
\par 
\pard {\b EXAMPLE: MOLECULAR DYNAMICS SIMULATION USING MOIL\par 
\par 
}\pard \qj \tab 
In this section we describe the procedure of getting 100 psec simulation time for the protein myoglobin with a partial solvation shell. This section is rather technical and does not contain new scientific results. However it may be useful as a reference p
oint, since essentially all molecular dynamics code must pass through the same steps. It can be useful to understand what is going on in some "black box" procedures even if the procedure is hidden by a nice user interface of a co
mmercial program. This is especially important if problems are encountered.\par 
\pard \qj \tab 
In MOIL the first step is ALWAYS the generation of the connectivity file. Generating the connectivity file for myoglobin requires six separate files. The monomer file and the property file are already available as a part of the MOIL database. The rest (fo
ur files) must be provided by the user. This may seem a lot but as will be demonstrated below these files are quite short, and only the counting to four may pose a problem. One of 
the files is for polymerization, and a second file provides directions to add bonds. An example for a bond addition that is going beyond what is already available in the monomer file is the bond between the histidine and the heme iron. Another example is o
f a disulfide bridge. Further modifications to the covalent structure when necessary are listed in the third file. These changes are required only for "non-standard" groups when the automatic procedure for generating angles and torsions is "overdoing it". 
Finally an input directing the program {\b con }to the location of the above mentioned files is requested. In many cases the file to add bonds or the file to edit the covalent structure are not required. Here is the {\b con}
 input with directions to the location of other files:\par 
\pard \qj \par 
~ a comment line starts with a "~"\par 
~ below is the address of the pre-prepared monomer file\par 
~ "file" states that this is an i/o line, "mono" is the file\par 
~ type, "name" says it all, "read" - the file is open for read \par 
~ only. Other options to open a file (in addition to "read")\par 
~ is "writ" in which the file must not exist, and "wovr"\par 
~ in which an existing file is written over.\par 
\pard file mono name=(ALL.MONO) read\par 
~\par 
~ ALL.PROP is the name of the pre-prepared property file\par 
file prop name=(ALL.PROP) read\par 
~\par 
~ polymerization file is called below\par 
file poly name=(poly.mb) read\par 
~\par 
~ add bond between iron and histidine\par 
file ubon name=(addb.mb) read\par 
~\par 
~ remove unnecessary angles in the heme planes that\par 
~ were generated by the automated procedure\par 
file uedi name=(edit.mb) read\par 
~\par 
~ output (the connectivity file) will be written on wcon.mb\par 
file wcon name=(wcon.mb) wovr\par 
~\par 
~ execute\par 
\pard \qj action\par 
\par 
\par 
The polymerization file is listed below:\par 
\par 
\pard MOLC=(MYOG) #mon=238\par 
NTER MET VAL LEU SER GLU GLY GLU TRP GLN LEU VAL LEU HIS VAL\par 
TRP ALA LYS VAL GLU ALA ASP VAL ALA GLY HIS GLY GLN\par 
ASP ILE LEU ILE ARG LEU PHE LYS SER HIS PRO GLU THR\par 
LEU GLU LYS PHE ASP ARG PHE LYS HIS LEU LYS THR GLU\par 
ALA GLU MET LYS ALA SER GLU ASP LEU LYS LYS HIS GLY\par 
VAL THR VAL LEU THR ALA LEU GLY ALA ILE LEU LYS LYS\par 
LYS GLY HIS HIS GLU ALA GLU LEU LYS PRO LEU ALA GLN\par 
SER HIS ALA THR LYS HIS LYS ILE PRO ILE LYS TYR LEU\par 
GLU PHE ILE SER GLU ALA ILE ILE HIS VAL LEU HIS SER\par 
ARG HIS PRO GLY ASN PHE GLY ALA ASP ALA GLN GLY ALA\par 
MET ASN LYS ALA LEU GLU LEU PHE ARG LYS ASP ILE ALA\par 
ALA LYS TYR LYS GLU LEU GLY TYR GLN GLY CTRG\par 
HEM1\par 
CO\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3\par 
TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3 TIP3         \par 
*EOD\par 
\pard \qj \par 
\pard \qj 
This file is almost self explanatory. It includes the sequence of the protein myoglobin, with the terminators of the peptide chain (non-pdb) - NTER and CTRG - which are defined as monomers in MOIL. Also provided are a monomer HEME (HEM1), a ligand (CO) and
 a small droplet of water (a few water mo
lecules - TIP3). This file (in contrast to the property and the monomer files) must be constructed by the user for the molecule of interest. More details on the construction of this file can be found in the standard documentation of the program and we do n
ot provide them here. After reading the polymerization file {\b con}
 creates the list of all the particles in the molecular system, lists of all the properties of the single particles and all the bonds that are referred to in the "standard" data base. After this is done {\b con }
looks for additional bonds that cannot be included in the general polymerization procedure. An example which is relevant to the previously presented sequence of myoglobin is of bond formation between the proximal histidine and the heme iron. The input for 
the last is found in the addb.mb file and is\par 
\pard \qj \par 
\pard bond chem HIS 95 NE2 HEM1 157 FE\par 
\pard \qj *EOD\par 
\par 
\pard \qj It directs the program to add a bond between the N{\f23\dn4 e}{\dn4 2} atom of the residue His 95 (the proximal histidine) and the iron at the center of the heme group. Note t
hat the addition of an N terminal residue in MOIL changes the reference number of the proximal histidine from the common 94 in mutant myoglobin to the above 95. This should complete the list of all the bonds in the molecule and the {\b con}
 program proceeds to compute the list of all angles, torsions, improper torsions as well as the exclusion and the 1-4 lists. As discussed above the angles and the torsions are generated in a maximal way. This is working as expected in most of the cases, an
d specifically for all amino acids but not in all other cases. A well known counter example is the case of the heme in which the pyrrole nitrogens and the central iron form some desired and some undesired angles. NA-Fe-NB is a 90{\up6 o}
 angle that is kept, however NA-Fe-NC is 180{\up6 o} angle that should be removed from the list. It is necessary at this point to tell {\b con} that some internal degrees of freedom require elimination which is done in the file edit.mb :\par 
\pard \qj \par 
\pard remo angl chem HEM1 157 NA HEM1 157 FE HEM1 157 NC\par 
remo angl chem HEM1 157 NB HEM1 157 FE HEM1 157 ND\par 
\pard \qj *EOD\par 
\par 
\pard \qj  It should be emphasized that such operations are quite rare. If "special" groups (like the heme) are not present "editing" is not required. More common however is the action of bond addition due to the presence of sulfur bridges. \par 
\pard \qj \tab At this point {\b con }is all done. All information that required for energy calculations was produced and it is written to the file wcon.mb as the final act of the above program.\par 
\pard \qj \tab Essentially all of the MOIL programs depend on the existence of the con
nectivity file (which we named here as wcon.mb). The next step in our attempt to run molecular dynamics of myoglobin is to produce a coordinate file readable to the dynamics program from the pdb (protein data bank) coordinates. We are using the format of t
he CHARMM coordinate files. This is done by the program {\b puth}. {\b puth }
also places at covalently reasonable positions all the missing hydrogens. In protein crystallography the resolution is insufficient to place hydrogens with certainty. Nevertheless, the pola
r hydrogens are needed in the model potential that we use. Given the pdb coordinates and the connectivity file, {\b puth }
provides an output coordinate file that is acceptable for all other MOIL modules. Some work is required in preparing the pdb file for a successful manipulation by {\b puth}
 which is described in the documentation. As one example we note that multiple positions of side chains (that are sometimes found in experiments and therefore also in pdb files) must be removed before {\b puth} can be used. A "typical"{\b  puth}
 input is given below\par 
\pard \qj \par 
~ Here come the necessary connectivity file\par 
\pard file conn name=(wcon.ery) read\par 
~ This is a pdb file to be read by the program\par 
file rcrd name=(erythro.co) read ctyp=(pdb)\par 
~ This is the new coordinate file - CHARMm style\par 
~ that is used everywhere in MOIL\par 
file wcrd name=(ery.CRD) wovr ctyp=(CHARM)\par 
\pard \qj action\par 
\par 
\pard \qj The hydrogens are placed by {\b puth} using covalent geometry only. If more than one possibility exists a random choice is made. It is therefore possible that the position of the hydrogen
s are not optimal for van der Waals and electrostatic interactions. Short minimization after {\b puth }is highly recommended. There are two minimizers in MOIL: one is based on conjugate gradient algorithm ({\b mini_pwl}
) and the second on a truncated Newton-Raphson procedure ({\b mini_tn}). It is surprising but nevertheless true that in all our studies the conjugate gradient was doing better that the truncated Newton-Raphson algorithm. Below is a sample input\par 
\pard \qj \par 
\pard file rcon name=(wcon.mb10co)  read\par 
file rcrd name=(MB10CO.CRD)  read\par 
file wcrd name=(MB10CO.MIN.CRD)  wovr\par 
file wmin name=(mb10co.min)     wovr\par 
epsi=1.  cdie v14f=8. e14f=2.\par 
tolf=0.01  mistep=50    list=10 rvmx=7. relx=10.\par 
\pard \qj action\par 
\par 
\pard \qj The organization of the minimization input file is similar to the corresponding input file for {\b puth}
 with a few additional instructions. First, the (already existing) connectivity and coordinate files are identified to the program, the output files include the coordinates of the final minimized structure and the minimization history (wmin). I
n the next two lines some parameters for the energy and the minimization algorithm are defined: {\b epsi} is the dielectric constant, {\b cdie} means that constant dielectric will be used, {\b v14f} and {\b e14f}
 are 1-4 scaling parameters (as suggested for the OPLS force field). {\b tolf }is the maximum force gradient that is acceptable as a converged solution and {\b mistep}
 is the maximum number of minimization step that is allowed in the present calculations. {\b list }is the number of minimization steps between updates (recalculation) of the list of interacting particles according to the cutoff criterion. {\b rvmx}/{\b 
relx}
 are cutoff distances for the van der Waals and electrostatic interactions respectively. We comment that the input files of all MOIL modules are similar and consist first of a list of relevant files followed by assignment of relevant parameters. None of th
e input files require FORTRAN formatting and this includes also the property and monomer files. The line interpreter in MOIL picks only the first four characters of a given name, for example {\b mistep }is interpreted internally as {\b mist}
, the additional characters are ignored. The order of the assigned parameters is not important and an instruction line can be extended to the next line by adding a dash ("-") at the end of the line. The interpreter breaks the line to individual expressions
 using the spaces in the line. The use of alternatives (like tabs) is not supported. After the "line break" is completed the interpreter "echoes" the line to the standard output. Thus the appearance of
 the line in the standard output does not imply that all the parameters were correctly read by the program. The expressions are searched later to find parameters that the program needs. \par 
\pard \qj \tab Once the minimized structure is obtained (MB10CO.MIN.CRD) we can continue with the dynamics. This is the last example closing this "technical" session.\par 
\pard \qj \par 
\pard file conn name=(wcon.mb10co) read\par 
file rcrd name=(MB10CO.MIN.CRD) read\par 
file wcrd name=(MB10CO.DCD) bina  wovr\par 
file wvel name=(MB10CO.VCD) bina  wovr\par 
\pard #ste=50000  step=0.002 info=100 #equ=10000  #crd=10 #vel=10 #lis=10 #scl=10  #tem=1  rand=-3451187 tmpi=10 tmpf=300\par 
\pard rmax=10. epsi=1.  cdie v14f=8.  e14f=2. shkb\par 
\pard \qj action\par 
\par 
\pard \qj 
Similarly to the minimization we provide the program with a connectivity file (first line), a starting coordinate file (second line) and we assign output coordinates and velocities files which are written in a compressed form (wcrd and wvel). History of th
e dynamics (i.e. regular reports along the trajectory such as average temperature, energies etc.) are written on the standard output. After the list of files some additional instructions are provided. {\b #ste}
 is the number of steps to be calculated.  The {\b step} size is 0.002 psec in our example, {\b info}rmation on the trajectory is written on the standard output each 100 steps of dynamics. 10,000 equilibration steps ({\b #equ}
) are calculated and coordinate sets ({\b #crd}) are written to the dynamics coordinate file each 10 steps. The non-bonded lists ({\b #lis}) are re-calculated each ten steps and the velocities are also scaled each t
en steps to preserve the kinetic temperature. There is only one temperature in the example that we considered ({\b #tem}
). Multiple temperatures in the system can be useful in a number of cases. For example, consider a conformational transition of a solvated peptide. In order to explore alternative configurations of the peptide in shorter periods it is possible to simulate 
the peptide at higher temperature than the solvent. {\b rand} provides a seed for a random number generator that samples the velocities from a Boltzmann distribution at the beginning of the trajectory. {\b tmpi }and {\b tmpf}
 are the initial and final temperatures in the equilibration process, {\b rmax} is a cutoff distance which sets both {\b rvmx }and {\b relx} equal to the same number and it is used for compatibility reasons. {\b epsi},{\b  cdie}, {\b  e14f},{\b  }and {\b 
v14f} were described in the minimization step. {\b shkb }directs the program to fix the length of all the bonds in the system using the velocity SHAKE algorithm (RATTLE{\up6 10}). More parameters and options can be found in the documentatio
n of the program.\par 
\pard \qj \tab 
The protocol required for producing dynamics from protein data bank files is therefore quite long. However it is rather straightforward and the input for the different programs is quite short. Essentially the same steps are followed by all molecular dynam
ics programs. As a summary we list again the programs one needs to run and the main result of each of the runs:\par 
\pard \qj \par 
1. {\b con} - generating the connectivity file\par 
\pard \qj 2. {\b puth } - generating from the protein data bank coordinates another coordinate file that includes hydrogens.\par 
3. {\b mini_pwl }- minimized the output of {\b puth }to obtain energetically more reasonable starting point\par 
4. {\b dyna }- run a molecular dynamics trajectory. Main result: coordinates as a function of time.\par 
\pard \par 
\par 
\par 
\pard {\b SPECIAL TOPICS: THE LOCALLY ENHANCED SAMPLING (LES) METHOD AND THE SELF PENALTY WALK (SPW) ALGORITHM}\par 
\pard \qj \par 
\pard \qj \tab In this section we describe two relatively new computational methods that are unique to MOIL. To the best of our knowledge, these computational methods, while published and established w
ere not implemented yet in other generally accessible codes.\par 
\pard \qj \tab 
The first of the techniques is called LES (locally enhanced sampling). LES is a mean field approach that we introduced to the simulations of proteins. The words "mean field" can be misleading, since some of our applications of LES are exact, i.e., free en
ergy calculations{\up6 16} and structure determination by energy optimization{\up6 6}. Applications of LES to time dependent problems are approximate. Nevertheless, LES made it possible for the first time to i
nvestigate ligand diffusion through a flexible protein matrix in a statistically meaningful way{\up6 12,14,15}. The application of LES in conjunction with experiment identified transient binding sites in the myoglobin matrix{\up6 15}
. Recently, we significantly improved the accuracy of the application of LES to dynamics by use of a binary collision correction{\up6 11}. Detailed derivation of the LES protocol was given in the past{\up6 6,11,12}
, and here we describe the main results and how LES is translated to the computer code. In LE
S we selectively enhance the sampling of a part of the system that is of prime interest. As an example we consider the problem of side chain modeling in which LES is employed to enhance the sampling of the side chain configurations. Modeling of side chains
 is useful in refinement of low resolution X-ray and NMR structures in which only the protein backbone can be traced. Positioning of side chains is also important in homology modeling in which the backbone structure of the modeled protein is adopted from a
 known protein structure with a similar sequence. One way of determining the side chain structure is via energy minimization (energy like the one defined in MOIL). Direct energy minimization using {\b mini_pwl} is unlikely to work, since {\b mini_pwl}
 finds a local energy minimum that is the closest to the initial guess. More global optimization that searches for minima beyond the local neighborhood is therefore required. Simulated annealing is one such approach{\up6 21}
. In simulated annealing we heat the system and slowly
 cool it down. The kinetic energy makes it possible to overcome barriers that separate minima. If the cooling is done infinitely slow we are guaranteed to find the global energy minimum (the lowest energy minimum of all available minima). Practical (finite
) cooling can give a minimum which is not the lowest. Nevertheless typical results are considerably better than direct minimization. The LES protocol is an addition to the simulated annealing that dynamically modifies the target function - the energy - tha
t we optimize. Pictorially what is done is the following: for each of the C{\f23\dn4 a}
 of the different residues we attached multiple copies of the same side chain. The multiple copies of the side chain are set in a special way. They do not see the other copies of the same residue and they interact with the protein backbone and the rest of 
the copies in an average way. We now anneal the new system. The basic idea in LES is to average the potential over a distribution that is determined from a mean field condition{\up6 6,11}
. The distribution is given by a Hartree product of the part that we do not want to enhance and the part that we do. Here, the side chains are the part that we enhance. The averages over the Hartree products considerably reduce the barrier heights and ther
efore make the optimization easier. Of course, one should be careful not to make the potential too flat by using distribution functions that are too broad. Without barriers there is no more than one minimum which is not necessarily the correct one. Neverth
eless, the requirement that the potential will not be over-cooked does not mean that we need to "eat" it raw! The shape of the distribution used in the averages is determined in LES by the temperature of the system and by the equations of motion. More form
ally, let {\b R}{\dn4 r }be the coordinate vector of the "rest of the system" and {\b r}{\dn4 i} be the coordinate vector of the i-th side chain that its position we should like to optimize. Then the probability density of coordinates - {\f23 r} {\f3 -}
 is approximated by {\f23 r} = {\f23 r}{\dn4 r}({\b R}{\dn4 r}) {\f23 P} {\f23 r}{\dn4 i}({\b r}{\dn4 i}) where {\f23 r}{\dn4 r} and {\f23 r}{\dn4 i}'s are given by:\par 
\pard \qj {\f3 \par 
}\pard \qj {\f23 r}{\f3\dn4 r }{\f3 = }{\f23 d}{\f3 (}{\b\f3 R}{\f3\dn4 r}{\f3  - }{\b\f3 R}{\f3\up6 o}{\f3 (t))\tab \tab }{\f23 r}{\f3\dn4 i}{\f3  = }{\f23 S w}{\f3\dn4 k}{\f23  d}{\f3\dn4 k}{\f3 (}{\b\f3 r}{\f3\dn4 i}{\f3  - }{\b\f3 r}{\f3\up6 o}{\f3 
(t))\par 
}\pard \qj {\f3 \par 
\par 
}\pard \qj The vectors {\b R}{\up6 o}(t) and {\b r}{\up6 o}(t) contain time dependent parameters for which we solve equations of motion:{\b\f3  \par 
}\pard \qj {\b\f3 \par 
}\pard \qr {\b\f3 M}{\f3 d}{\f3\up6 2}{\b\f3 R}{\f3\up6 o}{\f3 (t)/dt}{\f3\up6 2}{\f3  = - }{\f23 S w}{\f3\dn4 i}{\f23 \emdash }{\b\f3\dn4 R}{\f3 U(}{\b\f3 R}{\f3 ,}{\b\f3 r}{\f3\dn4 i}{\f3 )                        (4)}{\f3\dn4 \par 
}\pard \qc {\f23 w}{\f3\dn4 i}{\b\f3 m}{\f3\dn4 i}{\b\f3  }{\f3 d}{\f3\up6 2}{\b\f3 r}{\f3\dn4 i}{\f3 (t)/dt}{\f3\up6 2}{\f3  = - }{\f23 w}{\f3\dn4 i}{\f23 \emdash }{\b\f3\dn4 r}{\f3\dn4 i}{\f3 U(}{\b\f3 R}{\f3 ,}{\b\f3 r}{\f3\dn4 i}{\f3 ) \par 
}\pard \qj {\f23 \par 
}\pard \qj where{\f3  }{\b\f3 M }and{\f3  }{\b\f3 m}{\f3\dn4 i }are the mass matrices and {\b\f23 w}{\f3  }
is a diagonal weight matrix for the multiple copies. If all the copies of the different side chains have the same weight, as is usually the case, the diagonal elements of {\b\f23 w} are equal to 1/N{\dn4 i} where N{\dn4 i}
 is the number of copies of the i-th side chain. Note that the forces (i.e.{\f3  -}{\f23 \emdash }{\b\f3\dn4 R}{\f3 U(}{\b\f3 R}{\f3 ,}{\b\f3 r}{\f3\dn4 i}{\f3 ) ; -}{\f23 \emdash }{\b\f3\dn4 r}{\f3\dn4 i}{\f3 U(}{\b\f3 R}{\f3 ,}{\b\f3 r}{\f3\dn4 i}{\f3 )
}
 ) are averaged over the current distribution that is determined in a self-consistent way from the equations of motion. This average reduces barrier heights and increases the relative energies of the minima. We refer the reader to an earlier publication fo
r a proof{\up6 6}. The problem of annealing the density is reduced to solving ordinary trajectories (initial value differential equations) for a different system that include multiple copies of side chains.\par 
\pard \qj \par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\par 
\pard \qj {\b\fs20 Fig.2 Snap shots in time for the distribution of the phenylalanine 43 side chain during the annealing of 45 side chains in myoglobin}{\b\fs20\up6 23}{\b\fs20 
. (a) High temperature structure, (b) Room temperature structure. Note that in the lower temperature structure the spatial distribution of the copies of the side chain is much narrower, as it should.\par 
}\pard \qj {\b\fs20 \par 
}\pard \qj  The results of an annealing using the LES protocol are exact. We proved{\up6 6} that the global energy minimum of t
he original and the mean field systems is the same and therefore the annealing of the LES energy with lower barriers is clearly advantageous to the regular annealing. We obtain the correct answer with better and easier exploration of alternative conformati
ons. More details on the methodology and other applications of LES can be found elsewhere{\up6 6,11,12,14,15}. Here we should like to discuss several programming issues.\par 
\pard \qj \tab MOIL was built with LES in mind, therefore defining the mean field system is done in a relatively easy (?!) way. All that the user needs to do is to {\b mult}
iply the desired monomers during the creation of the connectivity file and to provide multiple coordinates for the same monomers. The only new feature can be found in the poly file:\par 
\pard \qj \par 
\pard ~\par 
MOLC=(PEP3) #mon=8\par 
NTER GLY mult ALA 4 PHE CTER\par 
\pard \qj *EOD{\f3  \par 
}\pard \qj {\f3 \par 
}\pard \qj\tx6120 We define a molecule that in "real life" should have five monomers NTER-GLY-ALA-PHE-CTER (including the terminus residues NTER and CTER). We request the multiplication of the alanine residue by a factor of 4 (mu
lt ALA 4). The four copies of the alanine residue do not see each other and they are all connected to the GLY on the left and to the PHE on the right. The interaction energies with the copies are all normalized by a factor of 4. The program considers the s
ystem to have 8 monomers with four of them having the special properties described below. The special properties - the normalization of the forces, the normalization of the masses (multiplying the mass by the weight {\f23 w}
) and the exclusion of the interactions
 between copies of the same residues - are all taken care of during the creation of the connectivity file. LES particles are treated in an identical way to real particles, which makes programming and management of the code simple and straightforward. One c
an run or program dynamics applications, energy minimization and watch "movies" of multiple side chain copies (figure 2). For all practical purposes one can forget that these are not real particles but rather parameters describing the evolution of approxim
ate probability density. At present LES is regularly used for searching plausible diffusion pathways{\up6 11,12,14,15}, for optimization{\up6 6} and for free energy calculations{\up6 16}
. This demonstrates the versatility and the usefulness of the new protocol for very different computational problems. \par 
\pard \qj\tx6120     There is one disadvantage from the user point of view to the above set-up of monomer multiplication. It is not possible to multiply as a single segment several residues, unless the user will define them as a single residue (
a process which can be time consuming). A facility that enables the multiplication of any group of atoms will be available in future releases of MOIL{\up6 13}
. There is another subtle computational point that is worth discussing and is related to the calculations of the non-bonded energy in LES. The non-bonded parameters are kept in the connectivity file as single particle parameters. For each LES particle we "
renormalize" the interactions by dividing the parameter{\dn4  }by the number of copies (e.g. A{\dn4 ii }B{\dn4 ii }, q{\dn4 i }are translated to A{\dn4 ii}/N B{\dn4 ii}/N and q{\dn4 i}
/N). This gives the correct result for an interaction of any LES particle with any real particle, or between any LES particle of a multiplied monomer with multiplied monomers of other residues. However, it does not give the correct answer for interactions 
within a LES monomer. For example, if we multiply a lysine, the "tail" of the side chain will interact incorrectly (using the "renormalized" parameters) with the backbone of the same residue. According to the protocol described above they are divided by N
{\up6 2}
 (one N for each particle) rather than by N as required by formula (4). In the energy code we test for such pairs and we multiply them by N for re-renormalization. Since the number of such pairs is usually small the computational cost of the re-renormaliza
tion is not large. Nevertheless, this is not an optimal solution. A more efficient approach will be to assign a special list for these interactions and to avoid the check mentioned above. The new list is likely to appear in future releases of MOIL.\par 
\pard \qj \tab Another unique piece of code in MOIL is {\b chmin}. It is a program to calculate reaction coordinates in large molecular systems. In contrast to LES, {\b chmin}
 is a separate module that uses ordinary connectivity files. It is based on the SPW (Self Penalty Walk) algorithm that we developed{\up6 5,22}
. In the SPW approach the reaction coordinate is approximated by a discrete set of intermediates that interpolate between the known reactants and the known products. The intermediates resemble a "polymer
" in which each of the monomers is a complete copy of the whole molecule(!). The "polymer" is hooked up at its two end points. One end is connected to the reactants and one end to the products. A minimum energy configuration for the "polymer" is a minimum 
energy path. Therefore, the basic calculation in the SPW approach is an energy minimization for the "polymer". Yet another view of the reaction path is of a walk between the reactant and the product, a walk in which there is a penalty for the polymer on cr
ossing itself. This is why the algorithm is called SPW - Self Penalty Walk. The SPW can easily handle more than 100 particles and is capable of calculating continuous and well defined low energy paths even if a reasonable adiabatic coordinate is not availa
ble. This makes it quite unique compare to alternative approaches such as the reference coordinate approach{\up6 17} or mode following{\up6 18}
 adopted from codes of small molecules. Its advantage compared to algorithms used for small molecules is that it does not requi
re the second derivative matrix. This makes it possible to calculate reaction coordinates in large molecular systems in a stable and reasonably economic way. Second derivative matrices are difficult to manipulate in large molecules for two reasons. First, 
the memory requirement is growing like N{\up6 2 }
where N is the number of particles. Second, the manipulation of these matrices and the extraction of relevant modes becomes exceedingly difficult when the density of the modes is high, as is the case for proteins. The polymer in {\b chmin}
 is larger than the original system. However it is optimized using first derivatives only. This makes the space requirement relatively small compared to mode following (see however the discussion below). The minimization of the energy of the polymer is als
o more stable than searches for "saddle points" pursued by the mode following techniques.\par 
\pard \qj \tab  Some computational limitations are discussed next: An initial guess for the complete path may be provided by the user. If a reasonable guess is 
available this option is highly recommended. Alternatively the program starts from a linear interpolation between the reactant and the product. The interpolation is done in Cartesian space. This is a straightforward algorithm that always produces something
. However, the linear interpolation usually gives paths with high energy barriers and therefore requires considerable computer (and human) time for refinement. Because the code works simultaneously on all the path structures, memory is an issue. Consider f
or example a system with 5,000 particles. Each of the particles has three degrees of freedom corresponding to motion along X,Y or Z axes. To specify the positions of the particles in the system 15,000 numbers (double precision) or 120Kbytes are needed. Thi
s is not too bad. However, if a path that consists of 100 intermediate structures is considered, the size increases to 12MB which is a significantly larger number and on workstations may create considerable difficulties. We therefore usually work with a sm
aller number of grid points and if there are additional intermediate minima between the reactants and the products we connect them in independent runs{\up6 5}
. Another memory "issue" is the non-bonded lists. For 5,000 particles we anticipate lists of total size of 1MB per structure. Generating individual lists for every intermediate requires 100MB of memory. Even if possible on some workstations, significant de
gradation in performance due to swapping and cache misses are expected. In MOIL we made the approximation
 of using one set of lists for the whole set of structures. The lists are generated according to the intermediate that has the middle index. For example, examining a path with 101 structures, structure 50 is used to update the lists. A single set of lists 
may create difficulties especially if the difference between the two end structures is large and the cutoff distance used is small. A possible solution to the problem is to use a larger cutoff distance. Another solution is to change the code to include mor
e than one set of lists (for example, one set for the reactant half of the path and one set for the half of the path of the product). Yet another solution is to generate arrays of differences between the lists, in addition to one common set of lists. The d
ifference arrays are expected to be much smaller than individual lists. No..., programming the last modification to the code is not high on our agenda but the talented reader is encouraged to do so.\par 
\pard \qj \tab For the record we provide a typical input to the {\b chmin} program which is similar in spirit to what we saw previously\par 
\pard \par 
file conn name=(VAL.WCON) read\par 
file rcrd unit=5 read\par 
file wcrd name=(VALMIN.PTH) bina wovr\par 
#ste=10000 #pri=10 #wcr=1000 list=10000 repl=1000.\par 
lmbd=2. gama=20. grid=5\par 
rmax=9999. epsi=1. cdie v14f=8. el14=2. cini\par 
action\par 
file name=(MOIL1.CRD) read\par 
\pard \qj file name=(MOIL60.CRD) read \par 
\par 
\pard \qj The first line direct the {\b chmin} program to the location of the connectivity file. In the second line the program is informed that the reactants and the products files will be i
dentified after the "action". This is what the unit=5 (standard input in FORTRAN) means. MOIL1.CRD includes the coordinates of the reactants and MOIL60.CRD the coordinates of the products. Similarly to minimization and dynamics {\b #ste}
 is the number of optimization steps for the whole reaction path. {\b repl lmbd }and {\b gama }are energy parameters for intermonomer interaction in the SPW "polymer", {\b grid} is the number of path structures and {\b cini}
 directs the program to use a linear interpolation between the two structures. {\b chmin}
 overlaps first the reactant and the product structures such that the rms will be a minimum and then calculates Cartesian averages between the structures. The path coordinates are written on VALMIN.PTH, optimization history is written to the standard outpu
t.\par 
\pard \qj \tab  In our experience we found that the {\b chmin}
 code is very robust, this is in molecular systems that vary from several tens of particles to thousands of particles. We did not find a molecular system on which our code was not able to find a reasona
ble solution. Further calculations of free energy path along the discrete set of points provided by {\b chmin }is also possible by (for example){\b  umbr }or {\b freee}. {\b umbr} pursues umbrella sampling calculation along the reaction coordinate{\up6 
19 }and {\b freee }- free energy perturbation{\up6 20}. This completes the required code for calculations of rates for chemical reactions and conformational transitions, within the limits of the transition state theory. \par 
\pard \qj \tab This also concludes the present paragraph describing some advanced features of MOIL.\par 
\pard \par 
\par 
\pard {\f3 \par 
}{\b\f3 FINAL REMARKS\par 
}\pard \qj {\f3 \par 
}\pard \qj\tx2160\tx2520\tx2700 
      In this chapter we discussed the general structure of a molecular dynamics program called MOIL. MOIL is designed as a flexible simulation package for users and programmers. For example, using a database of amino acids that is provided with MOIL it is
 possible to investigate computationally any protein. Furthermore, the large set of common subroutines and tools that is available via an anonymous ftp makes the code useful to new programmers in the field. We further provided a 
number of "tips" to those who designed their own computer experiments or planned to do programming on similar codes, "tips" that are hard to find in usual scientific publications. We discussed the design and typical steps taken in common molecular dynamics
 applications as well during LESs common applications that are unique to MOIL such as the LES mean field simulations and the {\b chmin} program.\par 
\pard \qj\tx2160\tx2520\tx2700 \par 
\par 
\pard \qj\tx2160\tx2520\tx2700 {\b ACKNOWLEDGMENTS\par 
}\pard \qj\tx2160\tx2520\tx2700 {\b \par 
}This research was supported by grants from the NIH and the DOE.\par 
\pard \qj\tx2160\tx2520\tx2700 RE thanks Alan Hinds for very useful discussions on code optimization. We are also grateful for the constructive suggestions and comments we received from many MOIL users.\par 
\pard \qj\tx2160\tx2520\tx2700 \par 
\par 
{\b APPENDIX 1: ANNOUNCEMENT OF MOIL RELEASE AND CONDITIONS}\par 
\pard \qj \par 
\pard \qj {\fs20  RE-RELEASE OF A PUBLIC DOMAIN MOLECULAR DYNAMICS PROGRAM\par 
}\pard \qj {\fs20 (An electronic mail message broadcasted in the computational chemistry list)\par 
}\pard \qj {\fs20 \par 
}\pard \qj {\fs20 We are announcing a new release of moil - moil.006 . moil is a public domain molecular dynamics program that is available via anonymous ftp. At the end of the file we attached in
structions of how to get the code. The code was developed by Ron Elber, Adrian Roitberg, Carlos Simmerling Haiying Li, Gennady Verkhivker, Robert Goldstein and Alex Ulitsky. Other contributors that helped in testing, developing parameter sets and writing d
ocumentation: Danuta Rojewska and Chyung Choi.\par 
}\pard \qj {\fs20 \par 
}\pard \qj {\fs20  Compare to previous releases (003 and 004) the present code has a number of bug fixes and new features. The most major addition\par 
}\pard \qj {\fs20 is the visualization program for Silicon Graphics (moil-view ,\par 
author: Carlos Simmerling). Other features include considerable\par 
speed up of most intensive computations (up to a factor of 3) and\par 
}\pard \qj {\fs20 undoubtedly a number of unknown bugs. Please note that old connectivity files will not run on 006 and it is necessary to regenerate them.\par 
}\pard \qj {\fs20 \par 
At present existing modules are:\par 
 }{\b\fs20 con}{\fs20       : creating connectivity files\par 
}\pard \qj {\fs20  }{\b\fs20 con.specl}{\fs20 : creating special connectivity files for curve crossing calculations\par 
 }{\b\fs20 con.specl1}{\fs20 : need also to run con.specl1 to get the curve crossing to work.\par 
}\pard \qj {\fs20  }{\b\fs20 energy   }{\fs20 : good old vanilla energy calculation\par 
}\pard \qj {\fs20  }{\b\fs20 dyna}{\fs20 
     : good old vanilla dynamics using velocity Verlet with RATTLE (velocity SHAKE), also possible to use periodic (cubic) boundary conditions and also possible to use LES (Locally Enhanced Sampling Protocol). We worked on optimizing the dynamics code, The
 previous version of RATTLE was improved and the use of double cutoffs (for van der\par 
}\pard \qj {\fs20 Waals and electrostatic) also speeds-up things. While more can be undoubtedly done, at present we are doing better than anyone else we are compared to.\par 
}\pard \qj {\fs20 \par 
}{\b\fs20 chain}{\fs20  or }{\b\fs20 chmin}{\fs20 : Reaction path algorithms that are based on optimization\par 
            of the complete path.\par 
            (R. Czerminski and R. Elber IJQC 24,167(1990))\par 
}{\b\fs20 ccrd }{\fs20       : Converting CooRDinates between different formats\par 
}{\b\fs20 boat }{\fs20       : provide values of BOnds Angles and Torsions. I.e.\par 
             internal coordinates for a set of structures.\par 
}{\b\fs20 rms_resd }{\fs20  : rms calculations for those who care\par 
}{\b\fs20 fluc}{\fs20        : fluctuations for those who are not at a minimum (from MD)\par 
}{\b\fs20 umbr }{\fs20       : Umbrella sampling calculations.\par 
}\pard \qj {\b\fs20 freee }{\fs20    : free energy calculations along reaction paths by perturbation or thermodynamic integration\par 
}\pard \qj {\b\fs20 mini_tn}{\fs20    : minimization using truncated Newton Raphson \par 
}{\b\fs20 mini_pwl }{\fs20  : Conjugate gradient algorithm for energy minimization.\par 
}\pard \qj {\b\fs20 puth }{\fs20     : add hydrogens to protein data bank coordinates to obtain acceptable MOIL coordinates for energy calculations. Done according to geometric criteria (no fancy stuff)\par 
}\pard \qj {\b\fs20 therm}{\fs20       : free energy calculations where the potential parameters are changing. Alchemistry.\par 
}{\b\fs20 contacts}{\fs20  : analyzing dynamics to find collisions between specific residues\par 
}\pard \qj {\b\fs20 rgyr}{\fs20      : calculate radius of gyration\par 
}\pard \qj {\b\fs20 torstat}{\fs20  : compact torsion statistics (as compared to boat), give phi.psi.chi1 values for all residues\par 
}\pard \qj {\b\fs20 mult}{\fs20     : multiply coordinates of desired residues for LES calculations.\par 
\par 
AND FINALLY:\par 
\par 
}{\b\fs20 moil-view}{\fs20   : (creator - Carlos Simmerling)\par 
 A visualization program (not modeling!) for Silicon Graphics (only).\par 
Easy to use and well interfaced with the features of moil. Dynamics of\par 
shaded spheres, sticks or mixtures. Easy recording to video tapes.\par 
Note that this program does NOT require the use of moil but can\par 
also use other file formats such as PDB.\par 
\par 
The code is in the public domain. We are giving up any right to the code.\par 
}\pard \qj {\fs20 
You may include it in your own code and distribute it further. You may transfer it "as is" to other users without asking us for permission. However, we request that this message will be kept. In case that changes are made to the code we request that a list
 of changes will be provided to the new users. We shall not be responsible for other people's bugs, we can hardly handle our own.\par 
}\pard \qj {\fs20         The code is given with the friendly understanding that you will\par 
}\pard \qj {\fs20 not try to use this code as part of a commercial software package. However, we put no restrictions on other type of products that may emerge using this code. Reference should be made to the authors of the programs\par 
as listed in the makefile to the extent that this program helps/distracts you in your work. Contact Ron Elber (ron@pap.chem.uic.edu) for proper references.\par 
}\pard \qj {\fs20 \par 
          BUGS/PROBLEMS OR OTHER TYPES OF DISCOMFORT ASSOCIATED WITH MOIL\par 
          CAN BE REPORTED TO: ron@pap.chem.uic.edu\par 
         We shall make significant effort to answer inquiries and \par 
         especially to fix bugs. Note however (as is quite clear from the\par 
         price tag on this package) that we are not making our living\par 
         selling this code. Therefore we do not promise to provide\par 
         satisfactory answers to all questions. On what is free there is\par 
         no guarantee!\par 
\par 
TO GET THE CODE:\par 
\par 
}{\b\fs20 ftp 128.248.186.70}{\fs20 \par 
}{\b\fs20 user-name:anonymous}{\fs20 \par 
}{\b\fs20 passwd: your e-mail}{\fs20 \par 
}{\b\fs20 cd dist}{\fs20 \par 
}{\b\fs20 get moil.006.tar.Z }{\fs20 (source code only)\par 
}{\b\fs20 get moil.dat.tar.Z}{\fs20  (documentation and parameters/property and monomer\par 
           \tab \tab files, in the documentation you made find useful\par 
\tab \tab \tab the file tutorial)\par 
}{\b\fs20 get moil.test.tar.Z }{\fs20 (test files - input and output - quite large and \par 
                   may be a real pain to ftp, useful also as a starting \par 
                   point to construct your own input).\par 
}{\b\fs20 get view.tar.Z}{\fs20  (the moil-view visualization program of Carlos\par 
\tab \tab \tab Simmerling\par 
}{\b\fs20 quit}{\fs20 \par 
\par 
 On your machine\par 
 }{\b\fs20 uncompress moil.006.tar.Z}{\fs20 \par 
 }{\b\fs20 tar xvf moil.006.tar                     \par 
 tar xvf moil.006.tar}{\fs20 \par 
 repeat the same procedure for the other files.\par 
\par 
 }{\b\fs20 cd moil.src.006}{\fs20 \par 
\par 
}\pard \qj {\fs20  edit the makefile and change the variables FC and FFLAGS to fit your machine.\par 
}\pard \qj {\fs20  then type "make" and you should be all set.\par 
\par 
 A good place to start is the tutorial in moil.doc, please note however\par 
 that some of the docu were prepared with the old versions, so things\par 
 may have slightly different output than you expect. Old input should\par 
 work.\par 
To compile moil-view go to the view.source directory and type make. The\par 
}\pard \qj {\fs20 machine MUST be a Silicon Graphics. Look at the file view.doc to get started.\tab \par 
}\pard \qj {\fs20 \par 
\par 
\par 
}\pard \qj {\b APPENDIX II: THE FORTRAN COMMON BLOCK THAT INCLUDES ALL THE CONNECTIVITY DATA (CONNECT.BLOCK)\par 
}\pard \qj {\b \par 
}\pard C DEFINITION OF THE MOLECULE TOPOLOGY AND POTENTIAL ENERGY TERMS\par 
\pard \par 
\par 
\pard {\fs20 c-----------------------------------------------------------------------\par 
c *** vectors of length npt (number of particles in the system)\par 
c               npt < maxpt\par 
c poimon  - integer vector with monomer pointer\par 
c               (poimon(ipt) = the # of the monomer of the ipt particle)\par 
c ptid    - integer vector with id of particle types\par 
}\pard {\fs20 c ptnm    - (char*4) name of the particle as a function of the particle number\par 
}\pard {\fs20 c ptms    - (double) mass of the particle                "\par 
c invms   - (double) 1/mass of particle, added by carlos for shake speed\par 
c ptchg   - (double) charge of particle                  "\par 
}\pard {\fs20 c epsgm6  - (double) twice square root of van der Waals energy times sigma^6\par 
c epsgm12 - (double) twice square root of van der Waals energy times sigma^12\par 
}\pard {\fs20 c ptwei   - (double) particle special weight used in LES calculations\par 
c lesid   - integer id of LES residue. If normal residue set to 0 if les\par 
c               residue all the copies under a single MULT are set to the\par 
c               same number\par 
}\pard {\fs20 c flgchr  - is a flag charge. It is set to true of the particle is charged\par 
}\pard {\fs20 c             and to false if it is not.\par 
c\par 
        integer poimon(maxpt),ptid(maxpt),lesid(maxpt)\par 
        logical flagchr(maxpt)\par 
        character*4 ptnm(maxpt)\par 
        double precision ptms(maxpt),ptchg(maxpt),epsgm6(maxpt)\par 
        double precision epsgm12(maxpt),ptwei(maxpt),invms(maxpt)\par 
c-----------------------------------------------------------------------\par 
\par 
\par 
c-----------------------------------------------------------------------\par 
c BULK - character vector with the name of BULK (molecular systems)\par 
c assembled\par 
c pbulk - pointer to the last particle of the i-th BULK\par 
c\par 
        character*4 BULK(MAXBULK)\par 
        integer pbulk(MAXBULK)\par 
c-----------------------------------------------------------------------\par 
\par 
\par 
c-----------------------------------------------------------------------\par 
c *** vector of length imono (number of monomers in the molecule)\par 
c               imono < maxmono\par 
c poipt   - From monomers to particles poipt(i) is the last particle\par 
c               of monomer i \par 
c\par 
        character*4 moname(maxmono)\par 
        integer poipt(0:maxmono)\par 
c-----------------------------------------------------------------------\par 
\par 
c-----------------------------------------------------------------------\par 
c *** vectors of length nb (number of bonds) nb < maxbond\par 
c ib1 ib2   - pointers to particles that are bonded\par 
c             ib1(k) is the pointer to particle k.\par 
c kbond     - (double) force constant for harmonic energy of bonds\par 
c req       - (double) equilibrium position for harmonic energy of bonds\par 
c imb1 imb2 - pointers to morse bonds\par 
c rmeq      - equilibrium distance for morse potential\par 
c D,alpha   - parameters of Morse potential\par 
c Arep      - parameter of repulsion energy function\par 
c Brep      - parameter of repulsion energy function\par 
c beta1      - parameter of repulsion energy function\par 
c\par 
        integer ib1(maxbond),ib2(maxbond),imb1(maxmorsb),imb2(maxmorsb)\par 
        double precision kbond(maxbond),req(maxbond),rmeq(maxmorsb)\par 
        double precision D(maxmorsb),alpha(maxmorsb),Arep\par 
        double precision beta1(maxmorsb),Brep\par 
c-----------------------------------------------------------------------   \par 
\par 
c-----------------------------------------------------------------------\par 
c *** vectors of length nangl (number of angles)\par 
c               nangl < maxangl\par 
c iangl1 iangl2 iangl3 - pointers to particles in angles\par 
c kangl   - (double) force constant for harmonic energy of angles\par 
c angleq  - (double) equilibrium position for harmonic energy of angles\par 
c\par 
        integer iangl1(maxangl),iangl2(maxangl),iangl3(maxangl)\par 
        double precision kangl(maxangl),angleq(maxangl)\par 
c-----------------------------------------------------------------------\par 
\par 
c-----------------------------------------------------------------------\par 
c *** vectors of length ntors (number of torsions)\par 
c       nunmer of torsion =< maxtors\par 
c itor1 itor2 itor3 itor4 - pointers to particles in torsions\par 
c ktors[1-3]   - (double) amplitude for torsional energy\par 
c period  - (integer) period of rotation (e.g.  2 for C=C, 3 for C-C)\par 
c phase   - (double) phase factor for torsional energy\par 
c\par 
        integer itor1(maxtors),itor2(maxtors),itor3(maxtors)\par 
        integer itor4(maxtors)\par 
        integer period(maxtors)   \par 
        double precision ktors1(maxtors),ktors2(maxtors)\par 
        double precision ktors3(maxtors),phase(maxtors)\par 
c-----------------------------------------------------------------------\par 
\par 
\par 
c-----------------------------------------------------------------------\par 
c *** vectors of length nimp (number of improper torsions)\par 
c               nimp =< maximp\par 
c iimp1 iimp2 iimp3 iimp4 - pointers to particles in improper torsions\par 
c kimp    - (double) force constant for all improper torsions\par 
c impeq   - (double) equilibrium position for improper torsion\par 
c\par 
        integer iimp1(maximp),iimp2(maximp),iimp3(maximp),iimp4(maximp)\par 
        double precision kimp(maximp)\par 
        double precision impeq(maximp)\par 
c-----------------------------------------------------------------------\par 
\par 
c----------------------------------------------------------------------\par 
c Exclusion lists between 1-2 1-3 and  aspecial list for 1-4.\par 
c exc1 is a pointer to exc2, i.e. exc2(exc1(i)) is the last\par 
c particles to be excluded from interaction with i\par 
c Similarly spec1 is a pointer to spec2  \par 
c\par 
        integer exc1(0:maxpt),exc2(maxex)\par 
        integer exc31(0:maxpt),exc32(maxex)\par 
        integer spec1(0:maxpt),spec2(maxspec)\par 
        integer spec31(0:maxpt),spec32(maxspec)\par 
c----------------------------------------------------------------------\par 
\par 
c----------------------------------------------------------------------\par 
c *** integers define actual size of molecule, initially are set to 0\par 
c totmon  - total number of monomers\par 
c npt     - total number of particles\par 
c nb      - total number of bonds\par 
c nmb     - total numberof morse bonds\par 
c nangl   - total number of angles\par 
c ntors   - total number of torsion\par 
c nimp    - total number of improper torsions\par 
c lestyp  - total number of different types of LES monomers\par 
c totex   - number of exclusions of particle interactions\par 
c totspe  - number of special (1-4) interactions\par 
c nexc    - number of exclusions of particle interactions\par 
c nspec   - number of special (1-4) interactions\par 
c NBULK   - number of "bulk" (molecular systems)\par 
c nchg    - the number of charged particles in the molecule(s\par 
c *** lesflag - if true, les particles are present\par 
c\par 
        integer totmon,npt,nb,nmb,nangl,ntors,nimp,lestyp,NBULK\par 
        integer totex,totspe,nchg\par 
        integer totex3,totspe3\par 
        logical lesflag\par 
\par 
\par 
\par 
c-----------------------------------------------------------------------\par 
c Now gather everything into common blocks\par 
c\par 
        common /CONNLOG/lesflag,flagchr\par 
        common /CONNINT/poimon,lesid,ptid,pbulk,poipt,ib1,ib2,iangl1,\par 
     1          iangl2,iangl3,itor1,itor2,itor3,itor4,period,\par 
     2          iimp1,iimp2,iimp3,iimp4,exc1,exc2,spec1,spec2,\par 
     3          exc31,exc32,spec31,spec32,\par 
     4          totmon,npt,nb,nangl,ntors,nimp,lestyp,NBULK,\par 
     5          totex,totspe,totex3,totspe3,imb1,imb2,nmb,nchg\par 
        common /CONNDBL/ptms,invms,ptchg,epsgm6,epsgm12,ptwei, \par 
     1        kbond,req,kangl,angleq,ktors1,ktors2,ktors3,phase,\par 
     2          kimp,impeq,rmeq,D,alpha,Arep,beta1,Brep\par 
        common /CONNCHR/ptnm,moname,BULK  \par 
\par 
\par 
\par 
\par 
\par 
\par 
}\pard \qj {\b REFERENCES}:\par 
\par 
\pard \qj {\fs20 1. U. Burkert and N.L. Allinger, "Molecular Mechanics", ACS, Washington D.C., 1982\par 
}\pard \qj {\fs20 2. S.J. Weiner, P.A. Kollman, D.A. Case, U.C. Singh, C.Ghio, G. Alagons, S. Profeta Jr. and P. Weiner, "A new force field for molecular mechanical simulation of nucleic acids and proteins", J. Amer. Chem. Soc. 106:765(1984)\par 
}\pard \qj {\fs20 3. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan and M. Karplus, "CHARMM: A program for macromolecular energy, minimization, and dynamics calculations", J. of Comput. Chem. 4:187(1983)\par 
}\pard \qj {\fs20 4. Groningen Molecular Simulation program manual, version GROM87, BIOMOS B.V. Groningen, The Netherland\par 
}\pard \qj {\fs20 5. R. Czerminski and R. Elber, "Self avoiding walk between two fixed points as a tool  to calculate reaction paths in large molecular systems", Int.J.Quant.Chem. andthe Proc. of Sanibel Symposia 24:167(1990)\par 
}\pard \qj {\fs20 6. A. Roitberg and R. Elber, "Modeling side chains in peptides and proteins: Application of the Locally Enhanced Sampling (LES) and the simulated annealing methods to find minimum energy conformations", J. Chem. Phys. 95,9277(1991) 
\par 
}\pard \qj {\fs20 7. W.L. Jorgensen and J. Tirado-Rives, "The OPLS potential function for proteins. Energy minimizations for crystals of cyclic peptides and crambin", J. Amer. Chem. Soc. 110:1657(1988)\par 
}\pard \qj {\fs20 8. E.E. Nikitin, "Theory of elementary atomic and molecular processes in gases", (Clarendon, Oxford, 1974)\par 
9. H. Li, R. Elber and J.E. Straub, "Molecular dynamics simulation of NO recombination to myoglobin mutants", J. Biol. Chem., in press \par 
10. H.C. Anderson, "Rattle: A velocity version of the SHAKE algorithm for molecular dynamics simulations", J. Comput. Physics, 52:24(1983)\par 
}\pard \qj {\fs20 11. A. Ulitsky and R. Elber, "The equilibrium of the time dependent Hartree and the Locally Enhanced Sampling approximations: Formal properties, corrections and computational examples of rare gas clusters", J. Chem. Phys. 98:3380(1993)
\par 
12. R. Elber and M. Karplus, "Enhanced sampling in molecular dynamics: Use of the Time-Dependent Hartree approximation for a simulation of carbon monoxide diffusion through myoglobin", J. Amer. Chem. Soc., 112:9161(1990)\par 
}\pard \qj {\fs20 13. Chen Keaser, unpublished results.\par 
}\pard \qj {\fs20 14. R. Czerminski and R. Elber, "Computational studies of ligand diffusion in globins: I. leghemoglobin", Proteins 10:70(1991) \par 
}\pard \qj {\fs20 15. Q.H. Gibson, R. Regan, R. Elber, J.S. Olson and T.E. Carver, "Distal pocket residues affect picosecond recombination in myoglobin: An experimental and molecular dynamics study of position 29 mutants", J. Biol. Chem. 267,22022(1992)
\par 
16. G. Verkhivker, R. Elber and W. Nowak, "Locally enhanced sampling in free energy calculations: Application of mean field approximation to accurate calculations of free energy differences", J. Chem. Phys. 97:7838(1992)\par 
}\pard \qj {\fs20 17. R. Czerminski and R. Elber, "Reaction path study of conformational transitions in flexible systems: Applications to peptides", J. Chem. Phys. 92:5580(1990)\par 
}\pard \qj {\fs20 18. J. Barker, "An algorithm for the location of transition states", J. Comput. Chem. 7:385(1986)\par 
}\pard \qj {\fs20 19. G. Verkhivker, R. Elber and Q.H. Gibson, "Microscopic modeling of ligand diffusion through the protein leghemoglobin: Computer simulations and experiments", J. Amer. Chem. Soc. 114:7866(1992)\par 
}\pard \qj {\fs20 20. R. Elber, "Calculations of the potential of mean force using molecular dynamics with linear constraints: An application to a conformational transition of a solvated dipeptide", J. Chem. Phys. 93:4312(1990)\par 
}\pard \qj {\fs20 21. R.H.J.M. Otten and L.P.P. van Ginneken, "The annealing algorithm", Kluwer Academic, Boston, 1989\par 
}\pard \qj {\fs20 22. W. Nowak, R. Czerminski and R. Elber, "R
eaction path study of ligand diffusion in proteins: Application of the SPW method to calculate reaction coordinate for the motion of the CO through leghemoglobin", J. Amer. CHem. Soc. 113:5627(1991)\par 
}\pard \qj {\fs20 23. A. Roitberg and R. Elber, "The locally enhanced sampling for side chain modeling", a chapter in "Protein Structure Determination", Ed. K. Merz and S. Le Grand, Ed., Springer Verlag, in press.\par 
}\pard {\fs20 \par 
}}
Modified: Thu Mar 31 17:00:00 1994 GMT
Page accessed 1843 times since Sat Apr 17 22:01:15 1999 GMT