http://server.ccl.net/cca/documents/molecular-modeling/node9.shtml |
Next: Molecular Comparisons Up: Molecular Modeling Previous: Molecular Mechanics
Molecular Dynamics and Monte Carlo

The true picture of a molecular system is far from the static, idealized image provided by molecular mechanics. The atoms and groups in molecules are in constant motion, and, especially in the case of biological macromolecules, these movements are concerted (correlated) and may be essential for biological function. Direct proof of this behavior is provided by X-ray and NMR experiments. Thermodynamic properties of molecules, especially when they form a complex or are immersed in a solvent, cannot be derived from the harmonic approximation, which inherently assumes small-amplitude motions around a symmetric minimum.

While molecular mechanics and the simulation methods described below share the same potential energy function, they differ conceptually. In molecular mechanics, the information is derived from a single geometry of the molecule. Simulation methods require thousands to millions of geometries to produce meaningful results. The biological action of molecules frequently involves large-amplitude motions (biochemists often call them ``conformational transitions'' or ``changes''), resulting in abrupt changes in the geometry of the protein/ligand molecule as well as rearrangement of solvent molecules. The thermodynamic and energetic parameters of such systems can only be derived by realistic and explicit simulation, i.e., by leaving the potential energy minimum and probing the energy surface of the molecular system.

Molecular simulations are a recent addition to molecular modeling systems and are still in active stages of development. They are also more complex conceptually than molecular mechanics. Recent monographs on this topic contain not only the physical and mathematical background necessary to use the methods but also point to applications: Brooks et al. (1988); McCammon & Harvey (1987); van Gunsteren & Weiner (1989).
There are essentially two approaches to performing molecular simulations: the stochastic and the deterministic. The stochastic approach, called Monte Carlo (MC), explores the energy surface by randomly probing the geometry of the molecular system (or, in the language of statistical mechanics, its configuration space): a random change of the current geometry is proposed, its energy is evaluated, and the change is accepted or rejected according to a probabilistic criterion.
The most popular realization of the Monte Carlo method (though not the only one!) for molecular systems is the Metropolis method: a trial move is accepted unconditionally if it lowers the energy, and with probability exp(-dE/kT) if it raises the energy by dE.
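A minimal sketch of the Metropolis scheme, for a single particle in a one-dimensional double-well potential. The potential, temperature, and step size are illustrative choices, not taken from the text:

```python
import numpy as np

def V(x):
    """Double-well potential energy (arbitrary reduced units)."""
    return (x**2 - 1.0)**2

def metropolis(n_steps, kT=0.5, step=0.4, x0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, e = x0, V(x0)
    samples = np.empty(n_steps)
    accepted = 0
    for i in range(n_steps):
        # 1. propose a random change of the configuration
        x_new = x + rng.uniform(-step, step)
        e_new = V(x_new)
        # 2. accept if the energy decreases, or with Boltzmann
        #    probability exp(-dE/kT) if it increases
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples[i] = x          # the current state is counted either way
    return samples, accepted / n_steps

samples, acc = metropolis(200_000)
print(f"acceptance ratio: {acc:.2f}")
print(f"<V> over the chain: {np.mean(V(samples)):.3f}")
```

Rejected moves still contribute the current configuration to the averages; dropping them would bias the sampling.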
As a result of a stochastic simulation, a large number of configurations (geometries) is accumulated, and the potential energy function is calculated for each of them. These data are used to calculate thermodynamic properties of the system; the basic statistical mechanics formulae are given later in this section. The Monte Carlo method is accepted more readily by physicists than by chemists, probably because MC is not deterministic and does not offer the time evolution of the system in a form suitable for viewing. This does not mean, however, that molecular dynamics is better for deriving the thermodynamic properties of systems. In fact, many chemical problems in statistical mechanics are approached more efficiently with MC, and some (e.g., simulations of polymer chains on a lattice) can only be done efficiently with MC. Also, for Markov chains, there are efficient methods for deriving time-related quantities such as relaxation times. Currently, the stronghold of MC in chemistry is in the area of simulations of liquids and solvation processes.
The deterministic approach, called Molecular Dynamics (MD), actually simulates the time evolution of the molecular system and provides us with the actual trajectory of the system. In molecular dynamics, the evolution of the molecular system is studied as a series of snapshots taken at close time intervals (usually on the order of femtoseconds; 1 fs = 10^-15 s). Based on the potential energy function V, we can find the components of the force acting on each atom as the negative gradient of V with respect to that atom's coordinates, F = -grad V. This force results in an acceleration a = F/m. By knowing the acceleration, we can calculate the velocity of an atom in the next time step. From atom positions, velocities, and accelerations at any moment in time, we can calculate atom positions and velocities at the next time step. Integrating these infinitesimal steps yields the trajectory of the system for any desired time range. There are efficient methods for integrating these elementary steps, with the Verlet and leapfrog algorithms being the most commonly used. For the details of these algorithms, interested readers can consult Hermans (1985) and the references therein.
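The position/velocity/acceleration update cycle described above can be sketched with the velocity-Verlet variant of the algorithm, here applied to a one-dimensional harmonic oscillator. Units, the force constant, and the time step are illustrative choices; a near-constant total energy is the usual sanity check:

```python
import numpy as np

k, m = 1.0, 1.0           # force constant and mass (reduced units)
dt = 0.01                 # time step
n_steps = 10_000

def force(x):
    return -k * x         # F = -dV/dx for V(x) = 0.5*k*x**2

x, v = 1.0, 0.0           # initial position and velocity
a = force(x) / m
energies = []
for _ in range(n_steps):
    x += v * dt + 0.5 * a * dt**2        # update position
    a_new = force(x) / m                 # force at the new position
    v += 0.5 * (a + a_new) * dt          # update velocity
    a = a_new
    energies.append(0.5 * m * v**2 + 0.5 * k * x**2)

drift = max(energies) - min(energies)
print(f"total-energy fluctuation over {n_steps} steps: {drift:.2e}")
```

The fluctuation shrinks quadratically with the time step, which is why removing the fastest (bond-stretching) motions, as discussed below, pays off so directly.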
To start an MD simulation we need an initial set of atom positions (i.e., a geometry) and atom velocities. In practice, an acceptable starting state of the system is achieved by ``equilibration'' and ``heating'' runs prior to the ``production'' run. The initial positions of atoms are most often taken from a prior geometry optimization with molecular mechanics. Formally, such positions correspond to the absolute zero temperature. The velocities are assigned randomly to each atom from the Maxwell distribution for some low temperature (say, 20 K), and the system is then gradually heated to the target temperature. The instantaneous temperature is related to the atomic velocities by

    (3/2) N k_B T = sum_i (1/2) m_i v_i^2

where N denotes the number of atoms in the system, m_i and v_i are the mass and velocity of atom i, and k_B is the Boltzmann constant.

Molecular dynamics for larger molecules, or for systems in which solvent molecules are explicitly taken into account, is a computationally intensive task even for the most powerful supercomputers, and approximations are frequently made. The most popular is the SHAKE method, which in effect freezes vibrations along covalent bonds. This method is also sometimes applied to valence angles. The major advantage of this method is not the removal of a number of degrees of freedom (i.e., independent variables) from the system, but the elimination of the high-frequency vibrations corresponding to ``hard'' bond-stretching interactions. In simulations of biological molecules, these modes are usually of least interest, but their removal allows us to increase the size of the time step, and in effect achieve a longer time range for simulations.
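The velocity-assignment step described earlier in this section can be sketched as follows; each Cartesian velocity component is drawn from a Gaussian of variance k_B*T/m, and the instantaneous temperature is recovered from the kinetic energy. Reduced units (k_B = 1) and all parameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, kB, T_target = 1000, 1.0, 1.0, 20.0

# Maxwell distribution: each velocity component is Gaussian
# with variance kB*T/m
v = rng.normal(0.0, np.sqrt(kB * T_target / m), size=(N, 3))
v -= v.mean(axis=0)        # remove net center-of-mass drift

# (3/2) N kB T = sum over atoms of (1/2) m v^2
T_inst = (m * (v**2).sum()) / (3 * N * kB)
print(f"instantaneous temperature: {T_inst:.1f}")
```

For a finite N the instantaneous temperature fluctuates around the target, which is why production runs monitor it rather than assume it.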
Another approximation is the united-atom approach, where hydrogen atoms which do not participate in hydrogen bonding are lumped into the heavier atom to which they are bonded, forming a pseudo-atom of larger size and mass (e.g., a CH3 group treated as a single particle).
Even supercomputers have their limitations and
there is always some practical limit on the
size (i.e., number of atoms) of the simulated system.
For situations involving solvent, the small volume
of the box in which the macromolecule and solvent are contained introduces
undesirable boundary effects.
In fact, the results may sometimes depend more on the size and shape of the box than on the molecules involved.
To circumvent this limited box size difficulty, periodic
boundary conditions are used. This
idea is represented in Fig. 6.24.
In this approach, the original box containing a solute
and solvent molecules is surrounded with identical images of itself,
i.e., the positions and velocities of corresponding particles in all of the
boxes are identical.
The common approach is to use a cubic or rectangular parallelepiped box,
but other shapes are also possible (e.g., truncated octahedron).
By using this approach, we obtain what is in effect an infinitely sized system. A particle (usually a solvent molecule) which escapes the box on the right side enters it on the left side, due to the periodicity. Since MD simulations are usually performed in the NVE (microcanonical) ensemble (i.e., at a constant number of particles, constant volume, and constant total energy) or the NVT (canonical) ensemble, the volume of the boxes does not change during the simulation, and the constancy of the number of particles is enforced by the periodicity of the lattice. There are also techniques for performing simulations in the NPT (isothermal-isobaric) and NPH (isobaric-isoenthalpic) ensembles, where constant pressure during the simulation is achieved by shrinking or expanding the box. Constant temperature is usually maintained by ``coupling the system to a heat bath'', i.e., by adding dissipative forces (usually Langevin, friction-type forces) to the atoms of the system, which as a consequence affects their velocities.
However, each approximation has its price. In the case of periodic boundary conditions, we are actually simulating a crystal comprised of boxes with ideally correlated atom movements. Longer simulations will be contaminated with these artificially correlated motions. The maximum length of a simulation, before such artifacts start to show up, can be estimated by considering the speed of sound in water (roughly 1.5 km/s), i.e., the time needed for a perturbation to travel to the neighboring image boxes and back.
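The box periodicity described above reduces, in code, to two small operations: wrapping coordinates back into the primary box and taking the minimum-image distance between particles. A sketch for a cubic box, with the box length an illustrative choice:

```python
import numpy as np

L = 10.0                                  # cubic box edge length

def wrap(r):
    """Map coordinates into the primary box [0, L)."""
    return r % L

def minimum_image(r1, r2):
    """Shortest distance between two particles across periodic images."""
    d = r1 - r2
    d -= L * np.round(d / L)              # shift by whole box lengths
    return float(np.linalg.norm(d))

# A particle leaving the box on one side re-enters on the opposite side:
r = np.array([10.3, -0.2, 5.0])
print(wrap(r))                            # all components back in [0, 10)

# Two particles near opposite faces are actually close neighbours:
d = minimum_image(np.array([0.5, 5.0, 5.0]), np.array([9.5, 5.0, 5.0]))
print(f"minimum-image distance: {d:.1f}")  # -> 1.0
```

The minimum-image convention is what makes interaction energies consistent with the infinite periodic lattice rather than with the raw coordinates.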
In the other popular approach, stochastic boundary conditions allow us to reduce the size of the system by partitioning it into essentially two portions: a reaction zone and a reservoir region. The reaction zone is the portion of the system which we want to study, while the reservoir region contains the portion which is inert and uninteresting. For example, in the case of an enzyme, the reaction zone should include the proximity of the active center, i.e., the portions of the protein, the substrate, and the molecules of solvent adjacent to the active center. The reservoir region is excluded from the molecular dynamics calculations and is replaced by random forces whose mean corresponds to the temperature and pressure of the system. The reaction zone is then subdivided into a reaction region and a buffer region. The stochastic forces are applied only to atoms of the buffer region; in other words, the buffer region acts as a ``heat buffer''. There are several other approximations whose descriptions can be found in the molecular dynamics monographs quoted here.

The straightforward result of a molecular dynamics simulation, a movie showing changing atom positions as a function of time, contains a wealth of information in itself. Viewing it may shed light on the molecular mechanisms behind the biological function of the system under study.
Molecular dynamics, contrary to energy minimization in molecular mechanics, is able to climb over small energy barriers and can drive the system towards a deeper energy minimum. The use of molecular dynamics thus allows probing of the potential energy surface for deeper minima in the vicinity of the starting geometry. This is exploited in the simulated annealing method, where the molecular system is first heated to an artificially high temperature, and then snapshots of the high-temperature trajectory are taken as starting states for cooling runs. The cooled states obtained from hot frames frequently correspond to much deeper potential energy minima than the original structure taken for dynamics. Molecular dynamics is also particularly suitable for imposing geometrical constraints on the system. Such constraints may be available from experimental results; for example, NOE (nuclear Overhauser effect) measurements from NMR provide estimates of interatomic distances which can be imposed as distance constraints during the simulation.

Since molecular dynamics is a very active field of research, new approaches are reported frequently. To facilitate the incorporation of new techniques and approaches, many molecular dynamics software packages are organized in a modular fashion. Modules communicate their data and results via disk files. In this way, versatility and flexibility of the software are achieved, and new modules can be inserted easily between the old ones without the need to modify existing computer code.

Statistical mechanics concepts can supply even more information from molecular dynamics runs, since they furnish prescriptions for calculating the macroscopic properties of the system (e.g., thermodynamic functions like free energy and entropy changes, heats of reaction, rate constants, etc.) from statistical averages of elementary steps along the molecular trajectory. They also allow us to uncover correlations between the motions of groups or fragments of the system, as well as provide various distribution functions. Essentially, all expressions derived below apply to both MD and MC, with the only difference being that the ``time step'' in MD should be changed to a ``new configuration'' for MC. MD was chosen here for illustration, since it is more often applied to chemical problems. Three quantities form a foundation for these calculations: the partition function Z, the Boltzmann probability functional P, and the ensemble average, <A>, of a quantity A:
    Z = Int exp[-H(p,q)/k_B T] dp dq
    P(p,q) = exp[-H(p,q)/k_B T] / Z
    <A> = Int A(p,q) P(p,q) dp dq

where H(p,q) is the Hamiltonian (the total, kinetic plus potential, energy) of the system as a function of the momenta p and coordinates q, k_B is the Boltzmann constant, and T is the absolute temperature. The above equations simplify significantly when Cartesian coordinates and Cartesian momenta are used, since the kinetic energy factor of the total energy represented by the Hamiltonian can then be integrated analytically and cancels in the averages. What remains are purely configurational integrals,

    Z' = Int exp[-V(q)/k_B T] dq
    <A> = (1/Z') Int A(q) exp[-V(q)/k_B T] dq

where the potential energy function V depends only on the atomic coordinates q. Unfortunately, a trajectory of practical length samples only the vicinity of its starting region of configuration space, so the partition function itself, and hence absolute free energies, cannot be computed reliably from a simulation.
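As a small numerical illustration of such a configurational ensemble average, consider a one-dimensional harmonic potential, V(x) = 0.5*k*x^2, for which <x^2> = k_B*T/k analytically. The grid-based Boltzmann-weighted average reproduces this value; reduced units (k_B = 1) and all parameters are illustrative choices:

```python
import numpy as np

k, kT = 2.0, 1.0
x = np.linspace(-10.0, 10.0, 200_001)   # fine grid; tails are negligible
w = np.exp(-0.5 * k * x**2 / kT)        # Boltzmann factor exp(-V/kT)

# Partition-function-like normalizer; the constant grid spacing
# cancels in the ratio, so plain sums suffice
Z = w.sum()
avg_x2 = float((x**2 * w).sum() / Z)    # ensemble average <x^2>

print(f"<x^2> numeric : {avg_x2:.4f}")
print(f"<x^2> analytic: {kT / k:.4f}")
```

In a real simulation the same average is accumulated over trajectory snapshots rather than a grid, but the weighting logic is identical.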
The situation is more optimistic with differences in the free energy
between two closely related states, since it can be assumed
that contributions from points far apart in the configurational space will
be similar, and hence will cancel each other in this case.
The free energy difference between states 0 and 1, described by potential energy functions V_0 and V_1, is given by the ratio of the corresponding configurational integrals:

    A_1 - A_0 = -k_B T ln (Z'_1 / Z'_0)
              = -k_B T ln { Int exp[-V_1/k_B T] dq / Int exp[-V_0/k_B T] dq }

By multiplying the numerator of the fraction by exp[-V_0/k_B T] exp[+V_0/k_B T], the ratio becomes:

    Z'_1 / Z'_0 = Int exp[-(V_1 - V_0)/k_B T] exp[-V_0/k_B T] dq / Int exp[-V_0/k_B T] dq

or, after including the integral from the denominator into the integral in the numerator:

    A_1 - A_0 = -k_B T ln < exp[-(V_1 - V_0)/k_B T] >_0

i.e., the free energy difference is obtained as the ensemble average of the exponential function involving the difference between the potential energy functions for states 0 and 1, calculated at coordinates obtained for state 0. To clarify the importance of this statement, let us assume that we have two closely related states which are described by potential functions V_0 and V_1, differing, e.g., in the type of a single atom. The formula above means that their free energy difference can be obtained from a simulation of state 0 alone: the configurations are generated with V_0, and V_1 is merely evaluated at those configurations.
The above formula is given here only for illustration, since in practical computations the calculation is organized differently for the sake of efficiency. As was mentioned earlier, the difference between states 0 and 1 should be very small; if it is not, the transformation is divided into a series of intermediate substates by means of a coupling parameter lambda:

    V_lambda = (1 - lambda) V_0 + lambda V_1,    0 <= lambda <= 1

and the total free energy difference is accumulated as a sum over consecutive substeps:

    A_1 - A_0 = sum_i { -k_B T ln < exp[-(V_{lambda_{i+1}} - V_{lambda_i})/k_B T] >_{lambda_i} }

where each ensemble average is taken over the coordinates calculated at the substate lambda_i.
The most important conclusion from the above is that the conversion between two states does not have to be conducted along a physically meaningful path. Obviously, atoms cannot be converted stepwise into one another, since they are made of elementary particles. However, basic thermodynamics, namely Hess's law of constant heat summation, ensures that the difference in free energy depends only upon the initial and final states.
Any continuous path
which starts at state 0 and ends at state 1,
even if it is a completely fictitious one, can be used to calculate this value.
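A toy numerical check of the perturbation formula above: two harmonic wells with the same force constant but shifted minima. Shifting a harmonic well does not change its free energy, so the exact answer is dA = 0, which the estimator should reproduce. All parameters are illustrative choices in reduced units (k_B = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
k, kT, d = 1.0, 1.0, 0.4                 # force constant, temperature, shift

def V0(x): return 0.5 * k * x**2
def V1(x): return 0.5 * k * (x - d)**2

# Exact Boltzmann sampling of state 0: a Gaussian with variance kT/k
x = rng.normal(0.0, np.sqrt(kT / k), size=500_000)

# Zwanzig-type estimator: dA = -kT * ln < exp(-(V1 - V0)/kT) >_0
dA = float(-kT * np.log(np.mean(np.exp(-(V1(x) - V0(x)) / kT))))
print(f"estimated dA = {dA:+.4f}  (exact: 0)")
```

Making the shift d larger degrades the overlap between the two states and the estimate deteriorates rapidly, which is exactly why the text insists on small perturbation steps.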
There are methods other than free energy perturbation for calculating the difference in free energies between states. The most accepted one is the thermodynamic integration method, in which the derivative of the free energy A with respect to the coupling parameter lambda is computed and integrated numerically. It can be proven that:

    dA/d(lambda) = < dV_lambda/d(lambda) >_lambda

in other words, the derivative of the free energy versus lambda equals the ensemble average of the derivative of the potential energy function, taken over configurations sampled at that value of lambda. The free energy difference is then obtained as:

    A_1 - A_0 = Int_0^1 < dV_lambda/d(lambda) >_lambda d(lambda)

In most cases, the calculations are run in both directions, i.e., from state 0 to state 1 and back, and the discrepancy (hysteresis) between the two results serves as an estimate of the error.
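Thermodynamic integration can be sketched on the same toy transformation between shifted harmonic wells. With V_lambda = (1-lambda)*V0 + lambda*V1, the derivative dV/d(lambda) is simply V1 - V0, and because V_lambda remains harmonic (centered at lambda*d), each window can be sampled exactly; the exact answer is again 0. All parameters are illustrative choices in reduced units (k_B = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
k, kT, d = 1.0, 1.0, 0.4
lams = np.linspace(0.0, 1.0, 11)          # lambda windows

means = []
for lam in lams:
    # V_lambda is harmonic with minimum at lam*d and force constant k,
    # so exact Boltzmann sampling is a shifted Gaussian
    x = lam * d + rng.normal(0.0, np.sqrt(kT / k), size=100_000)
    dV_dlam = 0.5 * k * (x - d)**2 - 0.5 * k * x**2   # V1(x) - V0(x)
    means.append(dV_dlam.mean())

# Trapezoidal integration of <dV/dlambda> over lambda from 0 to 1
means = np.array(means)
dA = float(np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(lams)))
print(f"estimated dA = {dA:+.4f}  (exact: 0)")
```

Unlike the exponential average of the perturbation formula, each window here contributes a plain mean, which is one reason thermodynamic integration is often better behaved numerically.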
The methods above allow one, in principle, to calculate the free energies of solvation, binding, etc. In the case of free energies of solvation, for
example, one solvent molecule is converted to a solute
molecule in a stepwise fashion,
by creating and annihilating atoms or changing their type
as illustrated by eq. 6.52, which in effect is creating
the solute molecule de novo. Binding energies could in principle be
calculated in a similar way by ``creating'' the ligand molecule
in the enzyme cavity. In practice, these calculations are very difficult,
since they involve a massive displacement of solvent, i.e., the initial
and final states differ dramatically. However, in drug design we are in most cases interested not in the actual free energies of solvation or binding, but in the differences between these parameters for different molecules, i.e., in relative values. The thermodynamic cycle for these processes is represented in Fig. 6.26.
The difference between the free energies of binding for two ligand molecules can then be obtained from the two nonphysical ``mutation'' legs of the cycle rather than from the binding legs themselves. Since eq. 6.50 requires that the potential energy function for state 1 be calculated at the coordinates of state 0, the number of atoms and bonds in both molecules must be equal.
The binding processes themselves, which would involve a massive rearrangement of solvent, are not simulated. Instead, a molecular dynamics simulation is first performed to calculate the free energy change for mutating one ligand into the other free in solution, and a second simulation is performed for the same mutation carried out in the receptor binding site. It cannot be stressed strongly enough that both of these free energy changes correspond to nonphysical processes, and only their difference has physical meaning.
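The bookkeeping behind the thermodynamic cycle reduces to a single subtraction. The numbers below are purely hypothetical, chosen only to illustrate the arithmetic:

```python
# Closure of the thermodynamic cycle (cf. Fig. 6.26): the difference
# between the two hard-to-simulate binding legs equals the difference
# between the two tractable "mutation" legs.

dA_mut_free  = 2.7   # mutate ligand A -> B free in solution (hypothetical, kcal/mol)
dA_mut_bound = 1.2   # mutate ligand A -> B inside the receptor (hypothetical, kcal/mol)

#   dA_bind(B) - dA_bind(A) = dA_mut_bound - dA_mut_free
ddA_bind = dA_mut_bound - dA_mut_free
print(f"ddA_bind = {ddA_bind:+.1f} kcal/mol")   # negative => B binds more tightly
```

Neither mutation leg is physically meaningful on its own; only the difference ddA_bind is an observable, as stressed above.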
In those instances where we know the structure of the receptor protein, the thermodynamic cycle method is a very promising tool for drug design, since it allows us to actually calculate the differences between free energies of binding for molecules which have not even been synthesized yet! Likewise, it allows assessing the impact of variations in protein structure (e.g., mutant versus wild-type protein) on binding, i.e., performing genetic engineering on the computer. This is why X-ray crystallography and NMR studies of proteins, combined with molecular simulation techniques, form a framework for the drug design of the future, where prospective drug molecules will first be inferred from theory and only later synthesized. Moreover, intensive research on deducing protein tertiary structure from knowledge of its primary sequence is bound to result in new possibilities for folding proteins in the computer. With the astonishing advances in the power of computers and the progress in methodology, there is no doubt that the importance of these techniques to rational drug design will increase steadily.
Computational Chemistry Wed Dec 4 17:47:07 EST 1996
Modified: Sat May 23 16:00:00 1998 GMT |