CCL April 28, 1994 [018]

From:  "Eve Zoebisch" <ZOEBISCH(-(at)-)CRVAX.SRI.COM>
Date:  Thu, 28 Apr 94 13:42:39 PDT
Subject:  Semi-empirical methods revisited



Semi-empirical methods revisited.

In the discussion about semi-empirical methodology, parameterization, methods
used in the past, and the need for improved methods, Holder espouses the value
of one general method for a given level of approximation, whereas Cramer,
Paneth and others would appreciate a suite of methods optimized for different
problems.  As a method developer I find I agree with both viewpoints since, in
practice, both are important, as this long-winded argument will attempt to
show.

In the development of AM1,  one of the first approaches tried was to remove the
dipole moments from the basis set in the expectation that the heats of
formation would improve.  We were very surprised to see negligible improvement
in heats of formation whereas the dipole moments no longer made chemical sense.
It was clear why this was the case.  Assume the dipole moments are irrelevant
and randomly assign charges to the atoms (i.e. randomly assign occupation of
the atomic orbitals).  Clearly these charges would not produce a good
description of the electronic structure of the molecule, and the heats of
formation, which are calculated from the density matrix, would be poor.  In
order to provide a good estimate of the energy it is necessary to have an
optimal description of the electronic distribution.  The most direct and
available experimental evidence of electronic distribution was dipole moments
and ionization potentials.  Other evidence routinely used included the angle in
biphenyl, the angle of :CH2 and the relative bond lengths in acetylene.

Examination of calculated results compared to experimental data indicated that
correlation energy was both important and handled reasonably well within the
formalism used.  A first-order argument of how correlation energy is
incorporated was developed in '87.  I am glad to see Martin has developed a
more rigorous derivation.  Given that correlation energy was not the largest
source of error, it should be possible to use experimental data to deduce a
description of the density matrix.  Approximately 2/3 of the time spent
developing AM1 parameters was devoted to reproducing data indicative of
appropriate electron density distribution.  This excessive attention was
intended to produce the best possible correlated electron density.
Unfortunately the effort was hindered by the approximations in the method.  In
particular, the lack of 3- and 4-center terms (terms describing bond-bond
repulsion) and the resulting errors made the effort difficult.  At this point
it is not possible to know how successful this endeavor was.  The literature on
ab initio calculations indicates that a high level of correlation is needed
before one reproduces the relevant experimental data.  I have been hesitant to
discuss this aspect until there are appropriate first-principles calculations
with high-level correlation to prove or disprove the appropriate description of
electron density and correlation effects.

During this stage all data was relevant and provided a clue as to where the
model was correct and where it was wrong.  The routine day was to minimize the
error function in parameter space,  compare the resulting parameters with
previous sets,  deduce how the changes in parameters would change the
electronic structure of molecules and propose experimental data which could be
found in the library which would clearly demonstrate the strengths and
weaknesses of each parameter set.  The resulting information was used to
propose a new search direction in parameter space.  At this stage different
parameter sets appeared to be different minima in the error surface.  The
intent was to determine which minima (which sets of parameters) produced models
which most closely resembled chemistry.  By far the most difficult part of this
procedure was to propose experimental data which would clearly show the
differences between two sets of parameters,  a task which depends predominantly
on chemical intuition.  Every method developer using an approach which is not
an exact solution of the Schroedinger Equation faces this task independent of
the speed of the computer,  the size of the basis set,  or the search algorithm
used.
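
The search loop described above can be illustrated with a deliberately tiny toy
model.  The sketch below is not the AM1 error function or its optimizer; the
two-parameter "model", the reference values, the weights and the plain
gradient-descent search are all assumptions invented for illustration.  It only
shows how a non-linear error function can have distinct parameter sets that are
each a minimum.

```python
# Toy illustration only: two made-up parameters (a, b) and two made-up
# "properties".  Nothing here is taken from AM1 or any real program.

REF_HEAT = 2.0     # hypothetical reference value for property 1
REF_DIPOLE = 3.0   # hypothetical reference value for property 2
W_HEAT = 1.0       # hypothetical weights in the error function
W_DIPOLE = 1.0


def residuals(a, b):
    """Calculated minus reference value for each toy property."""
    return a * b - REF_HEAT, a + b - REF_DIPOLE


def error(a, b):
    """Weighted sum of squared residuals (non-linear in a and b)."""
    r_heat, r_dipole = residuals(a, b)
    return W_HEAT * r_heat ** 2 + W_DIPOLE * r_dipole ** 2


def gradient(a, b):
    """Analytic gradient of error() with respect to (a, b)."""
    r_heat, r_dipole = residuals(a, b)
    d_a = 2.0 * W_HEAT * r_heat * b + 2.0 * W_DIPOLE * r_dipole
    d_b = 2.0 * W_HEAT * r_heat * a + 2.0 * W_DIPOLE * r_dipole
    return d_a, d_b


def descend(start, step=0.05, iterations=5000):
    """Plain fixed-step gradient descent from a starting parameter set."""
    a, b = start
    for _ in range(iterations):
        d_a, d_b = gradient(a, b)
        a -= step * d_a
        b -= step * d_b
    return a, b


if __name__ == "__main__":
    for start in [(0.5, 2.5), (2.5, 0.5)]:
        a, b = descend(start)
        print(f"start={start} -> a={a:.4f}, b={b:.4f}, error={error(a, b):.2e}")
```

The two starting points end in different parameter sets that fit both toy
reference values equally well.  Telling such sets apart requires additional
data, which is where the chemical intuition mentioned above comes in.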

Up to this point I agree with Andy Holder that a method which is general is
important.  Any data which aids the search and characterization of the error
surface is valuable and systems which are in error are an important indicator
of where the method is weak.  Indeed it was common for a correction implemented
to improve one error to improve results for numerous other, seemingly
unrelated, systems.  One would expect this to be true if the error function is
non-linear (resulting in multiple minima) and the method is, to an
approximation, correct, as Martin's work indicates is the case.  As a result,
the insistence on generality improves the method for all problems, as expected
for equations which are approximately correct.  However, as one considers the
errors in the method the situation changes.  For the optimal sets of parameters
mentioned above, different sets represent different distributions of errors.
Once the error space is characterized,  pragmatic concerns prevail and a
parameter set is chosen which minimizes the different errors.  It is
inappropriate to state that the parameters on, for example, nitrogen are wrong.
The parameters chosen are those which reproduced experimental data and it is
not possible for them to be wrong.  However, there may be a difference of
opinion on which data is important, and particular systems (e.g. nitrogens
involved in pi bonding) may have been overemphasized.  Personally I would have
preferred to maintain a model with an optimal description of the electron
density.  However, most users are interested in heats of formation, and the
correct angle in biphenyl, among other features, was lost when the method
was optimized for heats of formation.
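
The trade-off described in this paragraph can be made concrete with a second
toy example.  Again this is only a sketch under invented assumptions: one
adjustable parameter, two made-up reference values that the model cannot
satisfy simultaneously, and a weighted least-squares fit.  Changing the weights
does not remove the error; it only moves it from one property to the other.

```python
# Toy illustration only: the slopes, reference values and weights are invented.

def fit(slopes, refs, weights):
    """Closed-form weighted least-squares fit of one parameter p,
    where each toy property is predicted as slope * p."""
    num = sum(w * s * r for w, s, r in zip(weights, slopes, refs))
    den = sum(w * s * s for w, s in zip(weights, slopes))
    return num / den


def errors(p, slopes, refs):
    """Calculated minus reference value for each property."""
    return [s * p - r for s, r in zip(slopes, refs)]


SLOPES = (1.0, 1.0)   # hypothetical model: both properties equal p
REFS = (1.0, 2.0)     # hypothetical, mutually incompatible references

if __name__ == "__main__":
    for label, weights in [("favor property 1", (10.0, 1.0)),
                           ("favor property 2", (1.0, 10.0))]:
        p = fit(SLOPES, REFS, weights)
        e1, e2 = errors(p, SLOPES, REFS)
        print(f"{label}: p={p:.3f}, error 1={e1:+.3f}, error 2={e2:+.3f}")
```

Neither fitted value is "wrong" in the sense used above: each reproduces the
data it was weighted toward, and the choice between them is a decision about
which errors matter most.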

With a few notable exceptions,  at the time the method was released I was aware
of all the errors in AM1 which have crossed my desk since.  One of the errors I
was not aware of was the interaction between oxygens in water.  J. Stewart
found this bug and it was thought to have been corrected but apparently it
wasn't.  With these few exceptions, the errors in AM1 were a compromise and an
improvement in one area resulted in worse results for another.  I'm not
convinced new data and larger molecules would result in a significant
improvement.  It may be preferable to develop a better (albeit more expensive)
approximation such as MNDO/d and SAM1.  A better method would lower the errors
which are responsible for the lack of generality of a method.

Disregarding the above, there is another issue at stake.  There is a minor
inconsistency in the equations used in AM1 that was found in '87.  I have long
suspected that correction of this would produce a large enough improvement to
justify a cleaner version of the method.  I am currently a user and, as a
user, I recognize that generating a suite of programs for use in different
problems has merit within the concessions to generality mentioned above about
the characterization of the error surface.  While the 10-year gap does not
permit me to remember all the trade-offs made in AM1, I believe the following
suite of parameter sets to be possible:

-an improved handling of -NO2 groups and heteroatoms involved in pi systems
(amide bond) with laughable results for bulk water where there is a strong
attraction between oxygen atoms.
-improved heats of formation of closed-shell, ground-state systems with
vanishing activation barriers for some radical reactions (indicative of a
problem for all reactions)
-improved description of electronic properties (spectra, angles, etc.)  at the
expense of heats of formation.
-improved bulk water with loss of -NO2 groups.  The amide bond was not tested
in this minimum and the effect on the amide bond is not known.

Note:  In my recollection,  all reasonable sets of parameters gave rotational
barriers which were low and the set used was one of the best in this regard.
Any significant change in the method would probably have a negative impact on
transition states.

I would prefer not to see a proliferation of groups developing parameters for
specific experiments, as one sees in force field work.  In MM the correlation
between the geometry and the equations is straightforward with few
ambiguities.  Yet there are still problems with reproducibility from one group
to the next and the source of parameters in published work is often not known
(often obtained from a friend working with similar functional groups).  With
methods as ambiguous as semi-empirical functions the problem would be greatly
amplified.  A small change in one part of a molecule has large effects on
totally unrelated systems as seen with -NO2 groups and water mentioned above.
As a result an empirical method should be fully characterized (characterization
is more difficult in semi-empirical work than in MM) before it can be used in
published work.  If the suite mentioned above is to be generated, it should be
a group effort with common standards of generality, where excessive amounts of
chemical intuition are used to ascertain the strengths and weaknesses of the
method and these strengths and weaknesses are clearly stated and published
before the methods are used.

When I have considered all aspects (the effort required for careful method
development and characterization, the ambiguity of correct incorporation of
correlation energy, the need for more accurate methods, and the likelihood of
obtaining that accuracy within the NDDO approximation), I have generally
considered it to be of greater utility to create new methods and not rehash
old approximations.


E. G. Zoebisch
eve;at;amethyst.sri.com



