ab initio (Gaussian) optimization summary
Dear Netters,
The following is a summary of the responses to a question
posted on the CCL several weeks ago.
I have also placed some comments of my own
in this summary. If anyone has any additional comments, they
can contact me directly. I haven't really answered all the
questions to my satisfaction, but I must move on to other issues.
********************************************************************
The original question:
Hello CCL,
I am trying to reduce the ab initio (e.g. rhf/6-31g*) geometry
optimization time in Gaussian 94. Gaussian 94 uses redundant
internal coordinates by default, so the setup of the Z-matrix
is not so much of an issue. My question (which I have also
posted to info #*at*# gaussian.com) is whether adjustable SCF convergence
criteria exist for geometry optimization in general. My question to the
list applies to all quantum mechanical geometry optimization.
It seems to me that precise SCF convergence is a waste of time
in the initial stages of a geometry optimization, when the geometry
is relatively crude, since the Hamiltonian may have to be
significantly altered in succeeding geometry optimization steps.
It would seem that some adaptive threshold, such as one-tenth the
energy change between geometry optimization steps, would make
more efficient use of the SCF cycles.
If anyone has more experience in this matter or would like
to point out flaws in my logic, please respond to me and I will
post a summary to the list.
Thanks for your assistance.
Stephen Little sbl #*at*# linus.herl.epa.gov
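[A comment of my own, added while assembling this summary: the kind of
adjustable criterion I had in mind can be written directly in a Gaussian
route section. If I recall correctly, SCF=(Conver=N) sets the SCF
convergence threshold to 10^(-N), so a deliberately loosened optimization
step would look something like

  #P RHF/6-31G* Opt SCF=(Direct,Conver=5)

The test calculations reported later in this summary instead use SCF=SP,
which requests the looser convergence normally used for single-point
energies.]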
****************************************************************
The responses:
****************************************************************
Hi Stephen,
The problem with such a scheme is that the gradients become much less
accurate. Modern optimization algorithms rely heavily on high accuracy
of both the energy and the gradients for the Hessian updating scheme.
Using poor data in the early stages would mean deterioration, or at best
no improvement, of the Hessian. The initial Hessian is estimated, or may
be calculated (CalcFC); it definitely needs improvement during the
optimization process. Omitting the update, or updating with data of low
accuracy, significantly slows down the optimization process.
So it is something like "attempting to avoid Scylla, they ran into
Charybdis" (I don't remember the Greek original).
Best wishes,
Nico van Eikema Hommes
--
Dr. N.J.R. van Eikema Hommes Computer-Chemie-Centrum
hommes #*at*# ccc.uni-erlangen.de Universitaet Erlangen-Nuernberg
Phone: +49-(0)9131-856532 Naegelsbachstr. 25
FAX: +49-(0)9131-856566 D-91052 Erlangen, Germany
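[A comment of my own on the Hessian-updating point above: a typical
quasi-Newton update of the approximate Hessian B (the BFGS form; not
necessarily the exact scheme Gaussian uses) is

  B(k+1) = B(k) + y y^T / (y^T s) - B(k) s s^T B(k) / (s^T B(k) s)

where s is the geometry step just taken and y is the difference between
the new and old gradients. Any noise in the gradients enters y directly,
so an update built from loosely converged SCF gradients can easily make
the Hessian estimate worse rather than better.]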
*****************************************************************
Greetings...
I have had several discussions with both Frisch and Schlegel on this
issue.
I find the following scheme so robust that I use it all the time.
For organic molecules I run the following job stream (the existence
of a checkpoint file in the proper places in this compound job is
assumed). The Loose option multiplies all of the convergence criteria
by a factor of 5.
#p rpm3 opt iop(4/20=8)   ! note: PM3 force constants at the PM3 geometry help!
stuff...
--Link1--
#p hf/3-21g opt=(mndofc,loose) iop(4/20=8) geom=check   ! PM3 force constants
stuff...
--Link1--
#p hf/6-31G* opt=(readfc,loose) geom=check guess=read
stuff...
--Link1--
#p hf/6-31G* opt=readfc geom=check guess=read
stuff...
Regards,
John
John M. McKelvey email: mckelvey #*at*# Kodak.COM
Computational Science Laboratory phone: (716) 477-3335
2nd Floor, Bldg 83, RL
Eastman Kodak Company
Rochester, NY 14650-2216
**************************************************************
Date: 3 Aug 95 8:01 +0200
From: Frank Jensen <frj #*at*# dou.dk>
To: sbl #*at*# linus.herl.epa.gov
Subject: geom opt
Stephen,
In principle you are correct: it is wasteful to
obtain a tightly converged SCF wave function if the geometry
is far from a stationary point. So one should in principle
use an SCF convergence criterion that depends on the size of
the gradient. There are, however, two factors which make
this less useful.
1) The final SCF cycles are inexpensive. In a
conventional SCF, the final 2-3 SCF iterations for converging
from, say, 10^(-6) to 10^(-8) take very little time. Direct
SCF methods are slightly more expensive, but again, compared
to the constant time required for the first, say, 10 SCF
iterations and the gradient contribution, the saving of
a few SCF cycles is marginal.
2) If your starting geometry has a large gradient,
you are not doing the geometry optimization in the most
efficient way. One would normally optimize geometries
starting from geometries optimized at a lower level,
i.e. starting ab initio optimizations from geometries
obtained with semi-empirical or force-field methods.
Optimizations at higher ab initio levels should be
preceded by optimizations at lower levels, perhaps with
HF/STO-3G or 3-21G or similar small basis sets.
The high-level methods, where the saving from your suggestion
would be largest, are normally used only to make the
final refinements to the geometry, where accurate gradients,
and therefore also accurate wave functions, are needed.
Frank Jensen
***************************************************************
The crucial issue is obtaining a sufficiently precise Hessian for
optimizations. To my knowledge, all current QM software uses an approximate
Hessian for optimizations and a more exact one for computing frequencies.
At this stage, the degree of precision is probably fairly well tuned, and I
wouldn't expect much of a speedup from varying this.
In terms of speeding up optimizations, you should consider two key issues.
One is minimizing the number of steps required for the optimization and the
other is running your system under optimum conditions.
To minimize the number of steps, get better starting geometries and use a
molecular mechanics or semiempirical preminimizer for ground-state
geometries. Using a starting geometry optimized at a lower ab initio level
is also helpful if you happen to have it, but I find little benefit in
starting from, say, a 3-21G geometry versus a molecular mechanics or
semiempirical optimization if you're heading for 6-31G* or better. However,
if you're interested in transition states this approach may not, and in my
experience does not, work well, and you'll have to resort to other methods
which I won't go into.
Choice of optimization or search algorithm can also have a significant
impact on the number of steps required to achieve optimization. As I
understand it, Gaussian claims that use of redundant internal coordinates
typically converges on the optimum more quickly and smoothly, thereby
reducing time. Besides taking fewer steps, redundant internal coordinates
are also supposed to be better at avoiding or overcoming situations which
have difficulty converging, i.e., fail to identify a minimum. Others may
make similar claims for one algorithm versus another with a select set of
compounds, but there is no generally recognized validation set, and one is
apt to experience situations where one method can be shown to converge in
fewer steps than another. Such "discussions" appear not only in QM but
also in molecular mechanics methods, and I read them with interest but also
some scepticism and a great deal of faith that things will only get better.
Aside from the obvious CPU upgrade or acquiring a newer, faster system, you
can also speed up calculations by checking your system configuration
against your software's needs for a typical calculation. Some simple
checks and improvements (most important first):
- If you have inordinate swapping, get more RAM.
- If I/O is consistently >20% of your elapsed job time, improve I/O. One
  inexpensive method is to stripe across multiple disks on multiple SCSI
  channels. Depending on your system and budget, you might be able to
  keep I/O =<10% of the total run time (system running a single job).
- With Gaussian, and possibly some other codes, you may want to adjust the
  amount of RAM allocated for a job, since this affects whether
  different parts are computed by in-core, direct or semi-direct methods,
  which in turn impacts CPU and I/O load. The various methods can also be
  expected to give different parallel behavior. Unfortunately, there is
  no systematic study, and performance may depend on the general system
  configuration. Generally, direct SCF will give the best overall
  performance, but you may wish to check your system with a "typical"
  computation to identify your preferred options.
Regards, Karl
_______________________________________________________________________
/ \
| Comments are those of the author and not Unilever Research U. S. |
| |
| Karl F. Moschner, Ph. D. |
| |
| Unilever Research U. S. e-mail: Karl.F.Moschner #*at*# urlus.sprint.com |
| 45 River Road Phone: (201) 943-7100 x2629 |
| Edgewater, NJ 07020 FAX: (201) 943-5653 |
\_______________________________________________________________________/
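[A comment of my own on the memory-allocation point above: in Gaussian
the memory given to a job is set with the %Mem Link 0 command (a bare
number is, as far as I know, interpreted as words), and direct SCF is
selected with keywords such as SCF=Direct. A minimal sketch, with a
placeholder memory value rather than a recommendation:

  %Mem=4000000
  %Chk=test
  #P HF/6-31G* SCF=Direct Opt

Timing a typical job of your own with a couple of different %Mem values
and SCF options is probably the most reliable way to find the best
settings on a given machine.]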
*******************************************************************
I saw your post on CCL regarding SCF accuracy in the early steps of geometry
optimizations. You have a valid point with regard to the energy during
the initial steps, but energy gradients are also needed for the opt. It
is certainly the case in some codes (I'm not sure about G9x) that the
derivatives needed for optimizations must be quite good. The lore of
conjugate gradient optimizations is that they do not work at all without
high-accuracy derivative information, and that they are fast when such
information is available. I suspect that the Berny method used in G9x
also does not work without good derivatives, and that this is what drives
the rather tight SCF tolerance in the code.
Steve Williams,
Chemistry, ASU, Boone, NC
******************************************************************
Also, in response to a direct inquiry to Gaussian, Inc.:
Dr. Little,
We have done a lot of experimentation with optimization and have not
found schemes such as you describe to be a generally satisfactory approach.
In cases where there are no floppy modes, having an approximate
gradient is unlikely to cause long-term difficulties. If there are any
floppy modes, then you end up with no significant figures in the gradient
and updated Hessian when you compute the gradient from an incompletely
converged SCF. Further, the code for computing the gradient assumes that
certain terms are zero because the energy is fully converged, so not all
the required terms are computed. Finally, as you move to larger molecules
the incidence of floppy modes increases, so while I recognize your motive
I think the physics is working against you.
A more effective strategy is to consider using a smaller basis set or
possibly a semi-empirical method to get close and estimate the Hessian. Then
read in the results of that as a starting point.
I am happy to discuss this further, but we have made it difficult to do what
you propose partly on purpose.
Doug Fox
help #*at*# gaussian.com
****************************************************************
At this point I sent a message giving the results of the data
that prompted this inquiry:
Dear Dr. Fox,
Your response to my inquiry and responses I have received from
the Computational Chemistry List on the internet seem to suggest
that there is no value in using lower SCF energy convergence criteria
in the initial stages of an ab initio geometry optimization. But I'm
a little stubborn (;-| and feel that overgeneralizations are sometimes
suspect (:-), so I ran some test cases for my particular molecular
system of study.
Please examine these three examples:
G94 optimization, single-point SCF convergence:
energy oscillations and convergence difficulties are obvious.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
#P HF/6-31G* OPT TEST SCF=(DIRECT,SP) POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319354505 A.U. after 7 cycles
SCF Done: E(RHF) = -914.333730429 A.U. after 5 cycles
SCF Done: E(RHF) = -914.331939947 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335253115 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335454068 A.U. after 3 cycles
SCF Done: E(RHF) = -914.335497353 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335411368 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335530426 A.U. after 3 cycles
SCF Done: E(RHF) = -914.335434218 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335538665 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335535665 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335546677 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335548457 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335558314 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335548548 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335561113 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335530123 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335570192 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335545324 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335556255 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335530196 A.U. after 2 cycles
Job cpu time: 0 days 5 hours 20 minutes 16.0 seconds.
_______________________________________________________________
G94 optimization, default SCF convergence:
convergence is good, but look at the number of SCF cycles
per step in the first four steps.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
#P HF/6-31G* OPT TEST SCF=DIRECT POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319405628 A.U. after 17 cycles
SCF Done: E(RHF) = -914.333749831 A.U. after 13 cycles
SCF Done: E(RHF) = -914.332017921 A.U. after 14 cycles
SCF Done: E(RHF) = -914.335299442 A.U. after 14 cycles
SCF Done: E(RHF) = -914.335490281 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335532676 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335547663 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335559667 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335565401 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335567964 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335569158 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335569615 A.U. after 8 cycles
Job cpu time: 0 days 6 hours 44 minutes 48.2 seconds.
_______________________________________________________________
Pseudo-compound G94 optimization.
The only way I was able to run this example was as two consecutive jobs,
not as a compound calculation.
The G94 opt. was first run for 5 steps at single-point SCF convergence:
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
%chk=tdebcp
#P HF/6-31G* OPT OPTCYC=5 TEST SCF=(DIRECT,SP) POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319354505 A.U. after 7 cycles
SCF Done: E(RHF) = -914.333730429 A.U. after 5 cycles
SCF Done: E(RHF) = -914.331939947 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335253115 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335454068 A.U. after 3 cycles
Job cpu time: 0 days 1 hours 27 minutes 57.4 seconds.
... followed by a restart using normal convergence criteria.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
%chk=tdebcp
#P HF/6-31G* OPT=(restart,readfc) OPTCYC=25 TEST SCF=DIRECT
Guess=check POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.335513320 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335538089 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335551099 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335566407 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335568180 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335569384 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335569639 A.U. after 8 cycles
Job cpu time: 0 days 3 hours 32 minutes 0.5 seconds.
_______________________________________________________________
summary (Cray C90 CPU time and final energies (a.u.)):
  SP convergence:        5 hrs. 20 min.   E(RHF) = -914.335530196
  default convergence:   6 hrs. 45 min.   E(RHF) = -914.335569615
  compound convergence:  5 hrs.           E(RHF) = -914.335569639
At least in this one example, a few steps at a lower SCF convergence
criterion followed by refinement at a tighter criterion not only takes
significantly less time (the default-convergence run took about 35% longer
than the compound run) but gives just as good a final energy. All three
examples used the same (default) geometry convergence criteria.
All starting geometries had been optimized with AM1 (AMPAC), and a full
conformational search was done for the two hydroxyl groups. The geometry
investigated was prescreened from multiple AM1-optimized local minima
using hf/6-31g* single-point calculations, so this example did not start
from an extremely bad geometry.
It would be convenient to run this class of molecules using a compound
G94 optimization input, so if you have any suggestions I would
be very appreciative.
Stephen Little sbl #*at*# linus.herl.epa.gov
p.s. the molecular structure being optimized is the syn-equatorial
diol epoxide of benzo(c)phenanthrene (see Chem. Res. Toxicology (1995)
8: 499-505 if you're interested). I can send the exact input geometry
if you wish.
*********************************************************************
Dr. Little,
Since I don't have Chem. Res. Toxicology handy, is there a chance
you could FAX me a copy of the structure? (412) 279-2118.
While the final energies are the same and the convergence criteria
were met, I would be interested in knowing what the largest differences
in the structures were. If your structure is fairly rigid this may not
be important, but floppy structures, e.g. dyes with biphenyl linkages,
can vary a good deal in structure without changing the energy yet with
radically different predicted properties. Also, we have found in
a lot of these cases that if the optimization goes off in a bad direction
due to errors in the early steps, you can waste a lot of cycles trying
to search back to the real minimum.
We do have an option in G94, OPT=LOOSE, which does not reduce the
SCF criteria but lets you get an approximate structure at a lower level
of theory with reduced convergence criteria on the optimization. Thus
a sequence like,
# AM1 OPT
# HF/3-21G* OPT=LOOSE Geom=Check
# MP2/6-31G* OPT=ReadFC Geom=Check
can be computationally very efficient. You can also replace MP2 with
one of the DFT methods or a higher-order correlation method.
Doug Fox
help #*at*# gaussian.com
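[A comment of my own: as I read it, replacing MP2 here simply means
swapping the method keyword in the last step of the sequence above,
e.g. something like

  # B3LYP/6-31G* Opt=ReadFC Geom=Check

with B3LYP being just one example of the DFT keywords available in G94.]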
*******************************************************************
I ran some additional tests to get an understanding of the effect of torsional
freedom on convergence for various optimization criteria. Using
1,1'-dimethyl,1,1-dibenzethane as a test case indicated that for systems with
large displacements per change in energy, accurate determination of the
gradient is more important. Tests with more constrained systems, however,
indicated that successive optimizations with increasingly accurate basis sets
and SCF convergence criteria give better results in terms of overall
calculation time. These statements are based on limited observations of a
small number of test cases. For example, I ran some additional calculations in
Spartan using methylcyclopropyl ether and the various optimization criteria
given in Warren Hehre's book "Practical Strategies for Electronic Structure
Calculations", page 18. The results:
Number of cycles and CPU time (SGI Power Challenge M) for full geometry
optimization at the hf/3-21g* and hf/6-31g* levels for various starting
geometries and Hessians:

                   SYBYL geom          PM3 geom           hf/3-21g* geom
                   and Hessian         and Hessian        and Hessian
hf/3-21g* opt.     14 cyc. (11 min.)   10 cyc. (8 min.)        N/A
hf/6-31g* opt.     15 cyc. (80 min.)   10 cyc. (51 min.)   7 cyc. (36 min.)
In the tests using 1,1'-dimethyl,1,1-dibenzethane, a Gaussian 94 hf/6-31g*
optimization using the hf/3-21g geometry and force constants did not
significantly improve the rate of convergence on the energy relative to an
hf/6-31g* optimization using the AM1 geometry and default gradient (both
reached the same energy in about the same number of optimization cycles,
10 vs. 11). However, using the hf/3-21g force constants improved the
stability of the gradient convergence, so that the optimization was able to
terminate at that energy, whereas the hf/6-31g* optimization from the AM1
start continued for 17 additional optimization cycles with only a
0.01 kcal/mol improvement in energy.
But in a further test, with the 1,2-epoxide, 3,4-dihydrodiol of
benzo(c)phenanthrene, an hf/6-31g* optimization starting from the hf/3-21g
geometry converged more quickly than the hf/6-31g* optimization starting
from the AM1 geometry with the default gradient (5 Cray C90 CPU hours vs.
6.75 Cray C90 CPU hours), but it still did not converge as quickly as an
hf/6-31g* optimization using the force constants taken from an hf/6-31g*
optimization run for 5 cycles with lowered SCF convergence (10^(-4) a.u.
for energy and density).
I have finally decided on the following optimization sequence using Gaussian
input as an example:
COMMENT: Optimization at hf/3-21g(* for atoms above Ne), beginning with mndo
force constants and lowered SCF convergence criteria
%chk=water
#P HF/3-21G(*) Opt=(mndofc,loose) SCF=SP TEST NOSYMM
water
0 1
O
H 1 0.98
H 1 0.98 2 100.0
COMMENT: a few more optimization steps at hf/6-31g*, beginning with hf/3-21g(*)
force constants and geometry and lowered SCF convergence criteria
--Link1--
%chk=water
#P HF/6-31G* OPT=ReadFC SCF=SP Geom=Check OPTCYC=5 TEST NOSYMM
G94 HF/6-31G*
water
0 1
COMMENT: a final optimization at hf/6-31g*, beginning with the hf/6-31g*
force constants and geometry and default SCF convergence criteria
--Link1--
%chk=water
#P HF/6-31G* Opt=ReadFC Geom=Check OPTCYC=99 TEST
POP=(REG,CHELPG) NOSYMM FChk=All
G94 HF/6-31G*
water
0 1
I haven't fully tested this input (I'm not sure whether it needs OPTCYC=5 or
10 in the second optimization), but I need to wrap this up and get on with
other things.
Stephen Little
p.s. I did some additional testing, but the above input sometimes
fails with larger geometries (larger parameter sets, a larger number
of optimization steps, etc.).
*********************************************************************
* *
* Stephen Little sbl #*at*# linus.herl.epa.gov *
* Research Chemist *
* U.S. Environmental Protection Agency *
* HERL/ECD/CMB MD-68 *
* Research Triangle Park, NC 27711 919-541-0963 *
* *
*Disclaimer: Mention of trade names or products does not constitute *
*endorsement by the United States Environmental Protection Agency. *
*********************************************************************