ab initio (Gaussian) optimization summary
Dear Netters,
The following is a summary of the responses to a question
posted on the CCL several weeks ago.
I have also placed some comments of my own
in this summary. If anyone has any additional comments, they
can contact me directly. I haven't really answered all the
questions to my satisfaction, but I must move on to other issues.
********************************************************************
The original question:
Hello CCL,
I am trying to reduce the ab initio (e.g. rhf/6-31g*) geometry
optimization time in Gaussian 94. Gaussian 94 uses redundant
internal coordinates by default, so the setup of the Z-matrix
is not so much of an issue. My question (which I have also
posted to info #*at*# gaussian.com) is whether adjustable SCF convergence
criteria exist for geometry optimization in general. My question to the
list applies to all quantum mechanical geometry optimization.
It seems to me that precise SCF convergence is a waste of time
in the initial stages of a geometry optimization, when the geometry
is relatively crude, since the Hamiltonian may have to be
significantly altered in succeeding geometry optimization steps.
It would seem that some adaptive threshold, such as one-tenth the
energy change between geometry optimization steps, would make
more efficient use of the SCF cycles.
If anyone has more experience in this matter or would like
to point out flaws in my logic, please respond to me and I will
post a summary to the list.
Thanks for your assistance.
Stephen Little sbl #*at*# linus.herl.epa.gov
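[A comment of my own, added while assembling this summary: the kind of
adjustable criterion I had in mind can be written directly in a Gaussian
route section. If I recall correctly, SCF=(Conver=N) sets the SCF
convergence threshold to 10^(-N), so a deliberately loosened optimization
step would look something like

  #P RHF/6-31G* Opt SCF=(Direct,Conver=5)

The test calculations reported later in this summary instead use SCF=SP,
which requests the looser convergence normally used for single-point
energies.]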
****************************************************************
The responses:
****************************************************************
Hi Stephen,
The problem with such a scheme is that the gradients become much less
accurate. Modern optimization algorithms rely heavily on high accuracy
of both the energy and the gradients for the Hessian updating scheme.
Using poor data in the early stages would mean deterioration, or at best
no improvement, of the Hessian. The initial Hessian is estimated, or may
be calculated (CalcFC); it definitely needs improvement during the
optimization process. Omitting the update, or updating with data of low
accuracy, significantly slows down the optimization process.
So it is something like "attempting to avoid Scylla, they ran into
Charybdis" (I don't remember the Greek original).
Best wishes,
Nico van Eikema Hommes
--
Dr. N.J.R. van Eikema Hommes Computer-Chemie-Centrum
hommes #*at*# ccc.uni-erlangen.de Universitaet Erlangen-Nuernberg
Phone: +49-(0)9131-856532 Naegelsbachstr. 25
FAX: +49-(0)9131-856566 D-91052 Erlangen, Germany
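[A comment of my own on the Hessian-updating point above: a typical
quasi-Newton update of the approximate Hessian B (the BFGS form; not
necessarily the exact scheme Gaussian uses) is

  B(k+1) = B(k) + y y^T / (y^T s) - B(k) s s^T B(k) / (s^T B(k) s)

where s is the geometry step just taken and y is the difference between
the new and old gradients. Any noise in the gradients enters y directly,
so an update built from loosely converged SCF gradients can easily make
the Hessian estimate worse rather than better.]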
*****************************************************************
Greetings...
I have had several discussions with both Frisch and Schlegel on this
issue.
I find the following scheme so robust that I use it all the time.
For organic molecules I run the following job stream (the existence
of a checkpoint file in the proper places in this compound job is
assumed). The Loose option multiplies all of the convergence criteria
by a factor of 5.
#p rpm3 opt iop(4/20=8)   ! note: PM3 force constants at the PM3 geometry help!
stuff...
--Link1--
#p hf/3-21g opt=(mndofc,loose) iop(4/20=8) geom=check   ! PM3 force constants
stuff...
--Link1--
#p hf/6-31G* opt=(readfc,loose) geom=check guess=read
stuff...
--Link1--
#p hf/6-31G* opt=readfc geom=check guess=read
stuff...
Regards,
John
John M. McKelvey email: mckelvey #*at*# Kodak.COM
Computational Science Laboratory phone: (716) 477-3335
2nd Floor, Bldg 83, RL
Eastman Kodak Company
Rochester, NY 14650-2216
**************************************************************
Date: 3 Aug 95 8:01 +0200
From: Frank Jensen <frj #*at*# dou.dk>
To: sbl #*at*# linus.herl.epa.gov
Subject: geom opt
Stephen,
In principle you are correct: it is wasteful to
obtain a tightly converged SCF wave function if the geometry
is far from a stationary point. So one should in principle
use an SCF convergence criterion that depends on the size of
the gradient. There are, however, two factors which make
this less useful.
1) The final SCF cycles are inexpensive. In a
conventional SCF, the final 2-3 SCF iterations for converging
from, say, 10^(-6) to 10^(-8) take very little time. Direct
SCF methods are slightly more expensive, but again, compared
to the constant time required for the first, say, 10 SCF
iterations and the gradient contribution, the saving of
a few SCF cycles is marginal.
2) If your starting geometry has a large gradient,
you are not doing the geometry optimization in the most
efficient way. One would normally optimize geometries
starting from geometries optimized at a lower level,
i.e. starting ab initio optimizations from geometries
obtained with semi-empirical or force-field methods.
Optimizations at higher ab initio levels should be
preceded by optimizations at lower levels, perhaps with
HF/STO-3G or 3-21G or similar small basis sets.
The high-level methods, where the saving from your suggestion
would be largest, are normally used only to make the
final refinements to the geometry, where accurate gradients,
and therefore also accurate wave functions, are needed.
Frank Jensen
***************************************************************
The crucial issue is obtaining a sufficiently precise Hessian for
optimizations. To my knowledge, all current QM software uses an approximate
Hessian for optimizations and a more exact one for computing frequencies.
At this stage, the degree of precision is probably fairly well tuned, and I
wouldn't expect much of a speedup from varying this.
In terms of speeding up optimizations, you should consider two key issues.
One is minimizing the number of steps required for the optimization and the
other is running your system under optimum conditions.
To minimize the number of steps, get better starting geometries and use a
molecular mechanics or semiempirical preminimizer for ground-state
geometries. Using a starting geometry optimized at a lower ab initio level
is also helpful if you happen to have it, but I find little benefit in
starting from, say, a 3-21G geometry versus a molecular mechanics or
semiempirical optimization if you're heading for 6-31G* or better. However,
if you're interested in transition states this approach may not, and in my
experience does not, work well, and you'll have to resort to other methods
which I won't go into.
Choice of optimization or search algorithm can also have a significant
impact on the number of steps required to achieve optimization. As I
understand it, Gaussian claims that use of redundant internal coordinates
typically converges on the optimum more quickly and smoothly, thereby
reducing time. Besides taking fewer steps, redundant internal coordinates
are also supposed to be better at avoiding or overcoming situations which
have difficulty converging, i.e., fail to identify a minimum. Others may
make similar claims for one algorithm versus another with a select set of
compounds, but there is no generally recognized validation set, and one is
apt to experience situations where one method can be shown to converge in
fewer steps than another. Such "discussions" appear not only in QM but
also in molecular mechanics methods, and I read them with interest but also
some scepticism and a great deal of faith that things will only get better.
Aside from the obvious CPU upgrade or acquiring a newer, faster system, you
can also speed up calculations by checking your system configuration
against your software's needs for a typical calculation. Some simple
checks and improvements (most important first):
- If you have inordinate swapping, get more RAM.
- If I/O is consistently >20% of your elapsed job time, improve I/O. One
  inexpensive method is to stripe across multiple disks on multiple SCSI
  channels. Depending on your system and budget, you might be able to
  keep I/O =<10% of the total run time (system running a single job).
- With Gaussian, and possibly some other codes, you may want to adjust the
  amount of RAM allocated for a job, since this affects whether
  different parts are computed by in-core, direct or semi-direct methods,
  which in turn impacts CPU and I/O load. The various methods can also be
  expected to give different parallel behavior. Unfortunately, there is
  no systematic study, and performance may depend on the general system
  configuration. Generally, direct SCF will give the best overall
  performance, but you may wish to check your system with a "typical"
  computation to identify your preferred options.
Regards, Karl
_______________________________________________________________________
/ \
| Comments are those of the author and not Unilever Research U. S. |
| |
| Karl F. Moschner, Ph. D. |
| |
| Unilever Research U. S. e-mail: Karl.F.Moschner #*at*# urlus.sprint.com |
| 45 River Road Phone: (201) 943-7100 x2629 |
| Edgewater, NJ 07020 FAX: (201) 943-5653 |
\_______________________________________________________________________/
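[A comment of my own on the memory-allocation point above: in Gaussian
the memory given to a job is set with the %Mem Link 0 command (a bare
number is, as far as I know, interpreted as words), and direct SCF is
selected with keywords such as SCF=Direct. A minimal sketch, with a
placeholder memory value rather than a recommendation:

  %Mem=4000000
  %Chk=test
  #P HF/6-31G* SCF=Direct Opt

Timing a typical job of your own with a couple of different %Mem values
and SCF options is probably the most reliable way to find the best
settings on a given machine.]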
*******************************************************************
I saw your post on CCL regarding SCF accuracy in the early steps of geometry
optimizations. You have a valid point with regard to the energy during
the initial steps, but energy gradients are also needed for the opt. It
is certainly the case in some codes (I'm not sure about G9x) that the
derivatives needed for optimizations must be quite good. The lore of
conjugate gradient optimizations is that they do not work at all without
high-accuracy derivative information, and that they are fast when such
information is available. I suspect that the Berny method used in G9x
also does not work without good derivatives, and that this is what drives
the rather tight SCF tolerance in the code.
Steve Williams,
Chemistry, ASU, Boone, NC
******************************************************************
Also, in response to a direct inquiry to Gaussian, Inc.:
Dr. Little,
We have done a lot of experimentation with optimization and have not
found schemes such as you describe to be a generally satisfactory approach.
In cases where there are no floppy modes, having an approximate
gradient is unlikely to cause long-term difficulties. If there are any
floppy modes, then you end up with no significant figures in the gradient
and updated Hessian when you compute the gradient from an incompletely
converged SCF. Further, the code for computing the gradient assumes that
certain terms are zero because the energy is fully converged, so not all
the required terms are computed. Finally, as you move to larger molecules
the incidence of floppy modes increases, so while I recognize your motive
I think the physics is working against you.
A more effective strategy is to consider using a smaller basis set or
possibly a semi-empirical method to get close and estimate the Hessian. Then
read in the results of that as a starting point.
I am happy to discuss this further, but we have made it difficult to do what
you propose partly on purpose.
Doug Fox
help #*at*# gaussian.com
****************************************************************
At this point I sent a message giving the results of the data
that prompted this inquiry:
Dear Dr. Fox,
Your response to my inquiry and responses I have received from
the Computational Chemistry List on the internet seem to suggest
that there is no value in using lower SCF energy convergence criteria
in the initial stages of an ab initio geometry optimization. But I'm
a little stubborn (;-| and feel that overgeneralizations are sometimes
suspect (:-), so I ran some test cases for my particular molecular
system of study.
Please examine these three examples:
G94 optimization, single-point SCF convergence:
energy oscillations and convergence difficulties are obvious.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
#P HF/6-31G* OPT TEST SCF=(DIRECT,SP) POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319354505 A.U. after 7 cycles
SCF Done: E(RHF) = -914.333730429 A.U. after 5 cycles
SCF Done: E(RHF) = -914.331939947 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335253115 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335454068 A.U. after 3 cycles
SCF Done: E(RHF) = -914.335497353 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335411368 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335530426 A.U. after 3 cycles
SCF Done: E(RHF) = -914.335434218 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335538665 A.U. after 4 cycles
SCF Done: E(RHF) = -914.335535665 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335546677 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335548457 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335558314 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335548548 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335561113 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335530123 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335570192 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335545324 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335556255 A.U. after 2 cycles
SCF Done: E(RHF) = -914.335530196 A.U. after 2 cycles
Job cpu time: 0 days 5 hours 20 minutes 16.0 seconds.
_______________________________________________________________
G94 optimization, default SCF convergence:
convergence is good, but look at the number of SCF cycles
per step in the first four steps.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
#P HF/6-31G* OPT TEST SCF=DIRECT POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319405628 A.U. after 17 cycles
SCF Done: E(RHF) = -914.333749831 A.U. after 13 cycles
SCF Done: E(RHF) = -914.332017921 A.U. after 14 cycles
SCF Done: E(RHF) = -914.335299442 A.U. after 14 cycles
SCF Done: E(RHF) = -914.335490281 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335532676 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335547663 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335559667 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335565401 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335567964 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335569158 A.U. after 9 cycles
SCF Done: E(RHF) = -914.335569615 A.U. after 8 cycles
Job cpu time: 0 days 6 hours 44 minutes 48.2 seconds.
_______________________________________________________________
Pseudo-compound G94 optimization.
The only way I was able to run this example was as two consecutive jobs,
not as a compound calculation.
The G94 opt. was first run for 5 steps at single-point SCF convergence:
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
%chk=tdebcp
#P HF/6-31G* OPT OPTCYC=5 TEST SCF=(DIRECT,SP) POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.319354505 A.U. after 7 cycles
SCF Done: E(RHF) = -914.333730429 A.U. after 5 cycles
SCF Done: E(RHF) = -914.331939947 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335253115 A.U. after 5 cycles
SCF Done: E(RHF) = -914.335454068 A.U. after 3 cycles
Job cpu time: 0 days 1 hours 27 minutes 57.4 seconds.
... followed by a restart using normal convergence criteria.
Gaussian 94: CrayXMP-Unicos-G94RevB.3 30-May-1995
%chk=tdebcp
#P HF/6-31G* OPT=(restart,readfc) OPTCYC=25 TEST SCF=DIRECT
Guess=check POP=REG NOSYM FChk=All
SCF Done: E(RHF) = -914.335513320 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335538089 A.U. after 12 cycles
SCF Done: E(RHF) = -914.335551099 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335566407 A.U. after 11 cycles
SCF Done: E(RHF) = -914.335568180 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335569384 A.U. after 8 cycles
SCF Done: E(RHF) = -914.335569639 A.U. after 8 cycles
Job cpu time: 0 days 3 hours 32 minutes 0.5 seconds.
_______________________________________________________________
summary (Cray C90 CPU time and final energies (a.u.)):
  SP convergence:        5 hrs. 20 min.   E(RHF) = -914.335530196
  default convergence:   6 hrs. 45 min.   E(RHF) = -914.335569615
  compound convergence:  5 hrs.           E(RHF) = -914.335569639
At least in this one example, a few steps at a lower SCF convergence
criterion followed by refinement at a tighter criterion not only takes
significantly less time (the default-convergence run took about 35% longer
than the compound run) but gives just as good a final energy. All three
examples used the same (default) geometry convergence criteria.
All starting geometries had been optimized with AM1 (AMPAC), and a full
conformational search was done for the two hydroxyl groups. The geometry
investigated was prescreened from multiple AM1-optimized local minima
using hf/6-31g* single-point calculations, so this example did not start
from an extremely bad geometry.
It would be convenient to run this class of molecules using a compound
G94 optimization input, so if you have any suggestions I would
be very appreciative.
Stephen Little sbl #*at*# linus.herl.epa.gov
p.s. the molecular structure being optimized is the syn-equatorial
diol epoxide of benzo(c)phenanthrene (see Chem. Res. Toxicology (1995)
8: 499-505 if you're interested). I can send the exact input geometry
if you wish.
*********************************************************************
Dr. Little,
Since I don't have Chem. Res. Toxicology handy, is there a chance
you could FAX me a copy of the structure? (412) 279-2118.
While the final energies are the same and the convergence criteria
were met, I would be interested in knowing what the largest differences
in the structures were. If your structure is fairly rigid this may not
be important, but floppy structures, e.g. dyes with biphenyl linkages,
can vary a good deal in structure without changing the energy yet with
radically different predicted properties. Also, we have found in
a lot of these cases that if the optimization goes off in a bad direction
due to errors in the early steps, you can waste a lot of cycles trying
to search back to the real minimum.
We do have an option in G94, OPT=LOOSE, which does not reduce the
SCF criteria but lets you get an approximate structure at a lower level
of theory with reduced convergence criteria on the optimization. Thus
a sequence like,
# AM1 OPT
# HF/3-21G* OPT=LOOSE Geom=Check
# MP2/6-31G* OPT=ReadFC Geom=Check
can be computationally very efficient. You can also replace MP2 with
one of the DFT methods or a higher-order correlation method.
Doug Fox
help #*at*# gaussian.com
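[A comment of my own: as I read it, replacing MP2 here simply means
swapping the method keyword in the last step of the sequence above,
e.g. something like

  # B3LYP/6-31G* Opt=ReadFC Geom=Check

with B3LYP being just one example of the DFT keywords available in G94.]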
*******************************************************************
I ran some additional tests to get an understanding of the effect of torsional
freedom on convergence for various optimization criteria. Using
1,1'-dimethyl,1,1-dibenzethane as a test case indicated that for systems with
large displacements per change in energy, accurate determination of the
gradient is more important. Tests with more constrained systems, however,
indicated that successive optimizations with increasingly accurate basis sets
and SCF convergence criteria give better results in terms of overall
calculation time. These statements are based on limited observations of a
small number of test cases. For example, I ran some additional calculations in
Spartan using methylcyclopropyl ether and the various optimization criteria
given in Warren Hehre's book "Practical Strategies for Electronic Structure
Calculations", page 18. The results:
Number of cycles and CPU time (SGI Power Challenge M) for full geometry
optimization at the hf/3-21g* and hf/6-31g* levels for various starting
geometries and Hessians:

                   SYBYL geom          PM3 geom           hf/3-21g* geom
                   and Hessian         and Hessian        and Hessian
hf/3-21g* opt.     14 cyc. (11 min.)   10 cyc. (8 min.)        N/A
hf/6-31g* opt.     15 cyc. (80 min.)   10 cyc. (51 min.)   7 cyc. (36 min.)
In the tests using 1,1'-dimethyl,1,1-dibenzethane, a Gaussian 94 hf/6-31g*
optimization using the hf/3-21g geometry and force constants did not
significantly improve the rate of convergence on the energy relative to an
hf/6-31g* optimization using the AM1 geometry and default gradient (both
reached the same energy in about the same number of optimization cycles,
10 vs. 11). However, using the hf/3-21g force constants improved the
stability of the gradient convergence, so that the optimization was able to
terminate at that energy, whereas the hf/6-31g* optimization from the AM1
start continued for 17 additional optimization cycles with only a
0.01 kcal/mol improvement in energy.
But in a further test, with the 1,2-epoxide, 3,4-dihydrodiol of
benzo(c)phenanthrene, an hf/6-31g* optimization starting from the hf/3-21g
geometry converged more quickly than the hf/6-31g* optimization starting
from the AM1 geometry with the default gradient (5 Cray C90 CPU hours vs.
6.75 Cray C90 CPU hours), but it still did not converge as quickly as an
hf/6-31g* optimization using the force constants taken from an hf/6-31g*
optimization run for 5 cycles with lowered SCF convergence (10^(-4) a.u.
for energy and density).
I have finally decided on the following optimization sequence using Gaussian
input as an example:
COMMENT: Optimization at hf/3-21g(* for atoms above Ne), beginning with mndo
force constants and lowered SCF convergence criteria
%chk=water
#P HF/3-21G(*) Opt=(mndofc,loose) SCF=SP TEST NOSYMM
water
0 1
O
H 1 0.98
H 1 0.98 2 100.0
COMMENT: a few more optimization steps at hf/6-31g*, beginning with hf/3-21g(*)
force constants and geometry and lowered SCF convergence criteria
--Link1--
%chk=water
#P HF/6-31G* OPT=ReadFC SCF=SP Geom=Check OPTCYC=5 TEST NOSYMM
G94 HF/6-31G*
water
0 1
COMMENT: a final optimization at hf/6-31g*, beginning with the hf/6-31g*
force constants and geometry and default SCF convergence criteria
--Link1--
%chk=water
#P HF/6-31G* Opt=ReadFC Geom=Check OPTCYC=99 TEST
POP=(REG,CHELPG) NOSYMM FChk=All
G94 HF/6-31G*
water
0 1
I haven't fully tested this input (I'm not sure whether it needs OPTCYC=5 or
10 in the second optimization), but I need to wrap this up and get on with
other things.
Stephen Little
p.s. I did some additional testing, but the above input sometimes
fails with larger geometries (larger parameter sets, a larger number
of optimization steps, etc.).
*********************************************************************
* *
* Stephen Little sbl #*at*# linus.herl.epa.gov *
* Research Chemist *
* U.S. Environmental Protection Agency *
* HERL/ECD/CMB MD-68 *
* Research Triangle Park, NC 27711 919-541-0963 *
* *
*Disclaimer: Mention of trade names or products does not constitute *
*endorsement by the United States Environmental Protection Agency. *
*********************************************************************