From owner-chemistry@ccl.net Thu Dec 13 15:12:00 2007
From: "Frank Jensen frj : chem.au.dk"
To: CCL
Subject: CCL: origin of ab initio/basis set empiricism
Message-Id: <-35847-071213135127-30302-JNfsBtS0GKXNRP8Fscb4Qg[a]server.ccl.net>
Date: Thu, 13 Dec 2007 19:00:08 +0100

Sent to CCL by: Frank Jensen [frj###chem.au.dk]

At the risk of being accused of self-promotion, let me add a few comments on the issue of basis sets.

A disproportionate number of calculations are done using variations of the 6-31G* or 6-311G* basis sets. The justification is some variation of the statement that 'this basis set has been shown to give good results'. Tracing this back leads to some calibration against a set of experimental data for a given set of test systems. Choosing basis sets in this fashion is thus 'empirical' in the sense that it is based on comparison with experiments. A non-empirical way of selecting a basis set must rely on a systematic sequence that smoothly converges towards the basis set limit.

Most 'popular' basis sets are of the segmented type, where both exponents and contraction coefficients are optimized based on energy. Unfortunately this leads to the 'multiple minimum' problem. We have for example shown that there are at least 19 different ways of constructing a '6-311' contraction of 11 s-functions. If one also considers possibilities like 7-211, 5-411 and 5-321, the number grows further. Most of these have very similar performance, and there is no unique way of selecting the 'best'. Generating a sequence of segmented basis sets that systematically approaches the limit is in my opinion not possible.
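
As a rough illustration of how many segmented contraction labels exist once 11 s-primitives are split over four contracted functions, the Python sketch below simply enumerates them. It counts labels only (6-311, 7-211, ...), written with the largest group first. It says nothing about the 19 distinct optimized solutions mentioned above for the single 6-311 label. The function name and the restriction to four contracted functions are assumptions made for this example.

```python
def contraction_patterns(n_primitives, n_contracted, max_part=None):
    """Yield non-increasing tuples of positive integers that sum to
    n_primitives and have exactly n_contracted entries."""
    if max_part is None:
        max_part = n_primitives
    if n_contracted == 0:
        if n_primitives == 0:
            yield ()
        return
    # Every remaining contracted function needs at least one primitive.
    upper = min(max_part, n_primitives - (n_contracted - 1))
    for first in range(upper, 0, -1):
        for rest in contraction_patterns(n_primitives - first,
                                         n_contracted - 1, first):
            yield (first,) + rest


def label(pattern):
    """Format (6, 3, 1, 1) in the familiar '6-311' style."""
    return f"{pattern[0]}-{''.join(str(p) for p in pattern[1:])}"


patterns = list(contraction_patterns(11, 4))
print(len(patterns), "patterns:", ", ".join(label(p) for p in patterns))
```

Each of these labels may in turn have several energy-optimized solutions, which is where the real ambiguity lies.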
A general contracted basis set, on the other hand, separates the optimization of the exponents and the contraction coefficients. As the uncontracted set of functions can be made to converge towards the basis set limit, and the contraction error can be rigorously controlled, this allows construction of a systematic sequence of basis sets, as was first explored with the ANO type basis sets.

Dunning showed how this could be used to construct the cc-pVXZ sequence for electron correlation methods, and we have used the same idea for constructing the pc-n basis sets for DFT methods. The natural quality parameter in these is the highest angular momentum included. Once this has been selected, the rest of the basis set is in principle unique. These basis sets therefore have a single non-empirical parameter that controls the accuracy. Plane waves have the same single non-empirical quality parameter, but as they usually employ a core potential, this prevents a rigorous convergence towards the all-electron limit.

A note on the side: while even-tempered basis sets provide a systematic way of improving the quality of an atomic calculation, this is not the case for molecular systems. Here increasing the number of functions, even for strictly variational methods like HF and including polarization functions, can increase the energy, and in practice leads to oscillatory behavior.

A practical issue is how fast the basis set convergence is. While both the cc-pVXZ and pc-n basis sets provide a controlled convergence, the rate of convergence can for some properties be improved by adding diffuse or tight functions. Both of these options are available for both families of basis sets.

My (clearly biased) view is that calculations should always be done using at least a DZP and a TZP quality basis set to identify the pathological cases that always pop up.
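
The two-level check suggested above can be sketched in a few lines of Python. The function name, the dictionary layout and the tolerance are assumptions for illustration, and the numbers in the usage example are made up rather than computed results.

```python
def flag_basis_set_sensitive(results, tolerance):
    """Return (label, shift) for every system whose property changes by
    more than `tolerance` between the DZP- and TZP-quality basis sets.

    results: dict mapping a system label to (dzp_value, tzp_value).
    """
    flagged = []
    for label, (dzp_value, tzp_value) in results.items():
        shift = tzp_value - dzp_value
        if abs(shift) > tolerance:
            flagged.append((label, shift))
    return flagged


# Made-up shielding constants in ppm, purely to exercise the function.
example = {
    "system_A": (310.0, 318.0),
    "system_B": (150.0, -420.0),
}
for label, shift in flag_basis_set_sensitive(example, tolerance=50.0):
    print(f"{label}: DZP -> TZP shift of {shift:+.1f} ppm, "
          "check with a larger basis set")
```

A small shift does not prove convergence, but a large one is a cheap warning that the smaller basis set cannot be trusted for that system.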
To illustrate this point: The Ahlrichs SVP and the Pople 6-31G* are both of double zeta quality and have typical errors for calculating nuclear magnetic shielding of ~30 ppm. The B3LYP calculated value for oxygen in MgO is +23600 ppm with the SVP and -2960 ppm with 6-31G*. The basis set limiting value is -2440 ppm. Relying on calculations with a single 'empirically' chosen basis set will invariably run into such problems. Using basis sets that belong to families where the error can be controlled allows one to identify such cases, and rigorously remove the basis set error, albeit at a computational cost.

Just my 0.02$ (well maybe 0.03$)

Frank

Quoting "Rene Fournier renef++yorku.ca":

>
> Sent to CCL by: Rene Fournier [renef(a)yorku.ca]
> Hello,
> I agree on
>
>> All commonly used basis
>> sets, even the Pople-style sets, are generated by optimizing
>> exponents and contraction coefficients to minimize (ab initio)
>> energies of atoms and sometimes molecules.
>
> That's correct. But there are other problems with choices of basis
> sets and that's where empiricism can creep in. How many contracted
> functions is enough? What's the best contraction pattern?
> (why 6-31G and not 6-21G? or 4-31G? or 9-61G? or 7-1111G?)
> How many polarization functions? Why not throw in a few bond-centered functions?
> Many basis sets went into oblivion not because they gave higher energies
> for atoms or molecules --- adding more functions and keeping them uncontracted
> always lowers energy. It's that they did not give good agreement with experimental
> bond lengths, bond energies etc. relative to other basis sets of comparable
> computational cost. I'm not sure how to call methods where high-level theory
> is used instead of experiment as the reference to assess goodness of a lower level
> of theory (maybe a smaller basis).
> I think it's still empirical because there's
> the assumption that if the low-level theory reproduces high-level theory to
> within some accuracy for cases X, Y, Z, it will also reproduce high-level
> theory to that accuracy for other cases when we do applications.
> ( There's also the obvious point that if the high-level theory reference is
> really, really good, then it IS the experimental result! )
> It is possible to take empiricism out of basis set choice by doing
> many calculations in a sequence with increasingly big basis sets defined
> by only 1 and maybe 2 parameters, so that one can crank up accuracy smoothly,
> and extrapolate results to "infinite basis set". Basis sets suitable for
> that are:
> - even-tempered fully uncontracted basis sets (K Ruedenberg);
> - plane-waves (increase energy cut-off and box size);
> - all-numerical programs (mostly for diatomics, Becke's NUMOL for polyatomics).
> But I don't see that done very often, even with plane-waves where it would be
> easy I suppose.
>
> Regards,
> Rene
>
> Rene Fournier                    Office: 303 Petrie
> Chemistry Dpt, York University   Phone: (416) 736 2100 Ext. 30687
> 4700 Keele Street, Toronto       FAX: (416)-736-5936
> Ontario, CANADA M3J 1P3          e-mail: renef---yorku.ca
>
> On Wed, 12 Dec 2007, Kirk Peterson kipeters-*-wsu.edu wrote:
>
>>
>> Sent to CCL by: Kirk Peterson [kipeters-$-wsu.edu]
>>
>> As a sidebar to this discussion, I have to strongly disagree that
>> basis set parameters, exponents or contraction coefficients,
>> use empirical data in their construction. All commonly used basis
>> sets, even the Pople-style sets, are generated by optimizing
>> exponents and contraction coefficients to minimize (ab initio)
>> energies of atoms and sometimes molecules. Some of the Pople-style
>> basis sets utilize scale factors to apply to atom-optimized exponents,
>> but these were based on (ab initio) molecular calculations
>> and not experimental data.
>>
>> regards,
>>
>> -Kirk
>>
>> On Dec 12, 2007, at 8:21 AM, Rene Fournier renef+*+yorku.ca wrote:
>>
>> >
>> > Sent to CCL by: Rene Fournier [renef\a/yorku.ca]
>> > David Craig and Robert Parr first used "ab initio" in quantum chemistry,
>> > see
>> > http://www.quantum-chemistry-history.com/Parr1.htm
>> > Near the middle of that page, Parr recounts:
>> >
>> > " Craig and I published this paper on "configuration interaction in
>> > benzene", where we took the pi-system and did essentially a complete
>> > configuration interaction calculation on it.
>> >
>> > That has some trivial historical interest in that it was there that the
>> > word, the term ab initio was introduced. Craig and Ross had computed
>> > everything from the start in London and I had personally computed
>> > everything from start in Pittsburgh. Then we compared our answers when we
>> > were finished- This involved computing of all the integrals as best as
>> > they could be done and selecting the configurations to mix for the ground
>> > and exited states because there were electronic states that were of
>> > experimental interest and we checked our answers one against each other
>> > when we were finished. And what the paper says is, that these calculations
>> > were done ab initio by Craig and Ross and by me, independently. And
>> > Mulliken later said that this was the introduction of the term ab initio
>> > into quantum chemistry. In the short review that you have, I talk about
>> > this and reproduce a picture of a letter from Craig to me where he uses
>> > the term ab initio in a different context. So ab initio was introduced in
>> > the quantum chemistry by Craig in a letter to me and I put it into the
>> > manuscript. That's where ab initio came from.
>> > "
>> >
>> >
>> > Funny thing is Parr later became a champion of Density Functional Theory
>> > and for many years (70's, 80's) DFT practitioners were often criticized
>> > for doing calculations that were not "ab initio". I think views have
>> > changed now; "first-principles" was introduced probably to say "mostly
>> > not empirical" but without the implications "ab initio" had acquired over
>> > the years. The term "ab initio calculation", as it's commonly used,
>> > very rarely refers to a calculation "devoid of empiricism", for example
>> > the choice of basis set parameters is almost always empirical,
>> > see discussion on
>> > http://www.ccl.net/chemistry/resources/messages/2001/11/28.002-dir/index.html
>> >
>> > Rene Fournier                    Office: 303 Petrie
>> > Chemistry Dpt, York University   Phone: (416) 736 2100 Ext. 30687
>> > 4700 Keele Street, Toronto       FAX: (416)-736-5936
>> > Ontario, CANADA M3J 1P3          e-mail: renef*_*yorku.ca
>> >
>> >
>> > On Wed, 12 Dec 2007, Christoph Etzlstorfer christoph.etzlstorfer . jku.at wrote:
>> >
>> >> There is a story about that in Michael J.S. Dewars biography "A
>> >> semiempirical life", American Chemical Society, 1992, p. 129.
>> >>
>> >> Best regards
>> >>
>> >> Christoph
>> >>
>> >>
>> >> On 11.12.2007 at 03:13, Tommy Ohyun Kwon ohyun.kwon _ chemistry.gatech.edu wrote:
>> >>
>> >>>
>> >>> Sent to CCL by: Tommy Ohyun Kwon [ohyun.kwon . chemistry.gatech.edu]
>> >>> Dear CCLers;
>> >>> I would appreciate it if anyone could tell me who used the term of
>> >>> "ab initio calculations" first.
>> >>> Thank you very much for your kind attention.
>> >>>
>> >>> Best wishes,
>> >>>
>> >>> Tommy
>> >>>
>> >>> --
>> >>> Tommy Ohyun Kwon, Ph.D
>> >>> School of Chemistry and Biochemistry
>> >>> Georgia Institute of Technology
>> >>> Atlanta Georgia, 30332
>> >>> Email: ohyun.kwon]*[chemistry.gatech.edu
>> >>>
>> >>> -= This is automatically added to each message by the mailing script =-
>> >>> To recover the email address of the author of the message, please change> Conferences: http://server.ccl.net/chemistry/announcements/conferences/
>> >>>
>> >>> Search Messages: http://www.ccl.net/htdig (login: ccl, Password: search)>
>> >>
>> >> ####################################################
>> >> www.etzlstorfer.com
>> >> ***********************************************************
>> >> Dr. Christoph Etzlstorfer   Phone: *43-732-2468-8750
>> >> Universitaet Linz           Fax: *43-732-2468-8747
>> >> A-4040 Linz                 E-mail: christoph.etzlstorfer,+,jku.at
>> >> Austria                     http://www.orc.uni-linz.ac.at
>> >> ####################################################

Frank Jensen
http://www.chem.au.dk/~frj