From jkl &$at$& ccl.net Mon Mar  4 18:26:47 1991
Date: Mon, 04 Mar 91 17:54:05 EST
From: jkl()at()ccl.net
Subject: Basis sets intro. Part 2/3
To: chemistry.,at,.ccl.net


SEGMENTED CONTRACTIONS. TERMS AND NOTATION.
===========================================
The segmented basis sets are usually structured in such a way that the most
diffuse primitives (primitives with the smallest exponent) are left 
uncontracted (i.e. one primitive per basis function). More compact primitives
(i.e. those with larger exponents) are taken with their coefficients from
atomic Hartree-Fock calculations and one or more contractions are formed.
Then the contractions are renormalized. Sometimes different contractions share
one or two functions (the most diffuse function(s) from the first contraction
enter the next one). 
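
The renormalization step mentioned above can be sketched as follows. This is
only an illustration for s-type contractions built from already normalized
primitives sharing one center; the exponents and coefficients are made-up
numbers, and the function names are hypothetical.

```python
import math

def s_overlap(a, b):
    """Overlap of two normalized s-type gaussians on the same center."""
    return (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5

def self_overlap(exponents, coeffs):
    """<chi|chi> for a contraction chi = sum_i c_i g_i of normalized primitives."""
    return sum(ci * cj * s_overlap(ai, aj)
               for ai, ci in zip(exponents, coeffs)
               for aj, cj in zip(exponents, coeffs))

def renormalize(exponents, coeffs):
    """Scale the coefficients so that the contraction has unit norm."""
    norm = math.sqrt(self_overlap(exponents, coeffs))
    return [c / norm for c in coeffs]

# Illustrative (not literature) values, largest exponent first.
exps = [10.0, 2.0, 0.5]
raw = [0.2, 0.5, 0.6]
print(renormalize(exps, raw))
```

An uncontracted primitive with coefficient 1.0 is left unchanged by this
procedure, since its self overlap is already 1.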

Cartesian gaussians are grouped in shells corresponding to the same value of
angular momentum quantum number. Of course, these shells should not be
confused with electron shells (i.e. electrons with the same principal quantum
number: K -> n=1, L -> n=2, etc.). Quantum chemists must have run out of words
on this one. And hence, we have s-shell, p-shell, d-shell, f-shell, g-shell,
etc. The shell is a collection of cartesian gaussians that have the same L (see
definition of cartesian gaussian above). Strictly speaking, the s-shell is
a collection of s type gaussians; p-shell is a collection of p-type gaussians;
d-shell is a collection of d-type gaussians; and so on. Of course, combining
primitives belonging to different shells within the same contraction does not
make sense because primitives from different shells are orthogonal.

But even here there is room for more confusion. Many basis sets use the same
exponents for functions corresponding to the same principal quantum number,
i.e., electronic shell. STO-3G is an example, as well as other basis sets from
Pople's group. Atoms of the first and second row (i.e. Li - Ne, Na - Ar) have
the same exponents for s- and p-type gaussians formally associated with
a given electron shell of the isolated atom. For the basis sets in which
s- and p-type functions share the same exponents, the term SP-shell is used.
Sometimes the term L-shell is used by analogy to the 2nd electron shell. This
approximation works very well in practice. Moreover, it is possible to write
efficient code for calculating integrals for such cases. It is important to
stress here that the distinction between inner orbitals and valence orbitals
is somewhat arbitrary and lingers from the past era of Slater orbitals.
Contractions consisting of primitives with large exponents are associated with
inner atomic orbitals while more diffuse functions are allied with valence
orbitals.
Basis functions are not usually atomic orbitals, and in many cases, they do
not even resemble orbitals of isolated atoms. In fact, examining coefficients
of molecular orbitals frequently reveals that these "core" basis functions
contribute substantially to the Highest Occupied Molecular Orbital (HOMO).

The early gaussian contractions were obtained by a least-squares fit to Slater
atomic orbitals. The number of contractions (not primitives!) used for
representing a single Slater atomic orbital (i.e. zeta) was a measure of the
goodness of the set. From this era we have terms like single zeta (SZ), double
zeta (DZ), triple zeta (TZ), quadruple zeta (QZ), etc. In the minimal basis set
(i.e. SZ) only one basis function (contraction) per Slater atomic orbital is
used. DZ sets have two basis functions per orbital, etc. Since valence
orbitals of atoms are more affected by forming a bond than the inner (core)
orbitals, more basis functions are frequently assigned to describe valence
orbitals. This prompted development of split-valence (SV) basis sets, i.e.,
basis sets in which more contractions are used to describe valence orbitals
than core orbitals. That more basis functions are assigned to valence orbitals
does not mean the valence orbitals incorporate more primitives. Frequently, the
core orbitals are long contractions consisting of many primitive gaussians to
represent well the "cusp" of s type function at the position of the nucleus.
The "zeta" terminology is often augmented with a number of polarization 
functions which will be described later. So, DZP means double-zeta plus 
polarization, TZP stands for triple-zeta plus polarization, etc. Sometimes the
number of polarization functions is given, e.g. TZDP, TZ2P, TZ+2P stand for
triple-zeta plus double polarization. The letter V denotes split valence basis
sets, e.g., DZV represents a basis set with only one contraction for inner
orbitals, and two contractions for valence orbitals.  The creativity here is
enormous and spontaneous. 

The minimal basis set is the smallest possible set, i.e., it contains only one 
function per occupied atomic orbital in the ground state. Actually, it always
includes all orbitals from partially occupied subshells and valence p-type
functions for elements from the first 2 groups of the periodic table. So for Li
and Be atoms it has 2 s-type contractions and 1 p-type contraction. The minimal
basis set for the S atom has 3 s-type contractions and 2 p-type contractions. The
most popular minimal basis sets are the STO-nG sets, where n denotes the number
of primitives in the contraction. These sets were obtained by a least-squares
fit of a combination of n gaussian functions to a Slater type orbital of the
same type with zeta = 1.0. For these sets an additional constraint is used:
exponents of corresponding gaussian primitives are the same for basis functions
describing orbitals with the same principal quantum number (e.g. the same
primitives are used for the 2s and 2p functions). Then, these exponents are
multiplied by the square of the Slater exponent zeta that gave the best
description of a set of molecules. For details, see Szabo and Ostlund (1989) or
the original
literature quoted on page 71 of Hehre et al. (1986). The STO-3G (i.e. 3
primitives per each function) is the most widely used set. 
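
The zeta scaling described above can be sketched in a few lines. The 1s fit
for zeta = 1.0 and the hydrogen zeta of 1.24 used here are the commonly quoted
STO-3G values (as tabulated, for instance, in Szabo and Ostlund); treat them as
assumptions to be checked against a trusted source. The function name is
hypothetical.

```python
# (exponent, coefficient) pairs of the STO-3G fit to a 1s Slater orbital
# with zeta = 1.0; values assumed from the standard tabulation.
STO3G_1S_ZETA1 = [
    (2.227660, 0.154329),
    (0.405771, 0.535328),
    (0.109818, 0.444635),
]

def scale_sto_ng(primitives, zeta):
    """Multiply every exponent by zeta**2; coefficients stay unchanged."""
    return [(alpha * zeta ** 2, coeff) for alpha, coeff in primitives]

# zeta = 1.24 is the value usually quoted for hydrogen in molecules.
hydrogen_1s = scale_sto_ng(STO3G_1S_ZETA1, 1.24)
for alpha, coeff in hydrogen_1s:
    print(f"{alpha:12.6f} {coeff:10.6f}")
```

Only the exponents change under the scaling, so the same fit can be reused for
every atom by supplying a different zeta.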

For other sets a more complicated notation needs to be used to specify the 
number of primitives and contractions explicitly. The parentheses () embrace
the number of primitives that are given in the order of angular momentum
quantum number. Square brackets [] are used to specify the number of
resulting contractions. For example: (12s,9p,1d) means 12 primitives on
s-shell, 9 primitives on p-shell, and 1 primitive on d-shell. This is sometimes
abbreviated even further by skipping the shell symbols (12,9,1). The [5,4,1]
means that s-shell has 5 contractions, p-shell has 4 contractions and d-shell
has 1 contraction. To denote how contractions were performed, the following
notation is frequently used: (12,9,1) -> [5,4,1]  or (12,9,1)/[5,4,1] or
(12s,9p,1d) -> [5s,4p,1d]. This means that 12 s-type primitives were contracted
to form 5 s-type contractions, 9 p-primitives were contracted to 4 basis
functions and 1 d-primitive was used as a basis function by itself. A note of
caution here: the statement "9 p-primitives were contracted to 4 basis
functions" actually means that 12 basis functions were created. Each p-type
basis function has 3 variants: p_x, p_y, and p_z, which differ in their
cartesian part (i.e., angular part). The same is true for d-, f-, and higher
angular momentum functions. 
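
The counting above can be made explicit with a small sketch. It assumes
cartesian gaussians, for which a shell with angular momentum L has
(L+1)(L+2)/2 components (1 for s, 3 for p, 6 for d, 10 for f); programs that
use 5 pure d functions would give a different total. The helper names are
hypothetical.

```python
import re

SHELL_L = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4}

def parse_counts(spec):
    """'(12s,9p,1d)' or '[5s,4p,1d]' -> {'s': 12, 'p': 9, 'd': 1}."""
    return {shell: int(n) for n, shell in re.findall(r"(\d+)([spdfg])", spec)}

def n_cartesian(l):
    """Number of cartesian components for angular momentum l."""
    return (l + 1) * (l + 2) // 2

def count_functions(spec):
    """Total number of cartesian basis functions for a contracted set."""
    return sum(n * n_cartesian(SHELL_L[shell])
               for shell, n in parse_counts(spec).items())

print(parse_counts("(12s,9p,1d)"))
print(count_functions("[5s,4p,1d]"))   # 5*1 + 4*3 + 1*6
```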

The notation above does not say how many primitives are used in each
contraction. The more elaborate notation explicitly lists the number of
primitives in each contraction. For example: (63111,4311,1) means that there
are 5 s-type contractions consisting of 6, 3, 1, 1 and 1 primitives, 
respectively. The p-shell consists of 4 basis functions with 4, 3, 1 and 1
primitives, and d-shell has 1 uncontracted primitive. Sometimes slashes are
used instead of commas: (63111/4311/1). This is sometimes "abbreviated" to
(633x1,432x1,1). There is also another notation to denote contractions
as L(i/j/k/l...) for each shell corresponding to angular momentum quantum
number equal to L. For example, the (63111,4311,1) basis set is represented as:
s(6/3/1/1/1), p(4/3/1/1), and d(1). Of course, variants of this notation are
also used. You can find this set written as: (6s,3s,1s,1s,1s/4p,3p,1p,1p/1d)
or (6,3,1,1,1/4,3,1,1/1) or [6s,3s,1s,1s,1s/4p,3p,1p,1p/1d] (sic!). I did
not study the combinatorics of this, but quantum chemists might have exhausted
all combinations of digits, brackets and commas. However, if you ask 10 quantum
chemists which notation is considered standard, you will get 20 different 
answers.
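
As an illustration of how the compact schemes relate to each other, here is a
minimal sketch that converts a string like (63111,4311,1) into the other two
forms. It assumes every contraction has at most 9 primitives (one digit each),
that shells are listed in the order s, p, d, ..., and that no primitive is
shared between contractions. The function names are hypothetical.

```python
import re

def parse_scheme(scheme, shells="spdfg"):
    """'(63111,4311,1)' or '(63111/4311/1)' -> {'s': [6,3,1,1,1], ...}."""
    body = scheme.strip().strip("()[]")
    groups = re.split(r"[,/]", body)
    return {shells[i]: [int(d) for d in group.strip()]
            for i, group in enumerate(groups)}

def summarize(parsed):
    """Primitive and contraction counts, e.g. '(12s,9p,1d) -> [5s,4p,1d]'."""
    prim = ",".join(f"{sum(v)}{k}" for k, v in parsed.items())
    contr = ",".join(f"{len(v)}{k}" for k, v in parsed.items())
    return f"({prim}) -> [{contr}]"

def shell_notation(parsed):
    """The L(i/j/k...) form, e.g. 's(6/3/1/1/1), p(4/3/1/1), d(1)'."""
    return ", ".join(f"{k}({'/'.join(map(str, v))})" for k, v in parsed.items())

parsed = parse_scheme("(63111,4311,1)")
print(summarize(parsed))
print(shell_notation(parsed))
```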

Sometimes the same primitive is incorporated in two contractions (i.e. is
"doubled"); e.g., the popular Chandler-McLean (12,9) sulphur basis set (McLean
and Chandler, 1980) is contracted as [6,5] with the scheme (631111,42111).
If you count primitives contained in contractions for the s-shell, you get
13 primitives instead of 12. This means that one primitive is shared (i.e.
doubled) between two contractions, the 6- and 3-contraction in this case (it
would make little sense to share a primitive between the 6- and 1- or 3- and
1-contraction since the contractions would obviously be linearly dependent). In
some cases the smallest exponent from the first contraction is repeated in the
next contraction as the largest one. In the above case, the basis set formally
represents a general contraction, but since only one function is doubled, it is
used frequently in programs that do not support general contractions.
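
The bookkeeping for doubled primitives is simple enough to check by machine.
The sketch below only compares the stated number of primitives with the sum of
contraction lengths; it cannot tell which primitive is shared. The function
name is hypothetical.

```python
def shared_primitives(n_primitives, contraction_lengths):
    """Number of primitive slots that must be reused in another contraction."""
    return sum(contraction_lengths) - n_primitives

# McLean-Chandler (12,9) -> [6,5] set with scheme (631111,42111)
scheme = {"s": (12, [6, 3, 1, 1, 1, 1]), "p": (9, [4, 2, 1, 1, 1])}
for shell, (n_prim, lengths) in scheme.items():
    extra = shared_primitives(n_prim, lengths)
    print(f"{shell}-shell: {sum(lengths)} slots for {n_prim} primitives, "
          f"{extra} doubled")
```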

By convention, the primitives are listed as exponents and coefficients
starting from the highest exponent. In tables of exponents and coefficients
the numbers are frequently represented in an interesting way, with powers
of 10 in parentheses, e.g. 457.3695 is denoted as 4.573695(+2) and
0.01403732 as 1.403732(-2). Of course, it is obvious if you know it.
The typical basis set specification (Gordon, 1980, modified) is given below as
an example:

66-31G basis set for silicon
--------------------------------------------------------
       Exponent       s coefficient     p coefficient
--------------------------------------------------------
1 S   1.61921(+4)      1.94924(-3)
      2.43609(+3)      1.48559(-2)
      5.56001(+2)      7.25689(-2)
      1.56813(+2)      2.45655(-1)
      5.01692(+1)      4.86060(-1)
      1.70300(+1)      3.25720(-1)

2 SP  2.93350(+2)     -2.82991(-3)       4.43334(-3)
      7.01173(+1)     -3.60737(-2)       3.24402(-2)
      2.24301(+1)     -1.16808(-1)       1.33719(-1)
      8.19425          9.35768(-2)       3.26780(-1)
      3.14768          6.01705(-1)       4.51139(-1)
      1.21515          4.22207(-1)       2.64105(-1)

3 SP  1.65370         -2.40600(-1)      -1.51774(-2)
      5.40760(-1)      7.37953(-2)       2.75139(-2)
      2.04406(-1)      1.04094           7.83008(-1)
      
3 SP  7.23837(-2)      1.00000           1.00000
--------------------------------------------------------

In the example above, corresponding exponents for s- and p-type contractions
are equal but coefficients in s- and p-type contractions are different.
Gaussian primitives are normalized here since coefficients for basis functions
consisting of one primitive (last row) are exactly 1.0. The basis set above
represents the following contraction (16s,10p) -> [4s,3p] or (6631,631).
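
When typing such tables into a program, the parenthesized powers of 10 have to
be converted to ordinary numbers. A minimal sketch of such a conversion is
given here; the function name is hypothetical, and the accepted format is
assumed to be exactly the one used in the table above (a decimal mantissa with
an optional signed power in parentheses).

```python
import re

_NUMBER = re.compile(r"\s*([+-]?\d*\.?\d+)(?:\(([+-]?\d+)\))?\s*")

def parse_table_number(text):
    """'1.61921(+4)' -> 16192.1, '8.19425' -> 8.19425."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a table number: {text!r}")
    mantissa, power = match.groups()
    # Build a standard 'mantissa e power' literal and let float() parse it.
    return float(f"{mantissa}e{int(power or 0)}")

# First primitive of the 1 S contraction from the silicon table above
print(parse_table_number("1.61921(+4)"), parse_table_number("1.94924(-3)"))
```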

To add to the confusion, the coefficients are sometimes listed either
as original coefficients in atomic orbitals or are renormalized for the given
contraction. In some cases coefficients are premultiplied by a normalization
constant for a gaussian primitive, but in most cases it is assumed that
g(alpha,l,m,n;x,y,z) is already normalized (and this is the correct way!).
You have to be prepared for surprises when entering explicit basis sets from
the literature. Program manuals often neglect the description of basis sets,
assuming it is common knowledge. When specifying the structure of the basis
sets for the entire molecule, slashes are used to separate information for
different atoms (or rows, if basis sets for a given row have the same structure
for all atoms). The information is given starting from the heaviest atoms. For
example, the basis set for water would be given as
(10s,5p,1d/5s,1p) -> [4s,2p,1d/2s,1p], in which case the contractions for the
oxygen atom are (10s,5p,1d) -> [4s,2p,1d] and for the hydrogen atoms
(5s,1p) -> [2s,1p].

Pople's basis sets
------------------
A different convention was adopted by Pople and coworkers. The basis set
structure is given for the whole molecule, rather than a particular atom. This
notation also emphasizes the split valence (SV) nature of these sets. Symbols
like n-ijG or n-ijkG can be decoded as: n - number of primitives for the inner
shells; ij or ijk - number of primitives for contractions in the valence shell.
The ij notation describes sets of valence double zeta quality and ijk sets
of valence triple zeta quality. Generally, in basis sets derived by Pople's
group, the s and p contractions belonging to the same "electron shell" (i.e.
corresponding formally to the same principal quantum number n) are folded
into an sp-shell. In this case, the number of s-type and p-type primitives is the
same, and they have identical exponents. However, the coefficients for s- and
p-type contractions are different. 

Now, some examples. The 4-31G basis set for hydrogen (hydrogen has only
valence electrons!) is a contraction (31) or (4s) -> [2s]; for first row atoms
(8s,4p) -> [3s,2p]  or (431,31); and for 2nd row atoms the contraction scheme
is (12s,8p) -> [4s,3p] or (4431,431). For the water molecule, these contractions
could be encoded as (431,31/31). The 6-311G set represents the following
contractions for water (6311,311)/(311) or (11s,5p/5s) -> [4s,3p/3s]. 
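
The decoding used in these examples can be written down as a short sketch. It
covers only plain n-ijG and n-ijkG names (no polarization stars) and takes the
number of core shells as an explicit argument: 0 for hydrogen, 1 for first row
atoms (1s core), 2 for second row atoms (1s and 2sp core). The function names
are hypothetical.

```python
import re

def decode_pople(name, n_core_shells):
    """'4-31G', 1 -> {'s': [4, 3, 1], 'p': [3, 1]}."""
    match = re.fullmatch(r"(\d)-(\d{2,3})G", name)
    if match is None:
        raise ValueError(f"unsupported basis set name: {name!r}")
    core = int(match.group(1))
    valence = [int(d) for d in match.group(2)]
    if n_core_shells == 0:
        # Hydrogen: only valence s functions.
        return {"s": valence}
    # Every core shell carries an s contraction; all but the 1s shell
    # also carry a p contraction with the same number of primitives.
    return {"s": [core] * n_core_shells + valence,
            "p": [core] * (n_core_shells - 1) + valence}

def as_scheme(decoded):
    """{'s': [4, 3, 1], 'p': [3, 1]} -> '(431,31)'."""
    return "(" + ",".join("".join(map(str, v)) for v in decoded.values()) + ")"

for n_core in (0, 1, 2):
    print(n_core, as_scheme(decode_pople("4-31G", n_core)))
```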

Pople's basis sets can also be augmented with d type polarization functions
on heavy atoms only (n-ijG* or n-ijkG*) or on all atoms, with p-functions on
hydrogens (n-ijG** or n-ijkG**). In methane, the 4-31G* encodes the following
split: (431,31,1)/(31) or (8s,4p,1d/4s) -> [3s,2p,1d/2s], while 6-311G** for
the HCN molecule would involve the following contractions: (6311,311,1)/(311,1)
or (11s,5p,1d/5s,1p) -> [4s,3p,1d/3s,1p]. Currently, the 6-311G keyword for
second row atoms, as implemented in the Gaussian90 program, does not actually
correspond to the true 6-311G set. This is explicitly mentioned in the
Gaussian90 manual. For these atoms, the 6-311G keyword defaults to MC basis
sets (McLean and Chandler, 1980) of the type (12s,9p) -> [6,5] with contraction
scheme (631111,42111). Note that one of the s-type functions is doubled. The
basis sets for P, S and
Cl correspond actually to the "anion" basis sets in the original paper since
"these were deemed to give better results for neutral molecules as well."

Sometimes, for atoms of the second row, the nm-ijG notation is used. For example,
66-31G means that there is:
 -  1 function containing 6 primitives on the innermost s-shell,
 -  1 set of functions belonging to the inner SP-shell (i.e. 2SP shell),
    each consisting of 6 gaussian primitives (i.e. 1 s-type function and p_x,
    p_y, p_z functions consisting of 6 primitives with the same exponents).
    Note though that coefficients in s and p type contractions are different,
 -  2 sets of SP functions for valence SP shell (one set consisting of
    contractions with 3 primitives and the other with 1 primitive).
It is possible to write this as (16s,10p) -> [4s,3p] or in more detail as
(6631,631) contraction scheme or alternatively as s(6/6/3/1), p(6/3/1).

---