CCL:G: CBS-Q method in druglike molecules

From: <willsd::appstate.edu>
Subject: CCL:G: CBS-Q method in druglike molecules
Date: Fri, 10 Jul 2009 11:44:30 GMT
 Sent to CCL by: [willsd(0)appstate.edu]
 OK, you have a 64 bit OS running on a 64 bit machine.  It is possible to run a
 32 bit version of G09 on such a machine, but since the price is the same for 64
 or 32 bit versions on G09, it would be bizarre to buy the 32 bit version....
 I'll assume you have 64 bit version of G09.
 You can confirm this easily... in your G09 output files you should see something
 with 64 in it just after the citation.  Here is what it looks like for a 64 bit
 version of G03:
  Cite this work as:
  Gaussian 03, Revision E.01,
  M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria,
  M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven,
  K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi,
  V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega,
  G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota,
  R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao,
  H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross,
  V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann,
  O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski,
  P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg,
  V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain,
  O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari,
  J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford,
  J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz,
  I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham,
  C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill,
  B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople,
  Gaussian, Inc., Wallingford CT, 2004.
  ******************************************
  Gaussian 03:  EM64L-G03RevE.01 11-Sep-2007
                 11-Sep-2008
  ******************************************
 The EM64L indicates that this a 64 bit version optimized for intel em64t
 machines (mine has xeon processors; yours has (I think) itanium processors).
 Your questions:
 Replace QCISD(T) with QCISD?  No, this is a poor idea, and difficult to do.  The
 CBS-Q method was optimized for including QCI energies computed with perturbative
 triples corrections.  If you remove the triples correction, the rest of the
 parameters in CBS-Q will no longer be the correct ones to use; the resulting
 energies will not be correct.  It is also difficult to do this in Gxx.  The
 Gaussian programs include CBS-Q (and related methods) as automatic options.
 These methods compute various energies with different methods and basis sets,
 then calculate differences between them (or other relatively simple
 extrapolation techniques); these differences are summed and added as a
 correction to some "fundamental" energy, and perhaps another
 correction is
 applied as well.  When you request a CBS-Q calculation, Gaussian then performs
 ALL of the required method/basis set calculations, computes the energy
 differences, sums them and applies the corrections.  To change how this is done
 (that is to replace the QCISD(T) step with QCISD) you would have to modify the
 source code, recompile, AND then re-calibrate your new method and find new
 parameters, and give it a new name (since it would no longer be CBS-Q).  This
 would be a lot of work and would be rather far from your goal of a simple pKa
 estimation like you found in the literature.
 Different method (CBS-4M or G1)?
 If you want to try something other than CBS-Q, you must realize the you are
 giving up the calibration in the literature that tells you how accurately you
 can expect the CBS-Q to pKa conversion to be.  There is nothing wrong with this,
 it just means that you will need to do more calibration work yourself, by
 computing more known molecules to compare your new pKa estimates with.
 If you are going to do this, I suggest you investigate the multi-coefficient G3
 methods developed by Don Truhlar and his group (MCG3/3 is relatively
 inexpensive, and remarkably accurate).  I do not know if these are available as
 automatic calculations in G09; they were not included in G03.  If they are not,
 then the MULTILEVEL code (free from Truhlar's group) will automate the
 calculations for you, or you can write your own tool to compute appropriate
 differences and sums (I have dome this with excel).  MULTILEVEL does not require
 commercial compilers (I have used gfortran), so it would not require buying any
 more software.
 I have used MCG3/3 on closed shell molecules and radicals.  With structures and
 frequencies from an initial MP2(full)/6-31++G(d) calculation, this method
 reliably predicts adiabatic ionization energies with an error of about 0.1 eV,
 and I can use it for molecules where a CBS type extrapolation would be extremely
 slow to compute (weeks of CPU).  A word of caution here: the multi-coefficient
 methods have more parameters than other high accuracy methods, so fooling around
 with the underlying calculations would cause even more problems than with CBS-Q.
  Truhlar's group has a lot of expertise with calibrating these methods and I am
 perfectly happy to simply use their results.  I have also noticed that for most
 of my molecules, the most time consuming step is the initial
 MP2(full)/6-31++G(d) opt freq step; the time required for this step scales with
 the fifth power of the number of atoms.
 If you want to try MCG3/3 and you do not want to use MULTILEVEL, here is a
 sample input file that will give the required energies for MCG3/3 at the
 MP2(full) level for acetaldehyde; you would need to look at the original MCG3/3
 papers to see how to combine the energies from this calculation ot get the
 MCG3/3 energy.  You would also need the appropriate parts of the mg3s basis
 (also from Truhlar) for the atoms in your molecules.
 %chk=acet_01.chk
 %mem=100MW
 %nprocs=8
 # ump2(full) 6-31++g(d) opt freq=noraman
 acetaldehyde geometry optimization.  first try.
 0  1
  C                 -0.591726   -0.038182   -3.751955
  H                 -1.545230    0.181729   -3.319093
  H                 -0.226392   -0.963771   -3.358617
  H                 -0.691928   -0.116889   -4.814342
  C                  0.399013    1.090748   -3.412025
  H                  0.496310    1.932068   -4.065946
  O                  1.094159    1.023790   -2.365192
 --Link1--
 %chk=acet_01.chk
 %mem=100MW
 %nprocs=8
 # qcisd(t) 6-31g(d) guess=tcheck geom=check
 mcg3/3 calculation; qcisd(t) step
 0 1
 --Link1--
 %chk=acet_01.chk
 %mem=100MW
 %nprocs=8
 # mp4sdq 6-31g(2df,p) guess=tcheck geom=check
 mcg3/3 calculation; mp4 2df step
 0 1
 --Link1--
 %chk=acet_01.chk
 %mem=100MW
 %nprocs=8
 # mp2 Gen guess=tcheck geom=check
 mcg3/3 calculation; mp2/mg3s step; include appropriate portion of
 mg3s basis set below blank line after charge multiplicity line
 0 1
 -H  0
  S  3 1.00
    33.8650000000 0.0254938000
    5.0947900000 0.1903730000
    1.1587900000 0.8521610000
  S  1 1.00
    0.3258400000 1.0000000000
  S  1 1.00
    0.1027410000 1.0000000000
  P  1 1.00
    1.5000000000 1.0000000000
  P  1 1.00
    0.3750000000 1.0000000000
 ****
 -C  0
  S  6 1.00
    4563.240000000 0.0019666500
    682.0240000000 0.0152306000
    154.9730000000 0.0761269000
    44.4553000000 0.2608010000
    13.0290000000 0.6164620000
    1.8277300000 0.2210060000
  S  3 1.00
    20.9642000000 0.1146600000
    4.8033100000 0.9199990000
    1.4593300000 -0.0030306800
  S  1 1.00
    0.4834560000 1.0000000000
  S  1 1.00
    0.1455850000 1.0000000000
  S  1 1.00
    0.0438000000 1.0000000000
  P  3 1.00
    20.9642000000 0.0402487000
    4.8033100000 0.2375940000
    1.4593300000 0.8158540000
  P  1 1.00
    0.4834560000 1.0000000000
  P  1 1.00
    0.1455850000 1.0000000000
  P  1 1.00
    0.0438000000 1.0000000000
  D  1 1.00
    1.2520000000 1.0000000000
  D  1 1.00
    0.3130000000 1.0000000000
  F  1 1.00
    0.8000000000 1.0000000000
 ****
 -O  0
  S  6 1.00
    8588.500000000 0.0018951500
    1297.230000000 0.0143859000
    299.2960000000 0.0707320000
    87.3771000000 0.2400010000
    25.6789000000 0.5947970000
    3.7400400000 0.2808020000
  S  3 1.00
    42.1175000000 0.1138890000
    9.6283700000 0.9208110000
    2.8533200000 -0.0032744700
  S  1 1.00
    0.9056610000 1.0000000000
  S  1 1.00
    0.2556110000 1.0000000000
  S  1 1.00
    0.0845000000 1.0000000000
  P  3 1.00
    42.1175000000 0.0365114000
    9.6283700000 0.2371530000
    2.8533200000 0.8197020000
  P  1 1.00
    0.9056610000 1.0000000000
  P  1 1.00
    0.2556110000 1.0000000000
  P  1 1.00
    0.0845000000 1.0000000000
  D  1 1.00
    2.5840000000 1.0000000000
  D  1 1.00
    0.6460000000 1.0000000000
  F  1 1.00
    1.4000000000 1.0000000000
 ****
 I hope this helps....
 Steve Williams
 ----- Original Message -----
 > From: "Pavle Mocilac pavle.mocilac2{}mail.dcu.ie"
 <owner-chemistry||ccl.net>
 Date: Thursday, July 9, 2009 9:42 pm
 Subject: CCL:G: CBS-Q method in druglike molecules
 To: "Williams, Steve " <willsd||conrad.appstate.edu>
 >
 > Sent to CCL by: Pavle Mocilac [pavle.mocilac2()mail.dcu.ie]
 > Dear Steve,
 >
 > Thank you for this nice mail, you give me some new useful information.
 > As I mentioned im my previous mail, I'm using 64 bit machine and 64
 > bit UNIX operating system (Silicon Graphics
 > Altix ICE 8200XE). I can not improve my hardware, and I'm quite sure
 > that the better hardware I can not get.     The memory I'm allocating
 > is 256MW and I am using 8 processors - it is a maximum regarding the
 > number of processor power.
 > Therefore, I'm using quite awesome hardware/software performances.
 >
 > Do you think that QCISD(T) method will be possible to replace with
 > QCISD method? How to specify T exclusion in QCISD(T) step of CBS-Q
 > protocol using keywords?
 > Regarding the scaling the molecule, I was developing a model using the
 > same molecules described in reference paper, aminopyridines (13 atoms)
 > and for them, the calculation lasts up to 45 minutes maximum,
 > including in a solvent conditions.
 > But my drug-like molecules contain 25 atoms, but in that case,
 > calculation goes to more than 3 days. In fact, as I have limited time
 > for one job I have to say that I never finished.
 > In the paper I'm using as a reference G1 level was used and they
 > got nice correlation (pKa{calc} vs
 > pKa{exp}). My idea, using CBS-Q, was, in fact, to obtain even
 > better model...
 > Maybe to use CBS-4M or to try G1? Although mistakes are supposed to
 > be bigger...
 >
 >
 > Best regards,
 >
 > Pavle Mocilac
 >
 > ============================================
 > Pavle Mocilac
 > Postgraduate Researcher
 > T3 - Targeted Therapeutics and Theranostics
 > Room X249, School of Chemistry
 > Dublin City University
 > Dublin 9, Dublin, Ireland
 > mobile: +353872167022
 > mailto:pavle.mocilac2- -mail.dcu.ie
 > ============================================
 > Thursday, July 9, 2009, 8:54:58 PM, you wrote:
 >
 >
 > > Sent to CCL by: [willsd*appstate.edu]
 > > Here is my $0.02...
 > > You would like to apply a method based on CBS-Q energies to
 > estimate pKa values
 > > for molecules with as many as 25 atoms, and the QCISD(T)/6-
 > 31+G(d) step is
 > > taking a long time.  You have G09 and would like to do this
 > calculation with
 > > this software, so as to come as close as possible to reproducing
 > the methods
 > > used for similar calculations in a reference paper...
 >
 > > You have not mentioned this, but you may run into disk limitation
 > problems...> this is quite likely with a 32 bit version on G09,
 > since the 32 bit version can
 > > address only a limited amount of disk, and QCISD(T) will need a
 > lot of disk.  (I
 > > should mention that I have not used G09... but previous versions
 > of Gxx had this
 > > limitation on the 32 bit versions, and I'd guess that 32 bit G09
 > does too.) With
 > > the 64 bit version, running on a 64 bit machine with a 64 bit OS,
 > you will run
 > > out of physical disk long before you run out address space.  So
 > one repliers
 > > comments about 32 bit vs 64 bit is quite pertinent to this
 > calculation.
 > > The CBS methods (and other high accuracy combined methods (G1,
 > G2, G3,
 > > multi-coefficient G3, ....) all try, by various methods, to get
 > the energy
 > > accuracy of a very large basis set combined with a very accurate
 > energy method,
 > > without actually doing such a calculation, since the required
 > computing> resources exceed what anyone has.  All such methods
 > depend on various post-HF
 > > energy calculations, and these depend on computing the effects on
 > the energy of
 > > substituting electrons from occupied (in the underlying HF
 > wavefunction)> molecular orbitals into higher energy, unoccupied
 > orbitals.  The SD in QCISD(T)
 > > means that all possible subsitutions of single electrons, and
 > pairs of electrons
 > > are included in a self-consistent manner, and the (T) means that
 > once the SD
 > > part has converged, the effect of substituting three electrons
 > (hence triples)
 > > is estimated with perturbation theory.  The cost of such a
 > calculation is
 > > roughly proportional to the sixth power of the number of basis
 > functions (this
 > > is included in the G09 output), and hence proportional to the
 > sixth power of the
 > > number of atoms.  The CBS method (and G1, G2....) has other steps
 > as well, and
 > > this will have other scaling behavior, so the overall scaling for
 > the entire
 > > CBS-Q method is not easy to predict.
 > > One replyer suggested replacing the QCISD(T) step with QCISD...
 > this can be made
 > > to give a possibly useful method, but it will not be CBS-Q, and
 > will not be
 > > comparable to your reference paper.
 >
 > > I'd suggest trying a scaled down version of your drug-like
 > molecule (replace
 > > methyl or ethyl groups with H, etc), and see if you can get the
 > atom count down
 > > to 12 or so.  Do the CBS-Q calculation on this version, note the
 > time required,
 > > add a methyl or ethyl back in, repeat the CBS-Q and timing.  You
 > can then
 > > estimate the scaling, and use this to estimate how long your full
 > 25 atom
 > > molecule might take.  My rule of thumb is that (except in unusual
 > circumstances)> if it is going to take longer than a week, I will
 > have forgotten what I wanted
 > > to calculate before it is done, and I am not interested.
 >
 > > For some quite clear discussion on the calculations that lie
 > inside combined
 > > methods like CBS-Q, Frank Jensen's relatively new book
 > (Introduction to
 > > Computational Chemistry) is pretty good. (IMHO, it is an
 > introduction in the
 > > sense that he introduces nearly all of the relevant equations,
 > not in the sense
 > > of simplified.  I appreciate his approach a great deal.)
 >
 > > Good luck on your hunt for pKa estimates...
 >
 > > Steve Williams
 >
 >
 > > ----- Original Message -----
 > >> From: "Pavle Mocilac pavle.mocilac2^-^mail.dcu.ie"
 <owner-
 > chemistry_._ccl.net>> Date: Thursday, July 9, 2009 10:03 am
 > > Subject: CCL:G: CBS-Q method in druglike molecules
 > > To: "Williams, Steve " <willsd_._conrad.appstate.edu>
 >
 > >>
 > >> Sent to CCL by: Pavle Mocilac [pavle.mocilac2===mail.dcu.ie]
 > >> Dear Simon,
 > >>
 > >> Thank you for the information about this software, but no, I
 > have
 > >> not and will not consider any other methods beside those that
 > can
 > >> be made with Gaussian 03 or 09 and described in "CaballeroNA,
 > >> Melendez FJ, Munoz-Cara C, et al.   Biophys. Chem  124  (2)
 > >> 155-160".
 > >> We do not have any intention to buy any new additional software
 > but
 > >> to use the one we have. Is seems to me that there are numerous
 > >> people on this CCL  that behave as salesman and always want to
 > >> solve my problem by selling
 > >> me some additional software.
 > >> The reason we are using Gaussian is deeper, and for the sake of
 > >> consistency and comparability we do not have attention to use so
 > >> many different programmes that works in different way.
 > >> I would like to repeat that CBS-Q is my main problem and
 > therefore
 > >> I will be very happy and very grateful if there is somebody in
 > this
 > >> CCL who can help me with CBS-Q calculations in Gaussian.
 > >>
 > >>
 > >> Best regards,
 > >>
 > >> Pavle Mocilac
 > >>
 > >> ============================================
 > >> Pavle Mocilac
 > >> Postgraduate Researcher
 > >> T3 - Targeted Therapeutics and Theranostics
 > >> Room X249, School of Chemistry
 > >> Dublin City University
 > >> Dublin 9, Dublin, Ireland
 > >> mobile: +353872167022
 > >> mailto:pavle.mocilac2~!~mail.dcu.ie
 > >> ============================================
 > >> Wednesday, July 8, 2009, 7:21:16 PM, you wrote:
 > >>
 > >>
 > >> > Sent to CCL by: Simon Cross [simon!=!moldiscovery.com]
 > >>
 > >> > Hi Pavle, have you considered other methods for pKa
 > calculation
 > >> of
 > >> > drug-like molecules? We recently published a method
 (Milletti,
 > F.
 > >> et al.
 > >> > (2007) New and Original pKa Prediction Method Using Grid
 > >> Molecular
 > >> > Interaction Fields. J. Chem. Inf. Model., 47, 2172-2181)
 which
 > is
 > >> used> in our software MoKa.
 > >>
 > >> > You can see more about it at
 www.moldiscovery.com/soft_moka.php
 > >>
 > >> > We get an RMSE of 0.4 log units for the training set of 26000
 > >> data
 > >> > points (very close to experimental error) and external
 > prediction
 > >> error> of 0.7 log units.
 > >>
 > >> > Simon
 > >>
 > >> > Pavle Mocilac pavle.mocilac2]|[mail.dcu.ie wrote:
 > >> >> Sent to CCL by: "Pavle  Mocilac"
 [pavle.mocilac2{:}mail.dcu.ie]
 > >> >> My dear colleagues,
 > >> >>
 > >> >> Recently I sent the question about the calculation of pKa
 of
 > >> small druglike molecules. The method asks employing high
 > accuracy
 > >> models like G1, G2, CBS-Q in order to get accurate Free Gibbs
 > >> energies using Gaussian 09, of course..
 > >> >> Now, it seems to me that this model works OK, and fast
 enough
 > >> for very small molecules up to 10 or 12 atoms (hydrogen
 > included).
 > >> But, my molecule has 25 atoms and currently is stacked at
 > >> QCISD(T)/6-31+G(d') step, calculating something called
 "triples"
 > >> for more than 2 days!!!!!
 > >> >> I would like to emphasize that this job is being
 calculating
 > >> with Gaussian 09 (Linux version), using lots of memory () and 8
 > >> processors at SGI Altix ICE 8200EX.
 > >> >> What is the limit of CBS-Q method regarding the number of
 > atoms?
 > >> Is 25 atoms too much? Is there any option to use some kind of
 > >> additional keywords to exclude "triples" calculation at
 > QCISD(T)/6-
 > >> 31+G(d') step? Will that exclusion hamper accurate result? Is it
 > >> better to use CBS-4M?>
 > >> >>
 > >> >>
 > >>
 > >>
 > >>
 > >> -= This is automatically added to each message by the mailing
 > >> script =-
 > >> To recover the email address of the author of the message,
 > please
 > >> changethe strange characters on the top line to the _._ sign.
 > You can
 > >> also> Conferences:
 > >> http://server.ccl.net/chemistry/announcements/conferences/>;
 > Conferences:>
 > http://server.ccl.net/chemistry/announcements/conferences/
 >
 > -= This is automatically added to each message by the mailing
 > script =-
 > To recover the email address of the author of the message, please
 > changethe strange characters on the top line to the || sign. You can
 > also> Conferences:
 > http://server.ccl.net/chemistry/announcements/conferences/>;
 >
 >