From owner-chemistry@ccl.net Wed Oct 27 02:51:00 2010
From: "Stan van Gisbergen vangisbergen~~scm.com"
To: CCL
Subject: CCL: Energy partitioning scheme ADF - Electrostatic interaction
Message-Id: <-43011-101027024942-662-cbXAvwQ4KVSSmkz4khgnVA|*|server.ccl.net>
X-Original-From: Stan van Gisbergen
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Date: Wed, 27 Oct 2010 08:50:13 +0200
Mime-Version: 1.0 (Apple Message framework v753.1)

Sent to CCL by: Stan van Gisbergen [vangisbergen^^^scm.com]
Dear Dr. Azpiroz,

Please follow ADF-GUI tutorial 8 on ADF's unique fragment method:
http://www.scm.com/Doc/Doc2010.01/ADFGUI/ADFGUI_tutorial/page123.html
and the text example on bond energy decomposition on this page:
http://www.scm.com/Doc/Doc2010.01/ADF/Examples/page127.html

If these do not answer your questions, please send a message to SCM's support e-mail address.

Best regards,
Stan van Gisbergen

On Oct 15, 2010, at 5:43 PM, Jon Mikel Azpiroz jmkimteo{=}hotmail.com wrote:
> Sent to CCL by: "Jon Mikel Azpiroz" [jmkimteo:-:hotmail.com]
> Dear members of CCL,
>
> I am trying to reproduce the results given by the energy partitioning scheme of ADF for a system in which 2 fragments have been defined. In particular, I am trying to reproduce the electrostatic interaction, which is calculated "between the unperturbed charge distributions of the prepared fragments as they are brought together at their final positions". Consequently, I have taken the multipole derived charges (MDC-q) and computed the electrostatic interaction as the sum of the classical Coulomb interactions between the atoms of fragment1 and fragment2. The electrostatic energies I get are 10 times smaller than the ones given by the program.
>
> Even though I have approximated the atoms as point charges, I suspect that the agreement should be better.
>
> Does anyone know the way in which ADF calculates the electrostatic interaction?
> Could anyone give me some advice?
>
> Thank you in advance for your attention.
>
> Regards.
>
> Jon Mikel Azpiroz
> University of the Basque Country

From owner-chemistry@ccl.net Wed Oct 27 08:44:00 2010
From: "Andras Borosy andras.borosy::givaudan.com"
To: CCL
Subject: CCL: Descriptors
Message-Id: <-43012-101027060202-7547-8MtSGoS+eigd+zJ5jcvGYQ^^^server.ccl.net>
X-Original-From: Andras Borosy
Date: Wed, 27 Oct 2010 12:01:50 +0200
MIME-Version: 1.0

Sent to CCL by: Andras Borosy [andras.borosy:+:givaudan.com]
Dear George,

If I am not mistaken, you have MOE. In that case, use ga.svl! It is a very effective genetic algorithm and selects several good equations. Generally the first one (with the lowest LOF) is the best one. On the other hand, you must know the experimental error of your dependent variable (Y values, properties, activities) before you start any QSPR building!

Best wishes,

Dr. András Péter Borosy
Scientific Modelling Expert
Fragrance Research
Givaudan Schweiz AG - Ueberlandstrasse 138 - CH-8600 - Dübendorf - Switzerland
T:+41-44-824 2164 - F:+41-44-8242926 - http://www.givaudan.com

"Erik-Jan Ras Erik-Jan.Ras..avantium.com"
Sent by: owner-chemistry+andras.borosy==givaudan.com[-]ccl.net
26.10.2010 20:23
Please respond to: "CCL Subscribers"
To: "Borosy, Andras"
Subject: CCL: Descriptors

Sent to CCL by: Erik-Jan Ras [Erik-Jan.Ras~~avantium.com]
Dear George,

As already indicated by others, there is no uniform method for choosing which descriptors to use. Some guidelines, depending on the modeling method you use, may still be helpful.

If you're using PLS models, a good starting point is the variable importance (VIP) of each of the variables in your model. A variable with a high VIP will have a high impact on your model performance. Typically you start your modeling exercise with all available variables. After that, in small iterative steps, you reduce your model. At each stage you have to carefully evaluate the predictive power of your model. Ideally you would use a substantially large external validation set to assess predictive power.

Also keep in mind that, in theory, only one latent variable per response (Y) should be required in your model. If (many) more latent variables are required, you're dealing with variations in your descriptor space (X) that are orthogonal (uncorrelated) to your response (Y). In this case you may want to consider using OPLS instead of PLS.

Generally speaking, these methods are implemented in commercial packages like Simca-P and work quite well (they are also pretty well documented and referenced). With a bit more effort, many open source libraries are available as well in environments like Matlab, Scilab or R.

Regards,
Erik-Jan

________________________________________
> From: owner-chemistry+erikjan.ras==avantium.com_-_ccl.net [owner-chemistry+erikjan.ras==avantium.com_-_ccl.net] On Behalf Of George Lawrence geoe2##hotmail.com [owner-chemistry_-_ccl.net]
> Sent: Tuesday, October 26, 2010 12:51 PM
> To: Erik-Jan Ras
> Subject: CCL: Descriptors
>
> Sent to CCL by: "George Lawrence" [geoe2%hotmail.com]
> While building a model for a set of compounds, how does one choose the molecular descriptors? I am using MOE, which has about 333 different descriptors. I noticed that some have the same suffix or prefix.
> For example: GCUT (could be SlogP, SMR or PEOE), and then there are SlogP_VSA, SMR_VSA and PEOE_VSA, which have different numbers attached to them. What does this mean?
>
> Do they describe the same thing? How do the numbers relate to each descriptor?
> What are the best methods for deciding on the right choice of descriptors?
>
> George Lawrence
> Geoe2[a]hotmail.com
> Kent, U.K.

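Erik-Jan's suggestion to rank variables by VIP in a PLS model can also be tried outside a commercial package. What follows is only a minimal Python sketch and is not taken from any of the posts: it assumes a single response (PLS1), a plain NIPALS fit, and one commonly used definition of VIP, VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), where w_a is the unit-length weight vector of latent variable a, SS_a is the Y sum of squares that latent variable explains, and p is the number of descriptors. The function name pls1_vip and the toy data are hypothetical.

```python
import math
import random


def pls1_vip(X, y, n_components):
    """Return one VIP score per column of X for a PLS1 model (NIPALS).

    X is a list of rows (samples x variables), y a list of responses.
    Both are mean-centred here; scale the columns beforehand if needed.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]  # X residuals
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]                                   # y residuals

    weights, explained = [], []
    for _ in range(n_components):
        # Weights: covariance of every variable with the current y residual
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)   # Y sum of squares explained by this LV

    total = sum(explained)
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Toy data (illustrative only): the first of four descriptors drives y
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
y = [2.0 * row[0] + 0.1 * rng.gauss(0.0, 1.0) for row in X]

vip = pls1_vip(X, y, n_components=2)
for j, score in enumerate(vip):
    print(f"descriptor {j}: VIP = {score:.2f}")
```

With this definition the squared VIP scores average to one by construction, so descriptors scoring well above 1 are the usual candidates to keep when the model is reduced step by step. Each reduction step should still be checked against an external validation set, as recommended in the post.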
From owner-chemistry@ccl.net Wed Oct 27 09:19:00 2010
From: "James Robinson jameschums]=[yahoo.com"
To: CCL
Subject: CCL: Descriptors for MOE
Message-Id: <-43013-101027052237-12307-FKO5Y/sSYtKfNpHpx/lWTw(~)server.ccl.net>
X-Original-From: "James Robinson"
Date: Wed, 27 Oct 2010 05:22:36 -0400

Sent to CCL by: "James Robinson" [jameschums]|[yahoo.com]
Dear George Lawrence et al,

The SlogP, SMR and PEOE descriptors relate to logP, molar refractivity and partial-charge (electrostatic) properties, respectively. The decomposition of these properties into descriptors is not that easy to explain; I tend to think of them as being expanded into a kind of 'PCA'. Trying to understand what they mean in terms of physical properties is difficult and perhaps not really what the descriptors were meant for; they are just numbers. They are useful, however, if one has a virtual library in MOE: you can select candidates with similar numerical descriptor values. Please remember that even in MOE it's applied maths, computer code and graphics, not physical science.

Hope this helps.

Dr J J Robinson, Somerset, UK.

------
While building a model for a set of compounds, how does one choose the molecular descriptors? I am using MOE, which has about 333 different descriptors. I noticed that some have the same suffix or prefix. For example: GCUT (could be SlogP, SMR or PEOE), and then there are SlogP_VSA, SMR_VSA and PEOE_VSA, which have different numbers attached to them. What does this mean? Do they describe the same thing? How do the numbers relate to each descriptor? What are the best methods for deciding on the right choice of descriptors?
George Lawrence

From owner-chemistry@ccl.net Wed Oct 27 09:53:00 2010
From: "Grigoriy Zhurko reg_zhurko[A]chemcraftprog.com"
To: CCL
Subject: CCL:G: How to compute a porphyrin with Zn
Message-Id: <-43014-101027065220-10506-asPbM5nngXjsS4BM1QjWLQ-#-server.ccl.net>
X-Original-From: Grigoriy Zhurko
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Date: Wed, 27 Oct 2010 14:51:57 -0700
MIME-Version: 1.0

Sent to CCL by: Grigoriy Zhurko [reg_zhurko,+,chemcraftprog.com]
Dear All,

I need to compute metalloporphyrin complexes (unsubstituted or substituted porphyrin with Zn and possibly other metals (Cu, Ni, etc.)) with Gaussian or PCGamess. Currently I am using the B3LYP/6-31G(D,P) method. I am afraid that this method is not appropriate for such molecules: the 6-31G(D,P) basis set does not describe the metal atom correctly. I want to use an ECP on Zn. The main question is: which ECP can be used in conjunction with the 6-31G(D,P) basis set on the other atoms (and how can the BSSE be excluded)? I cannot use bigger basis sets because I run the jobs on a single PC under Windows.

Grigoriy Zhurko.

From owner-chemistry@ccl.net Wed Oct 27 10:29:00 2010
From: "Marcel Swart marcel.swart++icrea.cat"
To: CCL
Subject: CCL:G: How to compute a porphyrin with Zn
Message-Id: <-43015-101027102652-12659-w9w9wAyG2ckNsbf69kEQBQ-,-server.ccl.net>
X-Original-From: Marcel Swart
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
Date: Wed, 27 Oct 2010 16:26:39 +0200
Mime-Version: 1.0 (Apple Message framework v1081)

Sent to CCL by: Marcel Swart [marcel.swart_._icrea.cat]
Dear Grigoriy,

please have a look at:

M. Swart, M. Güell, J.M. Luis and M. Solà
"Spin-state corrected Gaussian-type orbital basis sets"
J. Phys. Chem. A 2010, 114, 7191-7197
http://dx.doi.org/10.1021/jp102712z
(the basis sets are available from http://bse.pnl.gov/bse)

and at the following papers:

F. Feixas, M. Solà and M. Swart
"Chemical bonding and aromaticity in metalloporphyrins"
Can. J. Chem.
(Ziegler issue) 2009, 87, 1063-1073
http://dx.doi.org/10.1139/V09-037
(if you don't have access, please ask me for a reprint)

M. Güell, J.M. Luis, M. Solà and M. Swart
"Importance of the basis set for the spin-state energetics of iron complexes"
J. Phys. Chem. A 2008, 112, 6384-6391
http://dx.doi.org/10.1021/jp803441m

Marcel

On Oct 27, 2010, at 11:51 PM, Grigoriy Zhurko reg_zhurko[A]chemcraftprog.com wrote:
> Dear All,
> I need to compute metalloporphyrin complexes (unsubstituted or substituted porphyrin with Zn and possibly other metals (Cu, Ni, etc.)) with Gaussian or PCGamess. Currently I am using the B3LYP/6-31G(D,P) method. I am afraid that this method is not appropriate for such molecules: the 6-31G(D,P) basis set does not describe the metal atom correctly. I want to use an ECP on Zn. The main question is: which ECP can be used in conjunction with the 6-31G(D,P) basis set on the other atoms (and how can the BSSE be excluded)? I cannot use bigger basis sets because I run the jobs on a single PC under Windows.
> Grigoriy Zhurko.

===================================
dr. Marcel Swart
ICREA Research Professor at
Institut de Química Computacional
Universitat de Girona
Parc Científic i Tecnològic
Edifici Jaume Casademont (despatx A-27)
Pic de Peguera 15
17003 Girona
Catalunya (Spain)
tel +34-972-183240
fax +34-972-183241
e-mail marcel.swart ~ icrea.cat
       marcel.swart ~ udg.edu
web http://www.marcelswart.eu
===================================

From owner-chemistry@ccl.net Wed Oct 27 11:51:00 2010
From: "Lara Martinez Fernandez lara.martinez * uam.es"
To: CCL
Subject: CCL:G: Double hybrid functionals for excited states with Gaussian09
Message-Id: <-43016-101027110829-25155-7rtQ2Gr4JTnCcVrw8RVEAg[-]server.ccl.net>
X-Original-From: "Lara Martinez Fernandez"
Date: Wed, 27 Oct 2010 11:08:27 -0400

Sent to CCL by: "Lara Martinez Fernandez" [lara.martinez*o*uam.es]
I am trying to run some TD-DFT calculations (for excited states) using double hybrid functionals with Gaussian09.
I have some doubts about how TD-B2PLYP and TD-B2GP-PLYP work. If I have understood correctly, I first have to run a B2PLYP calculation for the ground state and then use these orbitals and amplitudes to run the CIS(D) correction for the excited states. I don't know whether this procedure is possible with Gaussian09. Does anyone know how to do that? I guess there must be a keyword to read the ground-state B2PLYP orbitals and not optimise them in the CIS(D) calculation.

Thanks in advance,
Lara

From owner-chemistry@ccl.net Wed Oct 27 12:53:00 2010
From: "Taye D BD sene3095%x%yahoo.com"
To: CCL
Subject: CCL: DALTON
Message-Id: <-43017-101027111823-17478-MymRVMRwFi8gHCaco9PvAQ^-^server.ccl.net>
X-Original-From: "Taye D BD"
Date: Wed, 27 Oct 2010 11:18:21 -0400

Sent to CCL by: "Taye D BD" [sene3095__yahoo.com]
Hi All,

I am running NMR calculations using the DALTON program. The job runs until it has calculated the chemical shieldings and stops when it starts on the spin-spin coupling constants. Is there anybody who has used DALTON for such calculations? I would really appreciate it if you shared your experience with me.

From owner-chemistry@ccl.net Wed Oct 27 14:06:00 2010
From: "Raul Alvarez ralvarez{=}chemcomp.com"
To: CCL
Subject: CCL: CCG - North American User Group Meeting 2011
Message-Id: <-43018-101027122651-8711-zDlhMGIWwsZYukEW8qS/4Q*server.ccl.net>
X-Original-From: "Raul Alvarez"
Content-Language: en-us
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"
Date: Wed, 27 Oct 2010 12:26:37 -0400
MIME-Version: 1.0

Sent to CCL by: "Raul Alvarez" [ralvarez!=!chemcomp.com]
CCG - North American User Group Meeting 2011
June 20-23, 2011
Montreal, Canada

CHEMICAL COMPUTING GROUP is pleased to announce that next year's North American User Group Meeting will take place in Montreal, Canada on June 20-23, 2011. The venue selected for this event is Le Saint Sulpice Hotel, located in the heart of old Montreal.
The meeting will consist of 2 days of training sessions and 2 days of scientific presentations from various guest speakers. Training sessions will be scheduled on June 20-21, followed by scientific presentations and posters on June 22-23.

More details will be announced in the next few weeks! Please visit http://www.chemcomp.com/ugm-2011.htm for future updates.

We look forward to welcoming you at this event in Montreal,

Raul Alvarez
ralvarez|chemcomp.com
www.chemcomp.com
Phone: (514) 393-1055
Fax: (514) 874-9538

From owner-chemistry@ccl.net Wed Oct 27 14:41:00 2010
From: "Isaac B Bersuker bersuker=-=cm.utexas.edu"
To: CCL
Subject: CCL: Descriptors
Message-Id: <-43019-101027121218-21394-JNfsBtS0GKXNRP8Fscb4Qg---server.ccl.net>
X-Original-From: Isaac B Bersuker
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8
Date: Wed, 27 Oct 2010 16:12:07 +0000 (UTC)
MIME-Version: 1.0

Sent to CCL by: Isaac B Bersuker [bersuker~~cm.utexas.edu]
I should like to support the statement below by Andreas Klamt and call your attention to, e.g., my paper entitled "QSAR without arbitrary descriptors..." in J. Comput. Aided Mol. Des. (2008) 22:423.

Dr. Isaac B. Bersuker
Institute for Theoretical Chemistry
The University of Texas at Austin
Chem & Biochem Department
1 University Station A5300
Austin, TX 78712-0165
Phone: (512) 471-4671; Fax: (512) 471-8696
E-mail: bersuker*cm.utexas.edu
http://www.cm.utexas.edu/isaac_bersuker

----- Original Message -----
> From: "Andreas Klamt klamt~~cosmologic.de"
To: "Isaac B. Bersuker"
Sent: Tuesday, October 26, 2010 3:26:51 PM
Subject: CCL: Descriptors

Sent to CCL by: Andreas Klamt [klamt~~cosmologic.de]
Dear George,

I would like to send a kind of warning: the large number of molecular descriptors that some programs nowadays make easily available also poses a kind of danger. If you have thousands of descriptors available for a property for which you may have, let's say, 50 experimental
data points, then the chance that some of them correlate just accidentally is quite high. If they correlate accidentally, no statistical method will detect that the correlation is accidental. Therefore I strongly recommend that you first decide rationally whether a descriptor may have any reasonable relation to the target property. There are a few criteria which can be used: If you want to describe a local property of a molecule, maybe a certain reactivity of a functional group, do not use global molecular descriptors, because they cannot be the right descriptors. Vice versa, do not use local descriptors for global properties (e.g. a logP). Do not use orbital descriptors when you want to describe molecular mobility (viscosity, diffusion coefficients, ...). It is best to use a small set of descriptors which is known to include the relevant information, e.g. for any kind of log-partition coefficient you may use either the 5 Abraham descriptors or the 5 COSMO-RS sigma-moments. ...

Blind QSAR based on large numbers of descriptors just selected by sophisticated statistical methods will lead to QSAR equations which look significant, but most often include no physics. They will fail as soon as you apply them to a novel situation.

Best regards
Andreas

--
PD. Dr. Andreas Klamt
CEO / Geschäftsführer
COSMOlogic GmbH & Co. KG
Burscheider Strasse 515
D-51381 Leverkusen, Germany
phone +49-2171-731681
fax +49-2171-731689
e-mail klamt---cosmologic.de
web www.cosmologic.de
HRA 20653 Amtsgericht Koeln, GF: Dr. Andreas Klamt
Komplementaer: COSMOlogic Verwaltungs GmbH
HRB 49501 Amtsgericht Koeln, GF: Dr. Andreas Klamt

From owner-chemistry@ccl.net Wed Oct 27 16:57:00 2010
From: "Leonardo Moreira da Costa leomdcosta#%#yahoo.com.br"
To: CCL
Subject: CCL:G: MP2 Calculation
Message-Id: <-43020-101027162820-28814-i39Kd5k4sDCNONcpxKeGKg[*]server.ccl.net>
X-Original-From: "Leonardo Moreira da Costa"
Date: Wed, 27 Oct 2010 16:28:19 -0400

Sent to CCL by: "Leonardo Moreira da Costa" [leomdcosta a yahoo.com.br]
I have a very simple question. The Gaussian help (C:\G03W\help\k_maxdisk.htm) says, about MaxDisk, that MP2 energies and gradients obey MaxDisk, which must be at least 2ON^2. However, it does not specify what the "O" and "N" parameters of the formula are. Can anyone please explain to me what these two parameters are?
I have looked for it in many resources, but I did not obtain a definite meaning (number of electrons, orbitals, atoms, ...).

Thanks a lot!

Leonardo Costa
PhD Student from UFF (RJ, Brazil)
leomdcosta+*+yahoo.com.br

From owner-chemistry@ccl.net Wed Oct 27 17:31:00 2010
From: "N. Sukumar nagams:_:rpi.edu"
To: CCL
Subject: CCL: Descriptors
Message-Id: <-43021-101027164757-8811-xMFnDSYB5PWkSHn8UzDnNw]=[server.ccl.net>
X-Original-From: "N. Sukumar"
Content-Disposition: inline
Content-Transfer-Encoding: binary
Content-Type: text/plain
Date: Wed, 27 Oct 2010 16:48:07 -0400
MIME-Version: 1.0

Sent to CCL by: "N. Sukumar" [nagams|,|rpi.edu]
> Blind QSAR based on large numbers of descriptors just selected by
> sophisticated statistical methods will lead to QSAR equations, which
> look significant, but most often include no physics. They will fail as
> soon as you apply them to a novel situation.

Of course, the fault here lies in the inadequate or improper use of robust validation techniques. While I agree in principle with Andreas that a descriptor should have a reasonable relation to the target property, many descriptors available today may not have an intuitively OBVIOUS correlation with the property (or target biological activity) of interest. While sophisticated statistical methods are not required to construct the most obvious correlations, excessive reliance on picking so-called "interpretable" descriptors "by hand" merely serves to reinforce one's existing prejudices ("chemical intuition") and rarely leads to the discovery of new science or new materials.

Anyone wishing to seriously embark upon a program of PREDICTIVE cheminformatics should, at the very least, read the following articles by Alex Tropsha:

A. Golbraikh, A. Tropsha, "Beware of q^2!", J. Mol. Graph. Model. 20, 269-276 (2002).
Alexander Tropsha, "Best Practices for QSAR Model Development, Validation, and Exploitation", Molecular Informatics 29 (6-7), 476-488 (2010).

Dr. N.
Sukumar
Rensselaer Exploratory Center for Cheminformatics Research
http://reccr.chem.rpi.edu/
--------------------------
"It is nice to know that the computer understands the problem. But I would like to understand it too." -- Eugene P. Wigner

==============Original message text===============
On Tue, 26 Oct 2010 16:26:51 EDT "Andreas Klamt klamt~~cosmologic.de" wrote:

Sent to CCL by: Andreas Klamt [klamt~~cosmologic.de]
Dear George,

I would like to send a kind of warning: the large number of molecular descriptors that some programs nowadays make easily available also poses a kind of danger. If you have thousands of descriptors available for a property for which you may have, let's say, 50 experimental data points, then the chance that some of them correlate just accidentally is quite high. If they correlate accidentally, no statistical method will detect that the correlation is accidental. Therefore I strongly recommend that you first decide rationally whether a descriptor may have any reasonable relation to the target property. There are a few criteria which can be used: If you want to describe a local property of a molecule, maybe a certain reactivity of a functional group, do not use global molecular descriptors, because they cannot be the right descriptors. Vice versa, do not use local descriptors for global properties (e.g. a logP). Do not use orbital descriptors when you want to describe molecular mobility (viscosity, diffusion coefficients, ...). It is best to use a small set of descriptors which is known to include the relevant information, e.g. for any kind of log-partition coefficient you may use either the 5 Abraham descriptors or the 5 COSMO-RS sigma-moments. ...

Blind QSAR based on large numbers of descriptors just selected by sophisticated statistical methods will lead to QSAR equations which look significant, but most often include no physics. They will fail as soon as you apply them to a novel situation.
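The warning about accidental correlation in the message above (thousands of descriptors, about 50 experimental values) can be illustrated numerically. The following Python sketch is not from the thread and rests on stated assumptions: the "property" and every "descriptor" are independent Gaussian random numbers, so any correlation found is accidental by construction. The function names pearson_r and best_chance_correlation are hypothetical.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def best_chance_correlation(n_samples, n_descriptors, seed=0):
    """Largest |r| between a random 'property' and purely random descriptors."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_descriptors):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson_r(x, y)))
    return best


# 50 "experimental" values, as in the example given in the message
few = best_chance_correlation(n_samples=50, n_descriptors=5)
many = best_chance_correlation(n_samples=50, n_descriptors=2000)
print(f"best |r| among    5 random descriptors: {few:.2f}")
print(f"best |r| among 2000 random descriptors: {many:.2f}")
```

Because both calls share the same seed, the first five descriptors are identical in the two runs, so the second number can only be equal or larger. None of these descriptors carries any information about y, so whatever correlation the larger pool yields would not survive on new data; this is where both physically motivated pre-selection and rigorous external validation come in.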
Best regards Andreas Am 26.10.2010 20:23, schrieb Erik-Jan Ras Erik-Jan.Ras..avantium.com: > Sent to CCL by: Erik-Jan Ras [Erik-Jan.Ras~~avantium.com] > Dear George, > > As already indicated by others, there is no uniform selection method for choosing which descriptors to use. Some guidelines, depending on the modeling method you use may still be helpfull. > > If you're using PLS models, a good starting point is the variable importance (VIP) for each of the variables in your model. A variable with a high VIP will have a high impact on your model performance. Typically you start your modeling exercise with all available variables. After that, in small iterative steps, you reduce your model. At each stage you have to carefully evaluate predictive power of your model. Ideally you would use asubstantially large external validation set to assess predictive power. > > Also keep in mind the fact that per response (Y) in theory only one latent variable should be required in your model. If (many) more latent variables are required you're dealing with variations in your descriptor space (X) that are orthogonal (uncorrelated) to your response (Y). In this case you may want to consider using OPLS in stead of PLS. > > Generally speaking, these methods are implemented in commercial packages like Simca-P and work quite well (also pretty well documented and referenced). With a bit more effort in environments like Matlab, Scilab or R many open source libraries are available as well. 
>
> Regards,
> Erik-Jan
>
>
> ________________________________________
> From: owner-chemistry+erikjan.ras==avantium.com_-_ccl.net [owner-chemistry+erikjan.ras==avantium.com_-_ccl.net] On Behalf Of George Lawrence geoe2##hotmail.com [owner-chemistry_-_ccl.net]
> Sent: Tuesday, October 26, 2010 12:51 PM
> To: Erik-Jan Ras
> Subject: CCL: Descriptors
>
> Sent to CCL by: "George Lawrence" [geoe2%hotmail.com]
> While building a model for a set of compounds, how does one choose the molecular descriptors? I am using MOE, which has about 333 different descriptors. I noticed that some have the same suffix or prefix.
> For example: GCUT (could be SlogP, SMR or PEOE), and then there are SlogP_vsa, SMR_vsa and PEOE_vsa, which have different numbers attached to them. What does this mean?
>
> Do they describe the same thing? How do the numbers relate to each descriptor?
> What are the best methods for deciding on the right choice of descriptors?
>
> George Lawrence
> Geoe2[a]hotmail.com
> Kent U.K.

--
PD. Dr. Andreas Klamt
CEO / Geschäftsführer
COSMOlogic GmbH & Co. KG
Burscheider Strasse 515
D-51381 Leverkusen, Germany

phone +49-2171-731681
fax +49-2171-731689
e-mail klamt---cosmologic.de
web www.cosmologic.de

HRA 20653 Amtsgericht Koeln, GF: Dr. Andreas Klamt
Komplementaer: COSMOlogic Verwaltungs GmbH
HRB 49501 Amtsgericht Koeln, GF: Dr. Andreas Klamt
===========End of original message text===========

From owner-chemistry@ccl.net Wed Oct 27 18:59:00 2010
From: "Stephen Bowlus chezbowlus-x-comcast.net"
To: CCL
Subject: CCL: Descriptors
Message-Id: <-43022-101027183736-30700-XGX7CyeWdLPlWMg3nieQsA]*[server.ccl.net>
X-Original-From: Stephen Bowlus
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
Date: Wed, 27 Oct 2010 15:37:24 -0700
Mime-Version: 1.0 (Apple Message framework v936)

Sent to CCL by: Stephen Bowlus [chezbowlus^-^comcast.net]
It would perhaps be useful to recall that QSAR has its roots in the Hansch approach, which itself is justified rather strictly by the Hammett equation of the classical physical chemistry of yore. Absolutely for the Hammett equation, and to a large extent in the Hansch approach, descriptors are selected which are expected to model specific (bio)physical processes. The descriptors may be amplified (higher-order or cross-terms) or eliminated on statistical grounds; but in the mechanistic universe of the '60s and '70s, use of a specific descriptor implied an (at least) hypothetical, causative relationship with the target variable's response surface. I emphasize, "causative relationship."

As we developed the ability to generate descriptors, the two-fold necessity to a) reduce descriptor correlation, and b) reduce model dimensionality led to such methods as PLS and PCR, which are applied rather indiscriminately today.
My earliest experience with these methods (other investigators will have other experiences and perspectives on this head) was through CoMFA. PLS is an appropriate descriptor selection method here, because all CoMFA descriptors putatively model equivalent physical process(es), which have (as a group) an established, causative relationship with (e.g.) drug action. I have used other, mathematical/statistical/automatic (i.e. mindless) selection algorithms - neural nets, GAs ... - with fairly large arrays of descriptors. In all cases where I have arrived at a physically interpretable model, I have had to preselect classes of descriptors based on my understanding of the chemistry or biology I am trying to simulate. I emphasize, "physically interpretable." If your aim is to develop a model that will winnow a database of tens of millions (or more!) of compounds of no particular provenance to those that might somehow for whatever reason interact in any imaginable way with your receptor, a purely statistical approach to selection involving any reasonable (!) descriptor may be of use. Not intellectually satisfying, but of use. If your aim is to tie the model to specific molecular or receptor features or properties (as should be the case in the optimization stage of drug development), you should probably take the time to formulate some hypotheses of the types of processes likely to be important; which of these are likely to be limiting processes; and which types of descriptors model those types of processes and properties (e.g. molecular or bulk ... what Prof. Klamt said). -sb On Oct 27, 2010, at 9:12 AM, Isaac B Bersuker bersuker=-=cm.utexas.edu wrote: > > Sent to CCL by: Isaac B Bersuker [bersuker~~cm.utexas.edu] > I should like to support the statement below by Andreas Klamt and > call your attention to, e.g., my paper entitled "QSAR without > arbitrary descriptors..." in J. Comput. Aided Mol. Des. (2008) 22:423. > > Dr. Isaac B. 
Bersuker
> Institute for Theoretical Chemistry
> The University of Texas at Austin
> Chem & Biochem Department
> 1 University Station A5300
> Austin, TX 78712-0165
> Phone: (512) 471-4671; Fax: (512) 471-8696
> E-mail: bersuker ~ cm.utexas.edu
> http://www.cm.utexas.edu/isaac_bersuker
>
> ----- Original Message -----
> From: "Andreas Klamt klamt~~cosmologic.de"
> To: "Isaac B. Bersuker"
> Sent: Tuesday, October 26, 2010 3:26:51 PM
> Subject: CCL: Descriptors
>
> Sent to CCL by: Andreas Klamt [klamt~~cosmologic.de]
> Dear George,
>
> I would like to send a kind of warning: the large number of molecular descriptors that some programs now make easily available also poses a danger. If you have thousands of descriptors available for a property for which you have, let's say, 50 experimental data points, then the chance that some of them correlate just accidentally is quite high. If they correlate accidentally, no statistical method will detect that the correlation is accidental. Therefore I strongly recommend that you first decide rationally whether a descriptor may have any reasonable relation to the target property. There are a few criteria which can be used:
>
> If you want to describe a local property of a molecule, maybe a certain reactivity of a functional group, do not use global molecular descriptors, because they cannot be the right descriptors. Vice versa, do not use local descriptors for global properties (e.g. a logP). Do not use orbital descriptors when you want to describe molecular mobility (viscosity, diffusion coefficients, ...). It is best to use a small set of descriptors which is known to include the relevant information; e.g. for any kind of log-partition coefficient you may use either the 5 Abraham descriptors or the 5 COSMO-RS sigma-moments. ...
>
> Blind QSAR based on large numbers of descriptors, selected only by sophisticated statistical methods, will lead to QSAR equations which look significant but most often include no physics. They will fail as soon as you apply them to a novel situation.
>
> Best regards
>
> Andreas

From owner-chemistry@ccl.net Wed Oct 27 19:43:00 2010
From: "zouzou adnani zinebeladnani~~hotmail.com"
To: CCL
Subject: CCL:G: molecular area in G03
Message-Id: <-43023-101027194138-18984-RwpEagErFwFbICpQH7r4ng+/-server.ccl.net>
X-Original-From: "zouzou adnani"
Date: Wed, 27 Oct 2010 19:41:36 -0400

Sent to CCL by: "zouzou adnani" [zinebeladnani#hotmail.com]
Hi everyone,

I would like to know whether there is a keyword in G03 that computes the molecular area, analogous to the "volume" keyword for the molar volume.

Regards,
Zineb EL ADNANI