From chemistry-request@ccl.net Thu Apr 14 02:07:42 2005
Received: from yangtze.hku.hk (yangtze.hku.hk [147.8.148.244])
	by server.ccl.net (8.13.1/8.13.1) with ESMTP id j3E67bsE008932
	for <chemistry !=! ccl.net>; Thu, 14 Apr 2005 02:07:38 -0400
Received: from yangtze.hku.hk (localhost.localdomain [127.0.0.1])
	by yangtze.hku.hk (8.13.1/8.13.1) with ESMTP id j3E4mruO003818
	for <chemistry !=! ccl.net>; Thu, 14 Apr 2005 12:48:53 +0800
Received: from localhost (lhhu@localhost)
	by yangtze.hku.hk (8.13.1/8.13.1/Submit) with ESMTP id j3E4mqNp003815
	for <chemistry !=! ccl.net>; Thu, 14 Apr 2005 12:48:53 +0800
Date: Thu, 14 Apr 2005 12:48:52 +0800 (HKT)
From: lhhu !=! yangtze.hku.hk
To: chemistry !=! ccl.net
Subject: Correlation Coefficient
Message-ID: <Pine.LNX.4.61.0504141219570.2286 !=! yangtze.hku.hk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Spam-Status: No, score=0.2 required=5.0 tests=NO_REAL_NAME autolearn=no 
	version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net


Dear All,

I have a question about the correlation coefficient from cross-validation. 
Can anyone tell me how large the correlation coefficient must be for the 
result to be considered reliable? I know that the larger the value, the 
more reliable the result, but I want to know the lower limit of that value. 
I have read some papers that report correlation coefficients lower than 
0.8. Is it OK to present such values in journals? Or does it depend on the 
situation or topic?

Thanks a lot to you all,

Holly





From chemistry-request@ccl.net Thu Apr 14 17:59:42 2005
Received: from f04n07.cac.psu.edu (f04s07.cac.psu.edu [128.118.141.35])
	by server.ccl.net (8.13.1/8.13.1) with ESMTP id j3ELxY7E005105
	for <chemistry () ccl.net>; Thu, 14 Apr 2005 17:59:34 -0400
Received: from blue.chem.psu.edu (blue.chem.psu.edu [128.118.56.38])
	by f04n07.cac.psu.edu (8.13.2/8.13.2) with ESMTP id j3ELxOX4028980;
	Thu, 14 Apr 2005 17:59:24 -0400
Subject: Re: CCL:Correlation Coefficient
From: Rajarshi Guha <rxg218 () psu.edu>
Reply-To: rxg218 () psu.edu
To: lhhu () yangtze.hku.hk, chemistry () ccl.net
In-Reply-To: <Pine.LNX.4.61.0504141219570.2286 () yangtze.hku.hk>
References: <Pine.LNX.4.61.0504141219570.2286 () yangtze.hku.hk>
Content-Type: text/plain
Date: Thu, 14 Apr 2005 17:59:24 -0400
Message-Id: <1113515964.20124.5.camel () blue.chem.psu.edu>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.2 (2.0.2-3) 
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new
X-Spam-Status: No, score=0.5 required=5.0 tests=FROM_ENDS_IN_NUMS 
	autolearn=no version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

On Thu, 2005-04-14 at 12:48 +0800, lhhu () yangtze.hku.hk wrote:
> Dear All,
> 
> I have a question about Correlation Coefficient of cross-validation. Can 
> anyone tell me how much the value of Correlation Coefficient indicate 
> the result is reliable? Certainly I know the larger the value, the more 
> reliable the result. I want to know the lowest limit of that value. I read 
> some papers that used the Correlation Coefficient value, some values are 
> lower than 0.8, is it OK for present them in journals? Or it depends on 
> different situation or topic?

There are differing views on the validity of q^2, in that high values of
q^2 do not necessarily indicate a good model in terms of predictive
ability. Some papers that discuss this topic are

@ARTICLE{q2-1,
    author = {Golbraikh, A. and Tropsha, A.},
    title = {Beware of $q^2$},
    journal = {J.~Mol.~Graph.~Model.},
    year = {2002},
    volume = {20},
    pages = {269--276},
}

@ARTICLE{q2-5,
    author = {Norinder, U.},
    title = {Single and domain mode variable selection in {3D QSAR}
        applications},
    journal = {J.~Chemomet.},
    year = {1996},
    volume = {10},
    pages = {95--105},
}

@ARTICLE{q2-2,
    author = {Golbraikh, A. and Shen, M. and Xiao, Z.Y. and Xiao, Y.D.
        and Lee, K.H. and Tropsha, A.},
    title = {Rational selection of training and test sets for the
        development of validated {QSAR} models},
    journal = {J.~Comput.~Aid.~Mol.~Des.},
    year = {2003},
    volume = {17},
    pages = {241--253},
}

HTH

-------------------------------------------------------------------
Rajarshi Guha <rxg218 () psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
"I'm related to people I don't relate to."
-Calvin



From chemistry-request@ccl.net Thu Apr 14 14:55:30 2005
Received: from mx5.informatik.uni-tuebingen.de (mx5.Informatik.Uni-Tuebingen.De [134.2.12.32])
	by server.ccl.net (8.13.1/8.13.1) with ESMTP id j3EItOVK027924
	for <chemistry^at^ccl.net>; Thu, 14 Apr 2005 14:55:24 -0400
Received: from localhost (loopback [127.0.0.1])
	by mx5.informatik.uni-tuebingen.de (Postfix) with ESMTP
	id B4D2610E; Thu, 14 Apr 2005 18:57:57 +0200 (DFT)
Received: from mx3.informatik.uni-tuebingen.de ([134.2.12.26])
 by localhost (mx5 [134.2.12.32]) (amavisd-new, port 10024) with ESMTP
 id 42468-02; Thu, 14 Apr 2005 18:57:55 +0200 (DFT)
Received: from [134.2.10.184] (rapc84.informatik.uni-tuebingen.de [134.2.10.184])
	by mx3.informatik.uni-tuebingen.de (Postfix) with ESMTP
	id 3C870149; Thu, 14 Apr 2005 18:57:53 +0200 (DFT)
Message-ID: <425EA110.1000207^at^informatik.uni-tuebingen.de>
Date: Thu, 14 Apr 2005 18:57:52 +0200
From: "Joerg K. Wegner" <wegnerj^at^informatik.uni-tuebingen.de>
Organization: Department of Computer Architecture
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: lhhu^at^yangtze.hku.hk
Cc: chemistry^at^ccl.net
Subject: Re: CCL:Correlation Coefficient
References: <Pine.LNX.4.61.0504141219570.2286^at^yangtze.hku.hk>
In-Reply-To: <Pine.LNX.4.61.0504141219570.2286^at^yangtze.hku.hk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Virus-Scanned: by amavisd-new (McAfee AntiVirus) at informatik.uni-tuebingen.de
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed 
	version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

Hi Holly,

I would say there is no general answer, and a fixed threshold of 0.8 or 
whatever seems suspect to me. Why? The quality depends highly on the data 
set used, even among data sets of drug molecules [1].
Furthermore, if you apply an optimization algorithm to select a better 
set of features/descriptors, better induction parameters, etc., you can 
easily produce an overfitted result [2] (see references therein).
So, there is no short answer, but yes, you can use the cross-validation 
result (of the outer loop) to estimate the future performance of your 
hypothesis. In other words, if you have an infinite amount of data, the 
true error is the same as the experimental error. In all other cases, 
with a finite number of training molecules, it is not.
I would recommend comparing your result with results obtained by other 
methods, which is the most serious way to compare prediction rates. 
But as already said in [2], you should not use leave-one-out 
cross-validation for comparing feature selection methods, nor for 
comparing empirically selected feature sets.
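
The point about the outer loop can be illustrated with a minimal sketch. 
It is not the procedure of [2]: it assumes a toy one-descriptor linear 
model, picks the descriptor by absolute Pearson correlation, and uses a 
simple strided k-fold split. All function names are hypothetical. With 
select_inside=True the descriptor is chosen from the training molecules of 
each outer fold only; with select_inside=False it is chosen once from all 
molecules, which lets the held-out molecules leak into the selection.

```python
def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5 if suu > 0 and svv > 0 else 0.0


def select_descriptor(columns, y):
    """Index of the descriptor column most correlated (absolute value) with y."""
    return max(range(len(columns)), key=lambda j: abs(pearson(columns[j], y)))


def fit_predict(x_train, y_train, x_new):
    """Least-squares line through the training data, evaluated at x_new.

    Assumes x_train is not constant (otherwise the slope is undefined).
    """
    n = len(x_train)
    mx, my = sum(x_train) / n, sum(y_train) / n
    sxx = sum((v - mx) ** 2 for v in x_train)
    slope = sum((v - mx) * (w - my) for v, w in zip(x_train, y_train)) / sxx
    return my + slope * (x_new - mx)


def outer_cv_q2(columns, y, k=5, select_inside=True):
    """Cross-validated q^2 = 1 - PRESS / SS_total over k strided outer folds."""
    n = len(y)
    fixed = select_descriptor(columns, y)  # selection that has seen every molecule
    press = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        y_train = [y[i] for i in train]
        train_cols = [[c[i] for i in train] for c in columns]
        # Honest variant: repeat the selection using the training fold only.
        j = select_descriptor(train_cols, y_train) if select_inside else fixed
        for i in test:
            pred = fit_predict(train_cols[j], y_train, columns[j][i])
            press += (y[i] - pred) ** 2
    y_mean = sum(y) / n
    return 1.0 - press / sum((w - y_mean) ** 2 for w in y)
```

Calling outer_cv_q2(columns, y) and outer_cv_q2(columns, y, 
select_inside=False) on the same data gives the two estimates side by 
side. With many noisy descriptors and few molecules, the second one would 
be expected to look more optimistic, although this is not guaranteed for 
every data set.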

[1] Wolpert, D. H. and Macready, W. G. No Free Lunch Theorems for 
Optimization. IEEE Transactions on Evolutionary Computation, 1997, 1, 67-82
[2] Wegner, J. K. and Fröhlich, H. and Zell, A. Feature selection for 
Descriptor based Classification Models. 2. Human Intestinal Absorption 
(HIA). J. Chem. Inf. Comput. Sci., 2004, 44, 931-939

Kind regards, Joerg

> I have a question about Correlation Coefficient of cross-validation. Can 
> anyone tell me how much the value of Correlation Coefficient indicate 
> the result is reliable? Certainly I know the larger the value, the more 
> reliable the result. I want to know the lowest limit of that value. I 
> read some papers that used the Correlation Coefficient value, some 
> values are lower than 0.8, is it OK for present them in journals? Or it 
> depends on different situation or topic?
> 
> Thanks a lot to you all,
> 
> Holly
> 
> 
> 
> 
> 
> -= This is automatically added to each message by the mailing script =-
> To send e-mail to subscribers of CCL put the string CCL: on your 
> Subject: line
> and send your message to:  CHEMISTRY^at^ccl.net
> 
> Send your subscription/unsubscription requests to: 
> CHEMISTRY-REQUEST^at^ccl.net HOME Page: http://www.ccl.net   | Jobs Page: 
> http://www.ccl.net/jobs
> If your is mail bouncing from ccl.net domain due to spam filters, please
> use the Web based form from CCL Home Page 
> -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
> 
> 
> 
> 
> 


-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:wegnerj^at^informatik.uni-tuebingen.de
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)



From chemistry-request@ccl.net Thu Apr 14 17:10:10 2005
Received: from web31512.mail.mud.yahoo.com (web31512.mail.mud.yahoo.com [68.142.198.141])
	by server.ccl.net (8.13.1/8.13.1) with SMTP id j3ELA498002709
	for <chemistry^at^ccl.net>; Thu, 14 Apr 2005 17:10:05 -0400
Received: (qmail 7649 invoked by uid 60001); 14 Apr 2005 20:10:03 -0000
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  b=4Lr15OCWZNME3JZT5KjVxn8XcWgR+VStKzNaF43pKWnO49+ZnWODtKiJVHK5IZZPGAt689oJhmcFnJjaJSmCrlf4eCojqlmuyHw3v8rQLZmQc+uE7OVekGYAy7c1Qn32mEEH1BOYFPKN/Of7PNT1SFzGeBPG85FqqPgKXMFpPJk=  ;
Message-ID: <20050414201003.7647.qmail^at^web31512.mail.mud.yahoo.com>
Received: from [165.106.21.208] by web31512.mail.mud.yahoo.com via HTTP; Thu, 14 Apr 2005 13:10:03 PDT
Date: Thu, 14 Apr 2005 13:10:03 -0700 (PDT)
From: Vincent Xianlong Wang <xloongw^at^yahoo.com>
Subject: CCL:PBC calculation
To: chemistry^at^ccl.net
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Spam-Status: No, score=2.5 required=5.0 tests=DNS_FROM_RFC_ABUSE,
	FORGED_YAHOO_RCVD autolearn=no version=3.0.1
X-Spam-Level: **
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

Dear CCLers,

I'm going to look at the conformation problem in
molecular crystals, such as t-butyl benzene. I would like
to get some advice from experts before plunging into
calculations.
1) What are the pros and cons of PBC calculations
compared with the supermolecule (or cluster) model? What
cautions should I bear in mind in the calculations or in
the interpretation of the results?
2) What accuracy can be expected at different levels
of theory? What about the computational limits in
Gaussian?

Thank you in advance for your suggestions. I would 
appreciate it if you could point me to some reference
work.

Best,

Vincent


		


From chemistry-request@ccl.net Thu Apr 14 13:52:12 2005
Received: from web41903.mail.yahoo.com (web41903.mail.yahoo.com [66.218.93.154])
	by server.ccl.net (8.13.1/8.13.1) with SMTP id j3EHq8f0022715
	for <chemistry^at^ccl.net>; Thu, 14 Apr 2005 13:52:09 -0400
Received: (qmail 56019 invoked by uid 60001); 14 Apr 2005 16:52:08 -0000
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  b=K0NgHqwL1ELrbUWAxfmHpHsz7G8G1EUE6nynPHPMEvFgChcVQG8r3BhMiMCBgjenCQapdHI0IFHbcz/bnToJJPdwb+glmhbk/IT8qQ+clzpmxx/HcuVSNf0ebghDB4yv9QxBizZzBKN3eLyzAy08Si8Z1Lz242z5WOwaXrdoxB4=  ;
Message-ID: <20050414165208.56017.qmail^at^web41903.mail.yahoo.com>
Received: from [128.139.226.37] by web41903.mail.yahoo.com via HTTP; Thu, 14 Apr 2005 09:52:08 PDT
Date: Thu, 14 Apr 2005 09:52:08 -0700 (PDT)
From: limor harel <limorharel^at^yahoo.com>
Subject: ozone
To: ccl <chemistry^at^ccl.net>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0-1046274140-1113497528=:55575"
X-Spam-Status: No, score=0.5 required=5.0 tests=DNS_FROM_RFC_ABUSE,HTML_40_50,
	HTML_MESSAGE autolearn=no version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

--0-1046274140-1113497528=:55575
Content-Type: text/plain; charset=us-ascii

Hello,
 
I would like to get some information about the molecular mechanism of ozone in the atmosphere and about the spectroscopy of ozone.
 
Thanks,
 
Limor

		
--0-1046274140-1113497528=:55575--


From chemistry-request@ccl.net Thu Apr 14 12:51:21 2005
Received: from atom.chem.iupui.edu (atom.chem.iupui.edu [134.68.137.125])
	by server.ccl.net (8.13.1/8.13.1) with ESMTP id j3EGpHCg018229
	for <chemistry)at(ccl.net>; Thu, 14 Apr 2005 12:51:18 -0400
Received: from [134.68.137.220] (in-137-220.dhcp-134-68.iupui.edu [134.68.137.220])
	by atom.chem.iupui.edu (Postfix) with ESMTP id 2A435120B7E0
	for <chemistry)at(ccl.net>; Thu, 14 Apr 2005 09:53:47 -0500 (CDT)
Mime-Version: 1.0 (Apple Message framework v619)
In-Reply-To: <Pine.LNX.4.61.0504141219570.2286)at(yangtze.hku.hk>
References: <Pine.LNX.4.61.0504141219570.2286)at(yangtze.hku.hk>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <AC57D906-ACF4-11D9-9AE0-003065F96212)at(chem.iupui.edu>
Content-Transfer-Encoding: 7bit
From: Kelsey Forsythe <forsythe)at(chem.iupui.edu>
Subject: Re: CCL:Correlation Coefficient
Date: Thu, 14 Apr 2005 09:51:24 -0500
To: chemistry)at(ccl.net
X-Mailer: Apple Mail (2.619)
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed 
	version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

I would not rely solely on a single statistical measure.  There are  
others such as the SDEP or standard error of prediction.  One can also  
apply bootstrapping to look for bias and/or scrambling to determine if  
the result is due to chance or not.
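
As a minimal sketch of the scrambling test mentioned above, the following 
assumes a toy model with a single descriptor, fitted by ordinary least 
squares, and a leave-one-out q^2 defined as 1 - PRESS / SS_total. The 
function names, the number of trials, and the seed are illustrative 
choices, not part of any particular package. The responses are shuffled 
many times, and the q^2 of each scrambled model is compared with the q^2 
obtained for the real data.

```python
import random


def loo_q2(x, y):
    """Leave-one-out q^2 for the one-descriptor model y = a + b*x.

    Assumes the x values left after removing any one point are not all equal.
    """
    n = len(x)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        xt, yt = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xt) / (n - 1), sum(yt) / (n - 1)
        sxx = sum((v - mx) ** 2 for v in xt)
        slope = sum((v - mx) * (w - my) for v, w in zip(xt, yt)) / sxx
        press += (y[i] - (my + slope * (x[i] - mx))) ** 2
    return 1.0 - press / sum((w - y_mean) ** 2 for w in y)


def y_scrambling(x, y, n_trials=200, seed=0):
    """Return (q2 of real data, list of scrambled q2 values, fraction >= real)."""
    rng = random.Random(seed)
    q2_true = loo_q2(x, y)
    scrambled = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between descriptor and response
        scrambled.append(loo_q2(x, y_perm))
    # Fraction of scrambled models that do at least as well as the real one.
    frac = sum(q >= q2_true for q in scrambled) / n_trials
    return q2_true, scrambled, frac
```

If a sizeable fraction of the scrambled models reach a q^2 comparable to 
that of the real model, the apparent correlation may well be due to chance.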

KF
On Apr 13, 2005, at 11:48 PM, lhhu)at(yangtze.hku.hk wrote:

>
> Dear All,
>
> I have a question about Correlation Coefficient of cross-validation.  
> Can anyone tell me how much the value of Correlation Coefficient  
> indicate the result is reliable? Certainly I know the larger the  
> value, the more reliable the result. I want to know the lowest limit  
> of that value. I read some papers that used the Correlation  
> Coefficient value, some values are lower than 0.8, is it OK for  
> present them in journals? Or it depends on different situation or  
> topic?
>
> Thanks a lot to you all,
>
> Holly
>
Kelsey Forsythe, PhD
Director, Computational Molecular Science Facility
IUPUI Chemistry
LD 320
402 North Blackford St.
Indianapolis, IN 46202
Ph: 317-278-2202
Fax: 317-274-4701



From chemistry-request@ccl.net Thu Apr 14 12:26:40 2005
Received: from smtp.goldrush.com (smtp.goldrush.com [206.171.171.11])
	by server.ccl.net (8.13.1/8.13.1) with ESMTP id j3EGQZZF016535
	for <chemistry)at(ccl.net>; Thu, 14 Apr 2005 12:26:35 -0400
Received: from Compaq (x2-04-151.goldrush.com [64.162.10.151])
	by smtp.goldrush.com (8.12.8/8.12.8) with SMTP id j3EGQJC1023826;
	Thu, 14 Apr 2005 09:26:27 -0700
From: "Steve Bowlus" <chezbowlus)at(goldrush.com>
To: <lhhu)at(yangtze.hku.hk>, <chemistry)at(ccl.net>
Subject: RE: Correlation Coefficient
Date: Thu, 14 Apr 2005 09:26:20 -0700
Message-ID: <OLEDKJNBCEKJDJFPILMJIEHKCAAA.chezbowlus)at(goldrush.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="Windows-1252"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
In-Reply-To: <Pine.LNX.4.61.0504141219570.2286)at(yangtze.hku.hk>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
X-MailScanner: Found to be clean
X-MailScanner-SpamCheck: 
X-MailScanner-From: chezbowlus)at(goldrush.com
X-Spam-Status: No, score=0.1 required=5.0 tests=DNS_FROM_AHBL_RHSBL 
	autolearn=failed version=3.0.1
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on server.ccl.net

Cross-validated r-sq (or q-sq) provides a measure of the homogeneity of the
data set.  Depending on how the model is developed, it reveals whether there
are influential points in the data set which are unduly weighted in the
model.  In this regard, q-sq replaces the influence statistics common in
univariate statistics.

By "reliable" are you actually meaning to ask whether predictions made by
the model will be accurate (i.e. useful)?  One must remember that 1)
predictions are associated with a confidence interval, and 2) the new
compound must lie in the prediction space of the model (not be an outlier
wrt any of the descriptors).  So a prediction made with a large CI may be
statistically correct, but practically useless ("unreliable"?).  Bottom
line, the model should have high r-sq, high cross-validated r-sq, and a
reasonably tight standard error of prediction.
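
To make the three quantities concrete, here is a minimal sketch for a
one-descriptor linear model.  It assumes ordinary least squares, the common
definition q-sq = 1 - PRESS / SS_total (with SS_total taken about the mean
of all observed values), and a standard error of prediction computed as
sqrt(PRESS / n).  Other conventions exist, and the function names and data
below are made up for illustration only.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Conventional r^2 of the model fitted to all data points."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def loo_q2_sdep(x, y):
    """Leave-one-out q^2 and standard error of prediction sqrt(PRESS / n)."""
    n = len(x)
    press = 0.0
    for i in range(n):
        # Refit without point i, then predict the point that was left out.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot, sqrt(press / n)


if __name__ == "__main__":
    # Made-up descriptor values and activities, for illustration only.
    x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.2, 7.7]
    q2, sdep = loo_q2_sdep(x, y)
    print("r-sq:", r_squared(x, y), "q-sq:", q2, "SDEP:", sdep)
```

For a least-squares fit, q-sq computed this way cannot exceed r-sq, so a
large gap between the two is a sign that individual points carry too much
weight in the model.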

What is useful is of course dependent on the situation.  Small, noisy data
sets at the beginning of a project, where the intent of the model may be to
determine areas or descriptors for exploration, might use r-sq on the order
of 0.7 and q-sq as low as 0.3.  As the dataset grows and becomes more
homogeneous in the descriptor space, r-sq and q-sq should both increase and
converge on the same value, while the SE shrinks to a (practical) limit of
the accuracy of the (bio)assay.

One or more journal editors may weigh in with their favorite cutoffs, but I
am not aware there is any magic number to assure the "reliability" of a
model.

Steve Bowlus




-----Original Message-----
From: Computational Chemistry List [mailto:chemistry-request)at(ccl.net]On
Behalf Of lhhu)at(yangtze.hku.hk
Sent: Wednesday, April 13, 2005 9:49 PM
To: chemistry)at(ccl.net
Subject: CCL:Correlation Coefficient



Dear All,

I have a question about Correlation Coefficient of cross-validation. Can
anyone tell me how much the value of Correlation Coefficient indicate
the result is reliable? Certainly I know the larger the value, the more
reliable the result. I want to know the lowest limit of that value. I read
some papers that used the Correlation Coefficient value, some values are
lower than 0.8, is it OK for present them in journals? Or it depends on
different situation or topic?

Thanks a lot to you all,

Holly















