CCL: Quality Control



 Sent to CCL by: [Dave.Winkler[*]csiro.au]
 That is an interesting question Simon.  This issue may not be as important as
 you think unless you want the model to be interpretable in physicochemical
 terms.  For example, measured log octanol/water partition coefficient is
 commonly used as a descriptor.  However, it is really just a surrogate for the
 lipophilic properties of molecules, presumably telling you about their ability
 to cross biological membranes and to bind to proteins where the interaction is
 largely lipophilic (e.g. nuclear receptors). Measured logP can also be predicted
 by rule-based or descriptor-based QSAR models, so in essence you are
 substituting another set of descriptors for the measured logP values.  These
 descriptors in turn could be estimated by other descriptors.  The bottom line is
 that some relatively obscure descriptors like autocorrelation functions,
 molecular fields, and molecular eigenvalue descriptors can be useful for
 generating models even when their connections to the physical interactions are
 too complex to pick apart.  However, simpler, interpretable descriptors are
 always preferred provided they generate a strong model, and one must always be
 wary of generating chance correlations, overfitted models, and correlations
 without causation.
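
 As a rough illustration of the chance-correlation point, one common check is
 y-randomization: shuffle the activities, refit, and see how often the scrambled
 data fit as well as the real data.  The sketch below assumes the simplest
 possible model (one descriptor, least-squares line, so r^2 is the squared
 Pearson correlation); the function names and all numbers are made up for
 illustration and do not come from any real dataset or package.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between one descriptor and the activity."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_trials=1000, seed=0):
    """Return the real r^2 and the fraction of shuffled-activity trials
    whose r^2 is at least as large."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # break the descriptor/activity pairing
        if r_squared(x, shuffled) >= observed:
            hits += 1
    return observed, hits / n_trials


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.2, 2.3, 0.8, 3.1, 2.9, 1.7, 3.8, 0.5]
activity = [5.1, 6.0, 4.7, 6.9, 6.5, 5.6, 7.4, 4.4]

observed_r2, p_chance = y_randomization(descriptor, activity)
print(f"r^2 = {observed_r2:.3f}, scrambled runs at least as good: {p_chance:.3f}")
```

 If a large fraction of the scrambled runs fit about as well as the real data,
 the model is probably a chance correlation.  With many candidate descriptors
 the same idea applies, but the descriptor selection step has to be repeated
 inside each scrambled run.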
 Dave
 Prof. Dave Winkler
 Senior Principal Research Scientist
 Biomaterials & Regenerative Medicine
 CSIRO Materials Science and Engineering
 Clayton 3168, Australia
 ________________________________________
 > From: owner-chemistry+dave.winkler==csiro.au_-_ccl.net
 [owner-chemistry+dave.winkler==csiro.au_-_ccl.net] On Behalf Of Simon Harris
 sihar3000[A]hotmail.co.uk [owner-chemistry_-_ccl.net]
 Sent: Wednesday, 15 December 2010 2:48 AM
 To: Winkler, Dave (CMSE, Clayton)
 Subject: CCL: Quality Control
 Sent to CCL by: "Simon  Harris" [sihar3000]![hotmail.co.uk]
 Dear Subscribers,
 Please could you help me.
 I am working on QSAR and would like to know how a quality control study is
 done for descriptors.
 I don't mean the validation of the dataset (cross-validation or external
 validation) but rather the validation of the descriptor values obtained
 from the software used to calculate them.
 Is there a way to do this? Is there a way to justify the values from your chosen
 software apart from recalculating the descriptors using other software?
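
 For reference, the recalculation check mentioned above might look something
 like the sketch below.  It is only an illustration: the function name, the
 tolerances, and the logP-like numbers are all hypothetical, and it assumes both
 programs list the same molecules in the same order.

```python
import math


def compare_descriptor(values_a, values_b, rel_tol=0.01, abs_tol=1e-6):
    """Return the indices of molecules whose two descriptor values
    disagree by more than the given tolerance."""
    if len(values_a) != len(values_b):
        raise ValueError("both lists must cover the same molecules, in the same order")
    return [
        i
        for i, (a, b) in enumerate(zip(values_a, values_b))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical values for four molecules from two programs.
program_a = [1.46, 2.13, -0.31, 3.90]
program_b = [1.46, 2.14, -0.31, 4.75]

mismatches = compare_descriptor(program_a, program_b)
print("molecules to inspect:", mismatches)
```

 Any molecule flagged this way would still need to be inspected by hand, since
 a disagreement does not say which program is right.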
 Thank you for your help in advance
 Simon Harris
 Sihar3000===hotmail.co.uk
 Brighton UK