From chemistry-request.,at,.server.ccl.net Wed Feb 6 01:24:09 2002
Received: from web21305.mail.yahoo.com ([216.136.129.141]) by server.ccl.net (8.11.6/8.11.0) with SMTP id g166O8c01574 for ; Wed, 6 Feb 2002 01:24:09 -0500
Message-ID: <20020206062358.28267.qmail %-% at %-% web21305.mail.yahoo.com>
Received: from [208.241.25.130] by web21305.mail.yahoo.com via HTTP; Tue, 05 Feb 2002 22:23:58 PST
Date: Tue, 5 Feb 2002 22:23:58 -0800 (PST)
From: Richard SMITH
Subject: QSAR IC50 (SUMMARY and more questions)
To: chemistry&$at$&ccl.net
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Dear CCLers:

Thank you very much for your replies. Since so many people asked for the summary, I am sending it here. But the summary raises more questions for me, so I don't want to stop here. I would like to hear from you on the following:

Is there any rule of thumb for the minimum, maximum, and ideal number of compounds required to generate a statistically significant QSAR equation?

Can we use compounds that all come from the same cluster? (The important features may then not show up in the final equation, as they may be common to most of the compounds.) How do we handle this? Or is it necessary to use a diverse set of compounds, I mean compounds from various clusters with varying IC50?

I would like to hear more from you on these aspects.

Enclosed please find the summary for my earlier query:

---------------------
My query

Dear Friends:

I am trying to build a QSAR using 17 compounds for which IC50 values are available. Do I need to use the IC50 as it is, or should I take log(IC50) or -log(IC50)? The values range from 10 to 100. I tried IC50 and log(IC50) using GFA, and I am getting different QSAR equations. Could you please advise me on how to proceed?

Thanks in advance.
Smith

----------------------REPLIES-------------

"C D'Silva" Mon, 4 Feb 2002 19:19:21 GMT

Dear Richard,

You take the log (1/IC50) and plot it against your physicochemical parameter.
Biological activity is always plotted on the y-axis and the parameter on the x-axis (e.g. Hammett or logP). If you get a linear plot, the biological activity is dependent on one parameter. However, more often than not the biological activity is dependent on two or more parameters. If you get a parabolic plot, then there will be a parameter that is present to the power of 2: this is usually log P or pi. I enclose attachments to two papers on the topic.

Best Wishes
Claudius

------------------------------------

"quanph" Tue, 5 Feb 2002 09:28:18 +0700

Dear Richard SMITH,

I received your message. Actually, I want to find the equation for you. I am studying QSAR too, but I am at the same stage as you in this field.

On your question, "the reason why only logIC50 has to be used in the QSAR equation and not raw IC50 values": I think it is because logIC50 is more linear than the raw IC50 values, so it is easier to find a QSAR equation. Classical QSAR methods only find the QSAR equation from multiple regression. As far as I know, ANN, fuzzy logic, and the GA in GFA can solve non-linear equations and find good structure-activity relationships. You can use both logIC50 and raw IC50 values for QSAR.

Phung Quan

---------------------------

"Dave Young" Mon, 04 Feb 2002 09:51:13 -0500

Richard,

It sounds like you have a difficult job. First of all, one order of magnitude isn't a very big spread by biochemical standards. Second, you only have a small number of compounds. Also, you didn't specify whether the IC50s were obtained from biochemical assays or cell culture assays, or whether kinetics indicate competitive inhibition. In any case, I would like to see you summarize the replies to your question on the CCL.

Here are some things to consider. First of all, you expect to see docking energies proportional to Ki values from biochemical assays. This is a simple Arrhenius relationship between energy and kinetics. Docking can show extremely reactive compounds to bind well.
However, if the compound is so reactive that it binds to residues on the surface of the enzyme instead of just in the active site, then there may be no inhibitory effect in biochemical assays, since so little of it is left to interact with the active site.

3D QSAR pharmacophore models are, to a first approximation, directly proportional to docking energies. Unfortunately, this isn't necessarily a rigorous relationship, since charge-charge interactions are much stronger than the interaction between hydrophobic groups. Also, hydrophobic-hydrophobic interactions are difficult to represent correctly, since they are important relative to the interaction between the hydrophobic group and a polar solvent, or cell cytoplasm in the case of biological systems.

QSAR prediction of cell culture data becomes even more slippery. First of all, a compound that binds well in biochemical assays may not be lipophilic enough to get through the cell wall if it must interact with an enzyme inside the cell. Second, something that works in biochemical assays may be toxic to cells. It might even interact with some other enzyme, giving unpredictable results. The choice of target enzyme may be based on an incomplete knowledge of some biochemical assay, so inhibiting it may have unexpected side effects.

In theory, a conventional QSAR equation can incorporate information from docking, 3D QSAR, lipophilicity, and toxicity to predict cell culture results, assuming that some of the other things that could go wrong don't. You always prefer to work with a large amount of data, preferably with activities spanning many orders of magnitude.

Drug design work is a matter of very delicate balances. You want something that binds well in the active site, but isn't so reactive that it binds elsewhere. You want it specific to one enzyme, but able to inhibit all serotypes.
You want it lipophilic enough to be bioavailable, but if it is too lipophilic, the liver will remove it from the body too quickly. If the kinetics don't show competitive inhibition of the enzyme, then your problems increase by several orders of magnitude.

I know you are probably already familiar with most of what I have written here, but it is good to stand back and examine all aspects of the problem before focusing in on the task of the day. There is a selection of commercial tools available to attack these problems, but there is always something you would like that the commercial software doesn't do. There are also some companies developing some interesting ideas for combating problems like bioavailability and resistance build-up. Unfortunately, I can't give details about the drug design tools my company has been developing in this forum, but you can find some public info at www.exegenicsinc.com

Good luck with your project.

----------------------------------

Mon, 4 Feb 2002 08:37:31 -0200 (BDB) "antonio luiz oliveira de noronha"

Hi,

About your QSAR question: in fact, as in Dr. Kubinyi's book, it doesn't matter; you can use whatever unit you want. But you should be aware that, in log units, the IC50 usually behaves linearly, and if you are doing a linear regression it will work fine. If you don't take the log, don't expect the IC50 to behave linearly, so don't use a linear regression. You should use IC50s spanning at least 2 log units to have a good model. Also, assuming that your data behave linearly, in log units or not, the equations will be different, as they will reflect different values of IC50 and a different range of values. My advice is to use -log or, if you want to use the raw data, to use a program capable of recognizing and dealing with non-linear data.
Regards

--------------------

Mon, 4 Feb 2002 09:52:54 +1100 Dave.Winkler %-% at %-% csiro.au

Hi, Richard,

You should generally use log IC50 if possible, as the linear free energy relationships on which QSAR is based relate the energetics of binding to the log of the equilibrium constant for binding, Ki (or something related to it, such as IC50):

delta G = -RT ln K

--
Cheers,
Dave

------------------------------------

02 Feb 2002 12:14:21 PST "Alan Shusterman"

log(IC50) and -log(IC50) are equivalent, right? You should get the same QSAR, except that the signs of the coefficients are reversed.

As for IC50 vs. log(IC50), most people work with the latter (partly because it makes the data more linear). Of course, these will (should) give different QSARs. If logX gives a linear relationship with another variable, then X by itself should give an exponential relationship.

-Alan

-------------------------------------

Sat, 2 Feb 2002 17:53:54 +0100 "Jeremy R. Greenwood"

The way I see it (and I'm no expert), IC50 is a type of equilibrium constant, and the log of IC50 is related to binding energy (deltaG=1.36*(logpKI) + c, in kcal/mol). Since the type of equations you are likely to build are linear, and related somehow to properties which are assumed additive and hopefully related to binding energies, you want logIC50.

Hope this helps a little,
Jeremy

----------------------------------------------------------------------
Jeremy Greenwood                                    jeremy -A_T- greenwood.net
Department of Medicinal Chemistry                   bh +45 35306117
Royal Danish School of Pharmacy                     fx +45 35306040
Universitetsparken 2, DK-2100 Copenhagen, Denmark   ah +45 32598030
----------------------------------------------------------------------

-----------------

Sat, 02 Feb 2002 11:26:06 +0000 jmmckel&$at$&attglobal.net

Which one makes the most sense from your point of view? How many predictors are you using? I would take all the various fits back to a common value, perhaps the raw IC50, and see where the error is the smallest....
John McKelvey

-------------------------------------------------------------
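
Two points made in the replies can be checked numerically: that log(IC50) and -log(IC50) give the same fit with the signs of the coefficients reversed, and that one log unit corresponds to roughly 1.36 kcal/mol. The following is only a minimal Python sketch. The descriptor values and IC50 numbers are invented for illustration and are not data from this thread, the temperature of 298.15 K is an assumption, and fit_line and r_squared are hypothetical helper names.

```python
import math

# Hypothetical illustration data (NOT from the thread): one descriptor x
# and made-up IC50 values inside the 10 to 100 range mentioned in the query.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ic50 = [95.0, 62.0, 41.0, 27.0, 17.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(xs, ys))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ys)
    return 1.0 - ss_res / ss_tot


log_ic50 = [math.log10(v) for v in ic50]
neg_log_ic50 = [-v for v in log_ic50]

# Same fit, opposite signs on both coefficients.
slope_log, icpt_log = fit_line(x, log_ic50)
slope_neg, icpt_neg = fit_line(x, neg_log_ic50)

# RT * ln(10) in kcal/mol per log unit; 298.15 K is an assumed temperature.
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
kcal_per_log_unit = R_KCAL * 298.15 * math.log(10)

print("log(IC50):  slope, intercept =", slope_log, icpt_log)
print("-log(IC50): slope, intercept =", slope_neg, icpt_neg)
print("R^2 with raw IC50 =", r_squared(x, ic50))
print("R^2 with log IC50 =", r_squared(x, log_ic50))
print("kcal/mol per log unit =", kcal_per_log_unit)
```

With these invented numbers the log-transformed activity fits a straight line better than the raw IC50 does, but that depends entirely on the data, so it is worth running the same comparison on the real set before choosing a form.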