QSAR IC50 (SUMMARY and more questions)



Dear CCLers:
 Thank you very much for your replies. As there were so many
 requests for a summary, I am sending it here. The summary has,
 however, raised more questions for me, so I would like to hear
 from you on the following:
 Is there any rule of thumb for the minimum, maximum, and ideal
 number of compounds required to generate a statistically
 significant QSAR equation?
 Can we use compounds that all come from the same cluster? If we
 do, the important features may not show up in the final equation,
 because they are common to most of the compounds. How do we
 handle this?
 Or is it necessary to use a diverse set of compounds, that is,
 compounds from various clusters with varying IC50?
 I would like to hear more from you on these aspects.
 Enclosed please find the summary for my earlier query:
 ---------------------
 My query
  Dear Friends:
        I am trying to build a QSAR using 17 compounds
  for which IC50 values are available. Do I need to
  use the IC50 as it is, or should I take log(IC50)
  or -log(IC50)? The values range from 10 to 100.
  I tried IC50 and log(IC50) using GFA, and I am getting
  different QSAR equations. Could you please advise
  me on how to proceed? Thanks in advance.
        Smith
 ----------------------REPLIES-------------
          "C D'Silva" <C.Dsilva |-at-| mmu.ac.uk>
          Mon, 4 Feb 2002 19:19:21 GMT
 Dear Richard,
 You take log(1/IC50) and plot it against your physicochemical
 parameter. Biological activity is always plotted on the y-axis and
 the parameter on the x-axis (e.g. Hammett or logP). If you get a
 linear plot, the biological activity depends on one parameter.
 More often than not, however, the biological activity depends on
 two or more parameters.
 If you get a parabolic plot, then one parameter will be present to
 the power of 2: this is usually logP or pi.
 I enclose attachments to two papers on the topic.
 Best Wishes
 Claudius
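 To illustrate the parabolic case described above, here is a minimal
 sketch in plain Python. The logP values and activities are synthetic
 (generated from a known parabola, not measured), and the helper names
 solve_linear_system, fit_polynomial, and rss are made up for this
 example; they do not come from any QSAR package.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares fit via the normal equations.

    Returns coefficients c so that y ~ c[0] + c[1]*x + c[2]*x**2 + ...
    """
    size = degree + 1
    normal = [[sum(xi ** (i + j) for xi in x) for j in range(size)]
              for i in range(size)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(size)]
    return solve_linear_system(normal, rhs)


def rss(x, y, coeffs):
    """Residual sum of squares of the polynomial defined by coeffs."""
    total = 0.0
    for xi, yi in zip(x, y):
        predicted = sum(c * xi ** p for p, c in enumerate(coeffs))
        total += (yi - predicted) ** 2
    return total


# Synthetic data: activity = log(1/IC50) as an exact parabola in logP.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
activity = [0.3 + 1.6 * v - 0.4 * v * v for v in log_p]

linear_coeffs = fit_polynomial(log_p, activity, 1)
quad_coeffs = fit_polynomial(log_p, activity, 2)
linear_rss = rss(log_p, activity, linear_coeffs)
quad_rss = rss(log_p, activity, quad_coeffs)

# The vertex of the fitted parabola estimates the optimum logP.
optimum_log_p = -quad_coeffs[1] / (2.0 * quad_coeffs[2])

print("RSS linear:", linear_rss, "RSS quadratic:", quad_rss)
print("estimated optimum logP:", optimum_log_p)
```

 A much smaller residual for the model with the squared term is the
 numerical counterpart of seeing a parabola in the plot. With real,
 noisy data the extra term should also be checked for statistical
 significance before it is kept.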
 ------------------------------------
          "quanph" <quanph |-at-| hcmuns.edu.vn>
          Tue, 5 Feb 2002 09:28:18 +0700
 Dear Richard SMITH,
 I received your message. Actually, I would like to find the
 equation for you.
 I am studying QSAR too, and I am in the same position as you in
 this field.
 Regarding your question, "the reason why only logIC50 has to be
 used in the QSAR equation and not raw IC50 values": I think it is
 because logIC50 is more linear than the raw IC50 values, which makes
 the QSAR equation easier to find. Classical QSAR methods find the
 QSAR equation only from multiple regression. As far as I know, ANN,
 fuzzy logic, and the GA in GFA can solve non-linear equations and
 find good structure-activity relationships. You can use either
 logIC50 or raw IC50 values for QSAR.
 Phung Quan
 ---------------------------
       "Dave Young" <dave.young |-at-| springmail.com>
       Mon, 04 Feb 2002 09:51:13 -0500
 Richard,
 It sounds like you have a difficult job.  First of all, one order of
 magnitude isn't a very big spread by biochemical standards.  Second,
 you only have a small number of compounds.  Also, you didn't specify
 whether the IC50's were obtained from biochemical assays or cell
 culture assays, or whether kinetics indicate competitive inhibition.
 In any case, I would like to see you summarize the replies to your
 question on the CCL.  Here are some things to consider.
 First of all, you expect to see docking energies proportional to the
 log of the Ki values from biochemical assays.  This is a simple
 Arrhenius relationship between energy and kinetics.
 Docking can show extremely reactive compounds to bind well.  However,
 if the compound is so reactive that it binds to residues on the
 surface of the enzyme instead of just in the active site, then there
 may be no inhibitory effect in biochemical assays, since so little of
 it is left to interact with the active site.
 3D QSAR pharmacophore models are, to a first approximation, directly
 proportional to docking energies.  Unfortunately, this isn't
 necessarily a rigorous relationship, since charge-charge interactions
 are much stronger than the interactions between hydrophobic groups.
 Also, hydrophobic-hydrophobic interactions are difficult to represent
 correctly, since they are important relative to the interaction
 between the hydrophobic group and a polar solvent, or cell cytoplasm
 in the case of biological systems.
 QSAR prediction of cell culture data becomes even more slippery.
 First of all, a compound that binds well in biochemical assays may
 not be lipophilic enough to get through the cell wall if it must
 interact with an enzyme inside the cell.  Second, something that
 works in biochemical assays may be toxic to cells.  It might even
 interact with some other enzyme, giving unpredictable results.  The
 choice of target enzyme may be based on an incomplete knowledge of
 some biochemical assay, so inhibiting it may have unexpected side
 effects.
 In theory, a conventional QSAR equation can incorporate information
 from docking, 3D QSAR, lipophilicity, and toxicity to predict cell
 culture results, assuming that some of the other things that could go
 wrong don't.  You always prefer to work with a large amount of data,
 preferably with activities spanning many orders of magnitude.
 Drug design work is a matter of very delicate balances.  You want
 something that binds well in the active site, but isn't so reactive
 that it binds elsewhere.  You want it specific to one enzyme, but
 able to inhibit all serotypes.  You want it lipophilic enough to be
 bioavailable, but if it is too lipophilic, the liver will remove it
 from the body too quickly.  If kinetics don't show competitive
 inhibition of the enzyme, then your problems increase by several
 orders of magnitude.
 I know you are probably already familiar with most of what I have
 written here, but it is good to stand back and examine all aspects of
 the problem before focusing in on the task of the day.  There is a
 selection of commercial tools available to attack these problems, but
 there is always something you would like that the commercial software
 doesn't do.  There are also some companies developing some
 interesting ideas for combating problems like bioavailability and
 resistance build-up.  Unfortunately, I can't give details in this
 forum about the drug design tools my company has been developing,
 but you can find some public info at
 www.exegenicsinc.com
 Good luck with your project.
 ----------------------------------
      Mon, 4 Feb 2002 08:37:31 -0200 (BDB)
      "antonio luiz oliveira de noronha"
 <noronha |-at-| dedalus.lcc.ufmg.br>
 Hi
 About your QSAR question: in fact, according to Dr. Kubinyi's book,
 it doesn't matter; you can use whatever unit you want.
 But you should be aware that, in log units, the IC50 usually behaves
 linearly, so if you are doing a linear regression it will work fine.
 If you don't take the log, don't expect the IC50 to behave linearly,
 so don't use a linear regression.
 You should use IC50's spanning at least 2 log units to have a good
 model.
 Also, assuming that your data behave linearly, in log units or not,
 the equations will be different, as they will reflect different
 values of IC50 and a different range of values.
 My advice is to use -log, or, if you want to use the untransformed
 data, to use a program capable of dealing with non-linear data.
 Regards
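 The two-log-unit guideline above is easy to check before any model
 is built. The sketch below is only an illustration: log_span is a
 made-up helper name, and apart from the end points 10 and 100 quoted
 in the original query, the IC50 values are invented.

```python
import math


def log_span(ic50_values):
    """Spread of the data in log10 units (orders of magnitude)."""
    if min(ic50_values) <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(max(ic50_values)) - math.log10(min(ic50_values))


# End points 10 and 100 come from the query; the rest are placeholders.
ic50 = [10.0, 18.0, 25.0, 40.0, 60.0, 100.0]

span = log_span(ic50)
meets_guideline = span >= 2.0  # guideline suggested in the reply above

print("span in log units:", span)
print("at least 2 log units:", meets_guideline)
```

 For a data set running from 10 to 100 the span is one log unit, so
 by this guideline the activity range in the query is on the narrow
 side.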
 --------------------
      Mon, 4 Feb 2002 09:52:54 +1100
      Dave.Winkler |-at-| csiro.au
 Hi, Richard,
 You should generally use log IC50 if possible, as the linear free
 energy relationships on which QSAR is based relate the energetics of
 binding to the log of the equilibrium constant for binding, Ki (or
 something related to it, such as IC50):
   delta G = -RT ln K
 --
 Cheers,
 Dave
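 Because the free-energy relationship involves the log of the
 constant, activities are commonly expressed as pIC50, that is,
 -log10 of the IC50 in mol/L. The sketch below shows the conversion.
 The original query does not state the units of the IC50 values, so
 the micromolar default here is purely an assumption, and pic50 and
 UNIT_TO_MOLAR are names invented for this example.

```python
import math

# Conversion factors to mol/L. The units of the original data are not
# given in the thread; micromolar is assumed only for illustration.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50(ic50, unit="uM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


# End points of the range quoted in the query, under the uM assumption.
weakest = pic50(100.0)
strongest = pic50(10.0)

print("pIC50 range:", weakest, "to", strongest)
```

 Working in pIC50 also makes larger numbers mean higher potency, which
 many people find easier to read in a QSAR equation.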
 ------------------------------------
      02 Feb 2002 12:14:21 PST
      "Alan Shusterman"
 <Alan.Shusterman |-at-| directory.reed.edu>
 log(IC50) and -log(IC50) are equivalent, right? You should get the
 same QSAR except that the signs of the coefficients are reversed.
 As for IC50 vs. log(IC50), most people work with the latter (partly
 because it makes the data more linear). Of course, these will (should)
 give different QSARs. If logX gives a linear relationship with another
 variable, then X by itself should give an exponential relationship.
 -Alan
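 The sign-reversal point can be verified with a one-descriptor
 least-squares fit. This is a minimal sketch, not GFA: fit_line is a
 hand-written helper, and the descriptor and IC50 values are invented
 for the example.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical descriptor (e.g. logP) and IC50 values between 10 and 100.
descriptor = [0.8, 1.2, 1.9, 2.4, 3.1, 3.6]
ic50 = [95.0, 70.0, 42.0, 30.0, 16.0, 11.0]

log_ic50 = [math.log10(v) for v in ic50]
neg_log_ic50 = [-v for v in log_ic50]

slope_pos, intercept_pos = fit_line(descriptor, log_ic50)
slope_neg, intercept_neg = fit_line(descriptor, neg_log_ic50)

print("log(IC50):  slope", slope_pos, "intercept", intercept_pos)
print("-log(IC50): slope", slope_neg, "intercept", intercept_neg)
```

 The two fits differ only in the sign of the slope and intercept, so
 the choice between log(IC50) and -log(IC50) is a matter of
 convention, whereas raw IC50 gives a genuinely different model.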
 -------------------------------------
      Sat, 2 Feb 2002 17:53:54 +0100
      "Jeremy R. Greenwood" <jeremy |-at-| compchem.dfh.dk>
 The way I see it (and I'm no expert), IC50 is a type of
 equilibrium constant, and the log of IC50 is related
 to binding energy (deltaG = 1.36*log(Ki) + c, in
 kcal/mol).
 Since the type of equations you are likely to build are
 linear, and related somehow to properties which are
 assumed additive and hopefully related to binding
 energies, you want logIC50.
 Hope this helps a little,
 Jeremy
 ----------------------------------------------------------------------
 Jeremy Greenwood
 jeremy |-at-| greenwood.net
 Department of Medicinal Chemistry
 bh +45 35306117
 Royal Danish School of Pharmacy
 fx +45 35306040
 Universitetsparken 2, DK-2100 Copenhagen, Denmark
 ah +45 32598030
 ----------------------------------------------------------------------
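 The factor of 1.36 in Jeremy's reply is RT*ln(10) in kcal/mol near
 room temperature. The short sketch below reproduces it; the
 temperature of 298.15 K is an assumption, and the function names are
 invented for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_per_log_unit(temperature_k=298.15):
    """Free-energy change for a tenfold change in K: RT*ln(10)."""
    return R_KCAL * temperature_k * math.log(10.0)


def delta_delta_g(k_a, k_b, temperature_k=298.15):
    """Free-energy difference RT*ln(k_b / k_a) between two constants."""
    return R_KCAL * temperature_k * math.log(k_b / k_a)


factor = kcal_per_log_unit()
spread = delta_delta_g(10.0, 100.0)  # range quoted in the original query

print("kcal/mol per log unit:", round(factor, 3))
print("spread for a 10 to 100 range:", round(spread, 3))
```

 If IC50 tracks Ki, a data set spanning 10 to 100 therefore covers
 only about 1.4 kcal/mol in binding free energy, which puts the narrow
 activity range mentioned in other replies into energetic terms.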
 -----------------
       Sat, 02 Feb 2002 11:26:06 +0000
       jmmckel |-at-| attglobal.net
 Which one makes the most sense from your point of view?  How many
 predictors are you using?  I would take all the various fits back to
 a common value, perhaps the raw IC50, and see where the error is the
 smallest....
 John McKelvey
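 One way to read the suggestion above is to fit both the raw and the
 log-transformed activities, convert every prediction back to raw
 IC50, and compare the errors on that common scale. The sketch below
 does this with a single descriptor. The data are invented, and
 fit_line and rmse are hand-written helpers, not part of GFA or any
 other package.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical descriptor and IC50 values, for illustration only.
descriptor = [0.8, 1.2, 1.9, 2.4, 3.1, 3.6]
ic50 = [95.0, 70.0, 42.0, 30.0, 16.0, 11.0]

# Model 1: regress raw IC50 directly on the descriptor.
slope_raw, intercept_raw = fit_line(descriptor, ic50)
pred_raw = [slope_raw * x + intercept_raw for x in descriptor]

# Model 2: regress log10(IC50), then back-transform to raw IC50.
log_ic50 = [math.log10(v) for v in ic50]
slope_log, intercept_log = fit_line(descriptor, log_ic50)
pred_from_log = [10.0 ** (slope_log * x + intercept_log) for x in descriptor]

rmse_raw = rmse(ic50, pred_raw)
rmse_log = rmse(ic50, pred_from_log)

print("RMSE on raw scale, raw model:", rmse_raw)
print("RMSE on raw scale, log model:", rmse_log)
```

 Which model comes out ahead depends entirely on the data. With only
 17 compounds, comparing cross-validated errors rather than fitted
 errors would be a safer basis for the choice.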
 -------------------------------------------------------------