QSAR IC50 (SUMMARY and more questions)
- From: Richard SMITH <qsarcadd |-at-| yahoo.com>
- Subject: QSAR IC50 (SUMMARY and more questions)
- Date: Tue, 5 Feb 2002 22:23:58 -0800 (PST)
Dear CCLers:
Thank you very much for your replies. As there
were so many requests for the summary, I am sending
it here. But the summary raises more
questions for me, so I don't want to stop here. I would
like to hear from you on the following:
Is there any rule of thumb for the minimum, maximum,
and ideal number of compounds required for
generating a statistically significant QSAR equation?
Can we use compounds that all come from the same
cluster? In that case the important features may not
show up in the final equation, as they may be common
to most of the compounds. How do we handle this?
Or is it necessary to use a diverse set
of compounds, that is, compounds from various clusters
with varying IC50?
I would like to hear more from you on these aspects.
Enclosed please find the summary for my earlier query:
---------------------
My query
Dear Friends:
I am trying to build a QSAR using 17 compounds
for which IC50 values are available. Do I need to
use the IC50 as it is, or should I take log(IC50)
or -log(IC50)? The values range from 10 to 100.
I tried IC50 and log(IC50) using GFA. I am getting
a different QSAR equation in each case. Could you
please advise me on how to proceed? Thanks in advance.
Smith
----------------------REPLIES-------------
"C D'Silva" <C.Dsilva |-at-| mmu.ac.uk>
Mon, 4 Feb 2002 19:19:21 GMT
Dear Richard,
You take the log (1/IC50) and plot it against your
physicochemical parameter.
Biological activity is always plotted on the y-axis
and the parameter on the
x-axis (e.g. Hammett or log P). If you get a linear plot,
the biological activity
is dependent on one parameter. However, more often
than not the
biological activity is dependent on two or more
parameters.
If you get a parabolic plot, then there will be a
parameter that is
present to the power of 2: this is usually log P or
pi.
I enclose attachments to two papers on the topic.
Best Wishes
Claudius
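
The linear-versus-parabolic check described in this reply can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the log P values and IC50s are made up, the helper name
fit_linear_and_parabolic is hypothetical, and NumPy's polyfit stands in
for whatever regression tool is actually used.

```python
import numpy as np

def fit_linear_and_parabolic(x, ic50):
    """Fit log(1/IC50) against one descriptor with a line and a parabola."""
    x = np.asarray(x, dtype=float)
    # Biological activity goes on the y axis as log(1/IC50).
    y = np.log10(1.0 / np.asarray(ic50, dtype=float))
    results = {}
    for degree in (1, 2):
        coeffs = np.polyfit(x, y, degree)          # highest power first
        residuals = y - np.polyval(coeffs, x)
        results[degree] = {
            "coeffs": coeffs,
            "rss": float(np.sum(residuals ** 2)),  # residual sum of squares
        }
    return results

# Hypothetical data: activity constructed to be parabolic in log P.
logp = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
activity = -0.4 * (logp - 2.0) ** 2 - 1.0          # log(1/IC50)
ic50 = 10.0 ** (-activity)

fits = fit_linear_and_parabolic(logp, ic50)
for degree, fit in fits.items():
    print(degree, fit["coeffs"], fit["rss"])
```

If the degree-2 fit has a much smaller residual sum of squares than the
degree-1 fit, that hints at a squared term in the descriptor. With few
compounds, the extra parameter should still be justified statistically.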
------------------------------------
"quanph" <quanph |-at-| hcmuns.edu.vn>
Tue, 5 Feb 2002 09:28:18 +0700
Dear Richard SMITH,
I received your message. Actually, I would like to find the
equation for you.
I am studying QSAR too, but I am as new to this field as you are.
Regarding your question about
"the reason why only logIC50 has to be used in the
QSAR equation and not
raw IC50 values": I think it is because logIC50 behaves more
linearly than raw IC50 values, which makes it
easier to find a QSAR equation. Classical QSAR methods
only find the QSAR equation
from multiple regression. As far as I know, ANN, fuzzy logic, and
the GA in GFA can fit
non-linear equations and find good structure-activity
relationships. So you can
use either logIC50 or raw IC50 values for QSAR.
Phung Quan
---------------------------
"Dave Young" <dave.young |-at-| springmail.com>
Mon, 04 Feb 2002 09:51:13 -0500
Richard,
It sounds like you have a difficult job. First of
all, one order of
magnitude isn't a very big spread by biochemical
standards. Second, you
only have a small number of compounds. Also, you
didn't specify
whether the IC50s were obtained from biochemical
assays or cell culture
assays, or whether kinetics indicate competitive
inhibition.
In any case, I would like to see you summarize the
replies to your
question on the CCL. Here are some things to
consider.
First of all, you expect to see docking energies
proportional to Ki
values from biochemical assays. This is a simple
Arrhenius relationship
between energy and kinetics.
Docking can show extremely reactive compounds to bind
well. However,
if the compound is so reactive that it binds to
residues on the surface
of the enzyme instead of just in the active site, then
there may be no
inhibitory effect in biochemical assays since so
little of it was left
to interact with the active site.
3D QSAR pharmacophore models are, to a first
approximation, directly
proportional to docking energies. Unfortunately, this
isn't necessarily
a rigorous relationship, since charge-charge
interactions are much
stronger than the interaction between hydrophobic
groups. Also,
hydrophobic-hydrophobic interactions are difficult to
represent correctly, since
they are important relative to the interaction between
the hydrophobic
group and a polar solvent, or cell cytoplasm in the
case of biological
systems.
QSAR prediction of cell culture data becomes even more
slippery. First
of all, a compound that binds well in biochemical
assays may not be
lipophilic enough to get through the cell wall if it
must interact with
an enzyme inside of the cell. Second, something that
works in
biochemical assays may be toxic to cells. It might
even interact with some
other enzyme, giving unpredictable results. The
choice of target enzyme
may be based on an incomplete knowledge of some
biochemical assay, so
inhibiting it may have unexpected side effects.
In theory, a conventional QSAR equation can
incorporate information
from docking, 3D QSAR, lipophilicity, and toxicity to
predict cell
culture results, assuming that some of the other
things that could go wrong
don't. You always prefer to work with a large amount
of data,
preferably with activities spanning many orders of
magnitude.
Drug design work is a matter of very delicate
balances. You want
something that binds well in the active site, but
isn't so reactive that it
binds elsewhere. You want it specific to one
enzyme, but able to
inhibit all serotypes. You want it lipophilic enough
to be
bioavailable, but if it is too lipophilic, the liver
will remove it from the body
too quickly. If kinetics don't show the competitive
inhibition of the
enzyme, then your problems increase by several orders
of magnitude.
I know you are probably already familiar with most of
what I have
written here, but it is good to stand back and examine
all aspects of the
problem before focusing in on the task of the day.
There is a selection
of commercial tools available to attack these
problems, but there is
always something you would like that the commercial
software doesn't do.
There are also some companies developing some
interesting ideas for
combating problems like bioavailability and resistance
build-up.
Unfortunately, I can't give details about the drug
design tools my company has
been developing in this forum, but you can find some
public info at
www.exegenicsinc.com
Good luck with your project.
----------------------------------
Mon, 4 Feb 2002 08:37:31 -0200 (BDB)
"antonio luiz oliveira de noronha"
<noronha |-at-| dedalus.lcc.ufmg.br>
Hi
About your QSAR question: in fact, as in Dr.
Kubinyi's book, it
doesn't matter; you can use whatever unit you want.
But you should be aware that, in log units, the IC50
usually behaves
linearly, and if you are doing a linear regression it
will work fine.
If you don't take the log, don't expect the IC50 to
behave linearly, so
don't use a linear regression.
You should use IC50s ranging over at least 2 log units to
have a good
model.
Also, assuming that your data behave linearly, in log
units or not,
the equations will be different, as they will reflect
different values of
IC50 and a different range of values.
My advice is to use -log or, if you want to use the
raw data, to use a
program capable of handling non-linear
data.
Regards
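
The "2 log units" guideline above is easy to check for a data set. A
minimal sketch, assuming only that the IC50 values are positive and share
one unit; the function name log_unit_spread and the four example values
are hypothetical (the original query states only that the 17 values range
from 10 to 100):

```python
import math

def log_unit_spread(ic50_values):
    """Return the activity range in log10 units (ratio of max to min)."""
    lowest, highest = min(ic50_values), max(ic50_values)
    if lowest <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(highest / lowest)

# Hypothetical values spanning the 10 to 100 range from the query.
spread = log_unit_spread([10, 25, 40, 100])
print(f"{spread:.2f} log units")
```

A range of 10 to 100 covers one log unit, which is below the two log
units suggested in this reply.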
--------------------
Mon, 4 Feb 2002 09:52:54 +1100
Dave.Winkler |-at-| csiro.au
Hi, Richard,
You should generally use log IC50 if possible, as the
linear free
energy relationships on which QSAR is based relate the
energetics of
binding to the log of the equilibrium constant for
binding, Ki (or
something related to it, such as IC50):
delta G = -RT ln K
--
Cheers,
Dave
------------------------------------
02 Feb 2002 12:14:21 PST
"Alan Shusterman"
<Alan.Shusterman |-at-| directory.reed.edu>
log(IC50) and -log(IC50) are equivalent, right? You
should get the same
QSAR except the signs of the coefficients are
reversed.
As for IC50 vs. log(IC50), most people work with the
latter (partly because
it makes the data more linear). Of course, these will
(should) give different
QSAR. If logX gives a linear relationship with another
variable, then X by itself
should give an exponential relationship.
-Alan
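
Both points in this reply can be checked numerically. The sketch below
uses synthetic data (a made-up descriptor x and made-up log IC50 values
for 17 compounds) and an ordinary least-squares line from NumPy; none of
these numbers come from the original data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical descriptor and log(IC50) values for 17 compounds.
x = rng.uniform(0.0, 3.0, size=17)
log_ic50 = 1.0 + 0.3 * x + rng.normal(0.0, 0.05, size=17)

def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

slope_pos, icpt_pos = fit_line(x, log_ic50)        # log(IC50) as response
slope_neg, icpt_neg = fit_line(x, -log_ic50)       # -log(IC50) as response
slope_raw, icpt_raw = fit_line(x, 10.0 ** log_ic50)  # raw IC50 as response

print("log(IC50): ", slope_pos, icpt_pos)
print("-log(IC50):", slope_neg, icpt_neg)
print("raw IC50:  ", slope_raw, icpt_raw)
```

The first two fits differ only in the sign of every coefficient, while
the raw IC50 fit gives a genuinely different equation.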
-------------------------------------
Sat, 2 Feb 2002 17:53:54 +0100
"Jeremy R. Greenwood" <jeremy |-at-| compchem.dfh.dk>
The way I see it (and I'm no expert) IC50 is a type of
equilibrium constant, and the log of IC50 is related
to binding energy (deltaG = 1.36*log(Ki) + c, in
kcal/mol).
Since the type of equations you are likely to build
are
linear, and related somehow to properties which are
assumed additive and hopefully related to binding
energies, you want logIC50.
Hope this helps a little,
Jeremy
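
The factor 1.36 quoted above is RT ln(10) in kcal/mol near room
temperature, which connects it to the delta G = -RT ln K relation given
in an earlier reply. A small sketch of the arithmetic, assuming a
temperature of 298.15 K and treating the constant as a dissociation-type
constant in molar units (IC50 only approximates this); the function name
is hypothetical:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_from_k(k_dissociation, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation-type constant.

    delta G = RT ln(Kd), which equals -RT ln(Ka) with Ka = 1/Kd.
    """
    return R_KCAL * temperature * math.log(k_dissociation)

# kcal/mol gained or lost per tenfold change in the constant.
factor = R_KCAL * 298.15 * math.log(10)
print(f"RT ln(10) at 298.15 K = {factor:.2f} kcal/mol")

# Hypothetical example: a 1 micromolar constant.
print(f"delta G for 1e-6 M = {delta_g_from_k(1e-6):.2f} kcal/mol")
```

Each log unit in the constant therefore corresponds to roughly 1.36
kcal/mol, which is why the log scale lines up with additive energy terms.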
----------------------------------------------------------------------
Jeremy Greenwood
jeremy |-at-| greenwood.net
Department of Medicinal Chemistry
bh +45 35306117
Royal Danish School of Pharmacy
fx +45 35306040
Universitetsparken 2, DK-2100 Copenhagen, Denmark
ah +45 32598030
----------------------------------------------------------------------
-----------------
Sat, 02 Feb 2002 11:26:06 +0000
jmmckel |-at-| attglobal.net
Which one makes the most sense from your point of
view? How many
predictors are you using? I would take all the
various fits back to
a common value, perhaps the raw IC50, and see where
the error
is the smallest....
John McKelvey
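
One way to read this suggestion: fit each form of the response, convert
every prediction back to raw IC50, and compare the errors on that common
scale. A rough sketch with a single made-up descriptor and made-up IC50
values; the function name rmse_on_raw_scale is hypothetical and a plain
least-squares line stands in for the GFA models.

```python
import numpy as np

def rmse_on_raw_scale(x, ic50):
    """Compare a raw-IC50 fit and a log-IC50 fit on the raw IC50 scale."""
    x = np.asarray(x, dtype=float)
    ic50 = np.asarray(ic50, dtype=float)

    # Model A: regress raw IC50 on the descriptor.
    pred_raw = np.polyval(np.polyfit(x, ic50, 1), x)

    # Model B: regress log10(IC50), then back-transform the predictions.
    pred_log = 10.0 ** np.polyval(np.polyfit(x, np.log10(ic50), 1), x)

    def rmse(pred):
        return float(np.sqrt(np.mean((ic50 - pred) ** 2)))

    return {"raw": rmse(pred_raw), "log": rmse(pred_log)}

# Hypothetical data: 17 compounds, IC50 exactly log-linear in x (10 to 100).
x = np.linspace(0.0, 3.0, 17)
ic50 = 10.0 ** (1.0 + x / 3.0)

errors = rmse_on_raw_scale(x, ic50)
print(errors)
```

These are in-sample errors; with only 17 compounds, a leave-one-out
comparison on the same common scale would probably be more informative.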
-------------------------------------------------------------