CCL: InChI version 1.02beta; introducing InChIKey
- From: "Soaring Bear Ph.D." <soaringbear.:.yahoo.com>
- Subject: CCL: InChI version 1.02beta; introducing InChIKey
- Date: Sat, 8 Sep 2007 11:53:11 -0700 (PDT)
Sent to CCL by: "Soaring Bear Ph.D." [soaringbear*yahoo.com]
--- "steve heller srheller[#]nist.gov" <owner-chemistry###ccl.net> wrote:
> A new beta-release of the InChI software is now available
> from the IUPAC web site (www.iupac.org/inchi).
>
> The principal new features of this release are:
>
> (1) A fixed-length (25-character) condensed digital
> representation of the Identifier to be known as InChIKey.
......
> Caffeine:
> InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
>
> InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
This change from a semi-readable, SMILES-like string to a
string that at first glance seems meaningless is a very
substantial one, far larger than the modest version bump
from 1.01 to 1.02 suggests. I have observed that the
original InChI was just coming into acceptance and wider
use, and now it is suddenly being discarded. I'm sure there
were very good reasons for this change, but its magnitude
undermines confidence in using it. What assurance is there
that an equally major change won't be made in another 3
years?
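To illustrate why a hashed key reads as meaningless, here is a toy sketch in Python. It is an imitation under stated assumptions, not the IUPAC algorithm: the InChIKey is derived by hashing the InChI (the published scheme uses SHA-256), but the split into a "skeleton" part (version, formula, /c and /h layers) and a remainder, the 65-bit and 47-bit truncations, and the plain base-26 letter encoding in the hypothetical helpers `encode_bits` and `toy_inchikey` are choices made here for illustration. Its output will not match the real caffeine InChIKey.

```python
import hashlib
import string

LETTERS = string.ascii_uppercase  # the 26 letters A..Z used as base-26 digits


def encode_bits(digest: bytes, n_bits: int, n_letters: int) -> str:
    """Write the leading n_bits of a digest as n_letters base-26 letters."""
    if 26 ** n_letters < 2 ** n_bits:
        raise ValueError("too few letters to hold that many bits")
    # Keep only the top n_bits of the digest as an integer.
    value = int.from_bytes(digest, "big") >> (len(digest) * 8 - n_bits)
    letters = []
    for _ in range(n_letters):
        value, digit = divmod(value, 26)
        letters.append(LETTERS[digit])
    return "".join(reversed(letters))  # most significant letter first


def toy_inchikey(inchi: str) -> str:
    """Hypothetical InChIKey-like hash; NOT the official InChIKey algorithm."""
    body = inchi.split("=", 1)[1]  # drop the "InChI=" prefix
    version, formula, *layers = body.split("/")
    # Assumption: formula, connections (/c) and hydrogens (/h) form the skeleton;
    # every other layer (stereo, isotopes, ...) goes into the second block.
    skeleton = [version, formula] + [layer for layer in layers if layer[:1] in ("c", "h")]
    remainder = [layer for layer in layers if layer[:1] not in ("c", "h")]
    first = encode_bits(hashlib.sha256("/".join(skeleton).encode()).digest(), 65, 14)
    second = encode_bits(hashlib.sha256("/".join(remainder).encode()).digest(), 47, 10)
    return f"{first}-{second}"  # 14 + 1 + 10 = 25 characters


caffeine = "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
print(toy_inchikey(caffeine))
```

Because every output letter depends on the whole input through the hash, no chemical information can be read back from the key; only equality of keys carries meaning. In this toy, structures that differ only in the remainder layers (for example stereo) share the same first 14-letter block.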
>....There is a finite, but very small probability of
> finding two structures with the same InChIKey. For
> duplication of only the first block of 14 characters this is
> 1.3% in a thousand million, equivalent to a single collision
> in one of 75 databases of one thousand million compounds
> each.
Is that calculated on a random, theoretically diverse set
of structures, or on real-world collections (from both
nature and combinatorial chemistry), where clustering occurs
and more (or less) duplication may occur?
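The quoted figure can be reproduced with the standard birthday approximation. The sketch below assumes that the 14-letter first block behaves like 65 uniformly random bits (14 letters can hold about 65.8 bits; the 65-bit figure is an assumption here, not stated in the announcement). Under that assumption, a thousand million distinct structures give about a 1.35% chance of at least one collision, roughly one database in 74. That is in line with the quoted 1.3% and 1-in-75 to within rounding. The arithmetic holds only to the extent that hash outputs are uniformly random even for clustered, real-world inputs, which is exactly what the question above probes.

```python
import math


def collision_probability(n_items: float, n_bits: int) -> float:
    """Probability of at least one shared key among n_items distinct inputs,
    assuming each key is uniform over 2**n_bits values (birthday approximation)."""
    pairs = n_items * (n_items - 1) / 2
    expected_pairs = pairs / 2 ** n_bits  # mean number of colliding pairs
    return -math.expm1(-expected_pairs)   # 1 - exp(-mean), accurate when tiny


p = collision_probability(1e9, 65)  # one thousand million compounds, 65-bit block
print(f"{p:.2%}")    # 1.35%
print(round(1 / p))  # 74, i.e. about one database in 74 contains a collision
```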
Soaring Bear Ph.D. Pharmacology soaringbear*yahoo.com
http://soaringbear.tripod.com/nature/weedsforneeds.html
http://www.nlm.nih.gov/mesh/presentations/bear_2005_aug/index.htm
author of http://HERBMED.org