CCL: InChI version 1.02beta; introducing InChIKey



 Sent to CCL by: "Soaring Bear Ph.D." [soaringbear*yahoo.com]
 --- "steve heller srheller[#]nist.gov"
 <owner-chemistry###ccl.net> wrote:
 > A new beta-release of the InChI software is now available
 > from the IUPAC web site (www.iupac.org/inchi).
 >
 > The principal new features of this release are:
 >
 > (1) A fixed-length (25-character) condensed digital
 > representation of the Identifier to be known as InChIKey.
 ......
 > Caffeine:
 > InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
 >
 > InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
 The change from a semi-readable, SMILES-like string to a string
 that looks meaningless at first glance is far more substantial
 than the minor version bump from 1.01 to 1.02 suggests.  The
 original InChI was just coming into acceptance and wider use,
 and now it appears to be suddenly discarded.  I'm sure there
 were very good reasons for this change, but its magnitude
 undermines confidence in using the identifier.  What assurance
 is there that an equally major change won't be made in another
 three years?
 > .... There is a finite, but very small probability of
 > finding two structures with the same InChIKey. For
 > duplication of only the first block of 14 characters this is
 > 1.3% in a thousand million, equivalent to a single collision
 > in one of 75 databases of one thousand million compounds
 > each.
 Is that calculated for a random, theoretically diverse set of
 structures, or for the real world (of both natural and
 combinatorial compounds), where clustering occurs and more (or
 less) duplication may occur?
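
 For what it's worth, the quoted figure can be roughly reproduced
 with the standard birthday approximation, if one assumes that the
 14-letter first block carries 65 bits of a uniformly distributed
 hash. Both the bit count and the uniformity are my assumptions,
 not something stated in the announcement, and the function name
 below is only illustrative. A minimal sketch:

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday approximation: chance that at least two of n_items
    share a value when each gets a uniform hash of hash_bits bits."""
    space = 2 ** hash_bits
    expected_pairs = n_items * (n_items - 1) / (2 * space)
    return -math.expm1(-expected_pairs)  # 1 - exp(-x), numerically stable

# Assumption: the first block encodes 65 bits (not given in the announcement).
FIRST_BLOCK_BITS = 65
p = collision_probability(10**9, FIRST_BLOCK_BITS)
print(f"{p:.2%} chance of a first-block collision among 1e9 structures")
print(f"about one collision per {1 / p:.0f} such databases")
```

 Under those assumptions the result lands close to the quoted 1.3%
 and 1-in-75 figures, which would suggest the estimate is for ideal
 random hashing. It says nothing about how clustered real-world
 structure collections behave.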
 Soaring Bear Ph.D. Pharmacology  soaringbear*yahoo.com
 http://soaringbear.tripod.com/nature/weedsforneeds.html
 http://www.nlm.nih.gov/mesh/presentations/bear_2005_aug/index.htm
 author of http://HERBMED.org