CCL: questions about ECFPs



 Sent to CCL by: "Zhao  Yuan" [ccl]~[mail.sioc.ac.cn]
 Hi everyone,
   Recently I read the paper that describes how to generate
 Extended-Connectivity Fingerprints (ECFPs):
 /////////////////////////////////////////////
 High-Throughput Data Analysis. 1. Extended-Connectivity Fingerprints:
  A High-Dimensional Descriptor for Molecular Data Analysis
   David Rogers* and Mathew Hahn
   SciTegic, Inc.
 /////////////////////////////////////////////
 The paper states that ECFPs can be calculated rapidly and can represent
 a very large number of different features, so I want to use them to
 compare two molecules and calculate the similarity between them.
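 A common way to compare two ECFPs is the Tanimoto coefficient over their
 sets of feature identifiers: the number of shared features divided by the
 number of distinct features present in either molecule. The sketch below
 assumes each fingerprint is held as a Python set of integer identifiers;
 the example sets are made up for illustration.

```python
def tanimoto(features_a: set[int], features_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of feature identifiers."""
    if not features_a and not features_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(features_a & features_b)
    return shared / len(features_a | features_b)


# Hypothetical feature-identifier sets for two molecules (illustrative only).
mol_1 = {1, 2, 3, 4}
mol_2 = {2, 4, 5}
print(tanimoto(mol_1, mol_2))  # 0.4  (2 shared features out of 5 distinct)
```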
   However, I ran into some detailed technical problems when following
 its method.
 The first problem is the hash function. I have tried many hash functions
 to encode the initial atom identifiers, but none of them reproduces the
 values given in the reference. Does anyone know which hash function
 the paper used?
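 For experimenting, a 32-bit hash of the per-atom properties can be sketched
 as below. This uses CRC-32 (Python's zlib.crc32) purely as a stand-in; it is
 NOT claimed to be the hash used in the paper, so the numbers will not match
 the published ones. The invariant tuple and the helper name
 initial_identifier are hypothetical.

```python
import struct
import zlib


def initial_identifier(invariants: tuple[int, ...]) -> int:
    """Hash a tuple of integer atom invariants to an unsigned 32-bit identifier.

    CRC-32 is only a stand-in hash, not the one used in the paper.
    """
    # Pack as fixed-width little-endian ints so element boundaries are unambiguous.
    packed = struct.pack(f"<{len(invariants)}i", *invariants)
    return zlib.crc32(packed) & 0xFFFFFFFF  # mask keeps the result unsigned


# Hypothetical invariants for one carbon atom, e.g. (heavy neighbors,
# valence minus H count, atomic number, mass, charge, H count, in-ring flag).
carbon = (2, 2, 6, 12, 0, 2, 0)
print(initial_identifier(carbon))
```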
 Second, in the first iteration, the codes of the root atom's neighbors
 are appended to the code of the root atom, which gives an array like this:
     [1, 3194967052, 1, 1559650422, 1, 1572579716, 2, 3220825640]
 I wonder whether this array needs to be sorted again.
 In my program, I joined each bond order to the neighbor's code, so the
 array looked like this:
      13194967052, 11559650422, 11572579716, 23220825640
 I then converted it into either a sorted or an unsorted string to be
 passed to the hash function:
     sorted string: 11559650422115725797161319496705223220825640
   unsorted string: 13194967052115596504221157257971623220825640
 But whichever string I used, the new feature identifiers I obtained
 differed from the reference results.
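 The sketch below first reproduces the two strings described above, then
 shows one alternative: sorting the (bond order, code) pairs numerically and
 packing them as fixed-width integers before hashing. Joining decimal digits
 without separators is ambiguous (for example [1, 12] and [11, 2] both give
 "112"), which fixed-width packing avoids. CRC-32 is again only a stand-in
 hash, and ROOT_CODE and hash_root_and_neighbors are hypothetical names; the
 post does not give the root atom's code.

```python
import struct
import zlib

# First-iteration (bond order, neighbor code) pairs, copied from the post.
PAIRS = [(1, 3194967052), (1, 1559650422), (1, 1572579716), (2, 3220825640)]

# The approach in the post: glue each bond order onto the neighbor code's
# decimal digits, then join the pieces either unsorted or sorted.
codes = [f"{order}{code}" for order, code in PAIRS]
unsorted_string = "".join(codes)
sorted_string = "".join(sorted(codes))  # all pieces have 11 digits, so text order == numeric order
print(sorted_string)    # 11559650422115725797161319496705223220825640
print(unsorted_string)  # 13194967052115596504221157257971623220825640

# Joining digits without separators lets different arrays collide:
print("".join(map(str, [1, 12])) == "".join(map(str, [11, 2])))  # True


def hash_root_and_neighbors(root_code: int, pairs: list[tuple[int, int]]) -> int:
    """Hash the root code followed by numerically sorted (bond order, code) pairs.

    Values are packed as fixed-width unsigned 32-bit integers; CRC-32 is a stand-in.
    """
    ordered = sorted(pairs)  # sort by bond order, then by neighbor code
    values = [root_code] + [v for pair in ordered for v in pair]
    return zlib.crc32(struct.pack(f"<{len(values)}I", *values)) & 0xFFFFFFFF


ROOT_CODE = 123456789  # hypothetical root-atom code
print(hash_root_and_neighbors(ROOT_CODE, PAIRS))
```

 Because the pairs are sorted inside the function, the result does not depend
 on the order in which the neighbors were visited.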
 Third, in the second iteration, some atoms share the same neighboring
 atoms. For example, in a four-membered ring, B and C are both neighbors of
 atom A, while D is connected to both B and C. In the second iteration,
 whose codes should be attached to D's code: B's, C's, or both?
 A----B
 |    |
 C----D
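 In Morgan-style algorithms such as ECFP, each iteration updates every atom
 from the previous-iteration identifiers of all of its immediate neighbors, so
 under that reading D takes both B's and C's codes. The sketch below shows only
 the neighbor-gathering step for the ring drawn above; the all-single bond
 orders, the identifier values, and the name neighbor_pairs are hypothetical.

```python
# Four-membered ring from the question: A-B, A-C, B-D, C-D (single bonds assumed).
BONDS = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}


def neighbor_pairs(atom: str, previous_ids: dict[str, int]) -> list[tuple[int, int]]:
    """Return sorted (bond order, previous identifier) pairs for every neighbor of atom."""
    pairs = []
    for (a, b), order in BONDS.items():
        if atom == a:
            pairs.append((order, previous_ids[b]))
        elif atom == b:
            pairs.append((order, previous_ids[a]))
    return sorted(pairs)


# Hypothetical identifiers left over from the first iteration.
iter1_ids = {"A": 101, "B": 202, "C": 303, "D": 404}
print(neighbor_pairs("D", iter1_ids))  # [(1, 202), (1, 303)]: both B and C
```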
 Finally, could anybody give me a detailed worked example of ECFP_4,
 showing the initial identifier of each atom and the identifiers at each
 iteration (both before and after hashing)?
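 ECFP_4 denotes a diameter of 4, i.e. two update iterations. A toy loop that
 prints each atom's array before hashing and the identifier after hashing can
 be sketched as follows. Assumptions, all labeled hypothetical: CRC-32 stands
 in for the unknown hash, the iteration number is prepended to each array, the
 three-atom graph and its initial identifiers are made up, and no attempt is
 made to remove structurally duplicate features (only identical integers are
 merged by the set).

```python
import struct
import zlib


def hash_array(values: list[int]) -> int:
    """Stand-in 32-bit hash (CRC-32); not claimed to be the paper's hash."""
    return zlib.crc32(struct.pack(f"<{len(values)}I", *values)) & 0xFFFFFFFF


def ecfp(initial_ids: dict[int, int], bonds: dict[tuple[int, int], int],
         diameter: int = 4) -> set[int]:
    """Collect feature identifiers for an ECFP-style fingerprint of the given diameter."""
    neighbors = {atom: [] for atom in initial_ids}
    for (a, b), order in bonds.items():
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))

    current = dict(initial_ids)
    features = set(current.values())
    for iteration in range(1, diameter // 2 + 1):
        updated = {}
        for atom, atom_id in current.items():
            pairs = sorted((order, current[nbr]) for nbr, order in neighbors[atom])
            # Assumption: the array starts with the iteration number and the atom's own code.
            array = [iteration, atom_id] + [v for pair in pairs for v in pair]
            updated[atom] = hash_array(array)
            print(f"iter {iteration} atom {atom}: before hash {array} -> after hash {updated[atom]}")
        current = updated  # every atom updates from the previous iteration's codes
        features.update(current.values())
    return features


# Hypothetical three-atom chain 0-1=2 with made-up initial identifiers.
initial = {0: 1001, 1: 1002, 2: 1003}
bonds = {(0, 1): 1, (1, 2): 2}
print(sorted(ecfp(initial, bonds, diameter=4)))
```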
 I have tried to contact the author, Dr. Rogers, but his e-mail address
 is no longer valid.
 I would sincerely appreciate any help in resolving these problems.
 Best Regards,
 Zhao Yuan
 ------------------------------------------------------------
 State Key Lab of Bio-organic and Natural Products Chemistry
 Shanghai Institute of Organic Chemistry (SIOC),
 Chinese Academy of Sciences.
 Addr. 354, Fenglin Road, Shanghai, China.
 Tel.: +86-21-54925275
 Email: yzhao.^.mail.sioc.ac.cn