CCL: questions about ECFPs
- From: "Zhao Yuan" <ccl++mail.sioc.ac.cn>
- Subject: CCL: questions about ECFPs
- Date: Fri, 11 Jul 2008 08:50:00 -0400
Sent to CCL by: "Zhao Yuan" [ccl]~[mail.sioc.ac.cn]
Hi everyone,
I recently read the following paper on how to generate
Extended-Connectivity Fingerprints (ECFPs):
/////////////////////////////////////////////
High-Throughput Data Analysis. 1. Extended-Connectivity Fingerprints:
A High-Dimensional Descriptor for Molecular Data Analysis
David Rogers* and Mathew Hahn
SciTegic, Inc.
/////////////////////////////////////////////
The paper says that ECFPs can be calculated rapidly and can represent
a very large number of different features, so I want to use them to
compare two molecules and calculate the similarity between them.
However, I ran into several detailed technical problems when following
the method.
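For the comparison step, a minimal sketch is given below. It assumes each molecule is represented as a collection of integer feature identifiers and uses the Tanimoto coefficient, which is one common choice and not necessarily the one used in the paper. The function name and the example identifiers are made up for illustration.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient between two collections of feature identifiers."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Made-up identifiers, only to show the call.
mol_1 = [101, 202, 303]
mol_2 = [202, 303, 404]
similarity = tanimoto(mol_1, mol_2)  # 2 shared features out of 4 distinct ones
```

Because only set membership matters here, the similarity does not depend on which hash function produced the identifiers, as long as both molecules were processed with the same one.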
The first problem is the hash function. I tried many hash functions
to encode the initial atom identifiers, but none of them reproduces
the values given in the reference. Does anyone know which hash function
was used?
Second, after the first iteration, the codes of the root atom's neighbors
are appended to the code of the root atom, which gives an array like this:
[1, 3194967052, 1, 1559650422, 1, 1572579716, 2, 3220825640]
Does this array need to be sorted again before it is hashed?
(In my program, the array looks like this:
13194967052, 11559650422, 11572579716, 23220825640
I then convert the entries to a sorted or unsorted string, which is
used as the input to the hash function.
sorted string: 11559650422115725797161319496705223220825640
unsorted string: 13194967052115596504221157257971623220825640
Whichever string I use, the new features I get differ from the
reference result.)
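The update step in question can be sketched as follows. This is one possible reading, not a confirmed reproduction of the paper's procedure: it treats the first pair in the array above as (iteration number, root atom identifier), keeps that pair in front, and sorts only the neighbor pairs by bond order and then by identifier. The hash is CRC-32 from the Python standard library, used purely as a stand-in, so the resulting values are not expected to match the reference. The name `update_identifier` is hypothetical.

```python
import struct
import zlib


def update_identifier(iteration, root_id, neighbors):
    """Build the pre-hash array and a stand-in 32-bit hash of it.

    neighbors is a list of (bond_order, neighbor_identifier) pairs.
    """
    # Sorting makes the result independent of the order in which
    # the neighbors happen to be stored.
    ordered = sorted(neighbors)
    array = [iteration, root_id]
    for bond_order, neighbor_id in ordered:
        array.extend((bond_order, neighbor_id))
    # Pack as unsigned 32-bit integers so that field boundaries stay
    # unambiguous, unlike a concatenated decimal string.
    payload = struct.pack(f"<{len(array)}I", *array)
    # Stand-in hash (assumption): NOT the hash function used in the paper.
    return array, zlib.crc32(payload)


array, new_id = update_identifier(
    1,
    3194967052,
    [(2, 3220825640), (1, 1572579716), (1, 1559650422)],
)
# array == [1, 3194967052, 1, 1559650422, 1, 1572579716, 2, 3220825640]
```

Packing fixed-width integers avoids a weakness of string concatenation: different lists of numbers can concatenate to the same decimal string.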
Third, in the second iteration, some atoms may be connected to the same
neighboring atom. For example, in a four-membered ring, B and C are the
neighbors of atom A, while D is connected to both B and C. In the second
iteration, to which atom's code should D's code be attached (B, C, or both)?
A----B
|    |
C----D
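One way to look at the ring case is sketched below. It assumes a Morgan-style update in which every atom reads only the previous-iteration state of its direct neighbors, so D is never attached to A directly. Whether the paper handles it exactly this way is the open question. The sketch tracks only which atoms each identifier depends on, not the identifiers themselves, and the names `ring` and `grow_neighborhoods` are hypothetical.

```python
# Toy four-membered ring from the diagram: A-B, A-C, B-D, C-D.
ring = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}


def grow_neighborhoods(graph, iterations):
    """For each iteration, map every atom to the atoms its identifier depends on."""
    history = [{atom: {atom} for atom in graph}]
    for _ in range(iterations):
        previous = history[-1]
        # Each atom merges its own previous set with those of its direct neighbors.
        history.append({
            atom: set(previous[atom]).union(*(previous[n] for n in graph[atom]))
            for atom in graph
        })
    return history


history = grow_neighborhoods(ring, 2)
# After iteration 1, A depends on A, B and C only.
# After iteration 2, A also depends on D, reached through both B and C.
```

Under this assumption, D's information reaches A twice, once inside B's first-iteration identifier and once inside C's.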
Finally, could anybody give me a detailed example of ECFP_4,
showing the initial identifier of each atom and the identifiers in each
iteration (both before and after hashing)?
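The kind of trace being asked for might look like the sketch below. It rests on several assumptions: ECFP_4 is taken to mean two iterations (diameter 4), the atom invariant tuples are hypothetical hand-written values for the heavy atoms of ethanol (C-C-O), CRC-32 stands in for the unknown hash function, and no duplicate-feature removal is attempted. The numbers it prints therefore illustrate the bookkeeping only and will not match the paper.

```python
import struct
import zlib


def hash32(values):
    """Stand-in 32-bit hash (assumption): CRC-32 of the packed integers."""
    return zlib.crc32(struct.pack(f"<{len(values)}q", *values))


def ecfp_trace(atom_invariants, bonds, iterations=2):
    """Return a list with one dict per iteration: atom -> (array before hash, identifier)."""
    neighbors = {atom: [] for atom in atom_invariants}
    for a, b, order in bonds:
        neighbors[a].append((order, b))
        neighbors[b].append((order, a))

    # Iteration 0: hash the raw invariants of each atom.
    current = {atom: hash32(inv) for atom, inv in atom_invariants.items()}
    trace = [{atom: (list(atom_invariants[atom]), current[atom])
              for atom in atom_invariants}]

    for iteration in range(1, iterations + 1):
        step = {}
        for atom in atom_invariants:
            # Neighbor pairs sorted by bond order, then by identifier.
            pairs = sorted((order, current[n]) for order, n in neighbors[atom])
            array = [iteration, current[atom]]
            for pair in pairs:
                array.extend(pair)
            step[atom] = (array, hash32(array))
        # Switch to the new identifiers only after every atom has been processed.
        current = {atom: ident for atom, (_, ident) in step.items()}
        trace.append(step)
    return trace


# Hypothetical invariants: (heavy neighbors, valence minus H count,
# atomic number, mass, charge, attached H, in ring).
atom_invariants = {
    0: (1, 1, 6, 12, 0, 3, 0),  # CH3
    1: (2, 2, 6, 12, 0, 2, 0),  # CH2
    2: (1, 1, 8, 16, 0, 1, 0),  # OH
}
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom, atom, bond order)

trace = ecfp_trace(atom_invariants, bonds)
for iteration, step in enumerate(trace):
    for atom, (array, identifier) in step.items():
        print(iteration, atom, array, identifier)
```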
I have tried to contact the author, Dr. Rogers, but his
e-mail address is no longer valid.
I would sincerely appreciate any help in
resolving these problems.
Best Regards,
Zhao Yuan
------------------------------------------------------------
State Key Lab of Bio-organic and Natural Products Chemistry
Shanghai Institute of Organic Chemistry (SIOC),
Chinese Academy of Sciences.
Addr. 354, Fenglin Road, Shanghai, China.
Tel.: +86-21-54925275
Email: yzhao.^.mail.sioc.ac.cn