CCL: How to identify duplicates for the same molecule that correspond to different tautomers or protonation states?



 Sent to CCL by: Zsolt Zsoldos [zsolt!^!simbiosys.ca]
 Dear Gerard,
 The convert executable in the eHiTS package (which, as far as I know,
 you have) transforms the input molecule to a generic protonation form
 and uses the 'aromatic' bond type instead of alternating single/double
 bonds. After conversion, all tautomeric and protonation forms therefore
 become an identical (canonical) form, so a simple identity check is
 sufficient at that point. To process a large library of ligands and
 find the identical pairs, I would suggest using a Morgan hash code as
 a fast solution.
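
 A minimal sketch of the hash-then-group idea, in plain Python. It does
 not reproduce what the convert program does: as a crude stand-in for the
 canonical form, it assumes each molecule has already been reduced to its
 heavy-atom graph, with hydrogens, charges and bond orders dropped. That
 is coarser than an 'aromatic' bond type and would also merge, for
 example, benzene with cyclohexane. The names morgan_hash and
 find_duplicates and the three-entry library are hypothetical. Equal
 hashes only mark candidate duplicates; Morgan-type invariants can
 collide, so a full identity check should confirm each pair.

```python
import hashlib
from collections import defaultdict


def _h(*parts):
    """Deterministic short hash of a sequence of strings."""
    return hashlib.sha1("|".join(parts).encode()).hexdigest()[:16]


def morgan_hash(atoms, bonds, rounds=None):
    """Morgan-style hash of a heavy-atom graph (illustrative only).

    atoms: list of element symbols, one per heavy atom.
    bonds: list of (i, j) atom-index pairs; bond orders are ignored.
    """
    nbrs = defaultdict(list)
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Initial invariant: element symbol plus heavy-atom degree.
    inv = [_h(el, str(len(nbrs[k]))) for k, el in enumerate(atoms)]

    # Each round folds the sorted neighbour invariants into every atom,
    # so an atom's invariant describes a growing environment around it.
    for _ in range(len(atoms) if rounds is None else rounds):
        inv = [
            _h(inv[k], *sorted(inv[n] for n in nbrs[k]))
            for k in range(len(atoms))
        ]

    # Sorting makes the result independent of the input atom order.
    return _h(*sorted(inv))


def find_duplicates(library):
    """Group molecule names by hash; return groups with more than one name."""
    groups = defaultdict(list)
    for name, (atoms, bonds) in library.items():
        groups[morgan_hash(atoms, bonds)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical library: two tautomers of one compound, written with
# different atom orderings, plus a positional isomer.
library = {
    "supplierA_2-hydroxypyridine": (
        ["N", "C", "C", "C", "C", "C", "O"],
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 6)],
    ),
    "supplierB_2-pyridone": (
        ["O", "C", "N", "C", "C", "C", "C"],
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)],
    ),
    "supplierB_3-hydroxypyridine": (
        ["N", "C", "C", "C", "C", "C", "O"],
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (2, 6)],
    ),
}

print(find_duplicates(library))
```

 Here the two tautomers share a heavy-atom graph and fall into one group,
 while the 3-substituted isomer gets a different hash. Grouping by hash
 takes one pass over the library, which avoids comparing every pair of
 ligands directly.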
 Best regards,
 Zsolt
 On Thu, Dec 16, 2010 at 10:19 AM, Gerard Pujadas
 gerard.pujadas:gmail.com <owner-chemistry++ccl.net> wrote:
 > Dear CCL subscribers,
 >
 > I am interested in building a chemical database from different chemical
 > suppliers' databases. One of the things that I would like to avoid is the
 > repetition of different structures in my database that correspond to the
 > same molecule (for instance, if in the database of two different chemical
 > suppliers the structure for the same molecule is in a different tautomer or
 > protonation state, I would like to identify that they correspond to the same
 > molecule). Any suggestion about how to achieve this?
 >
 > With many thanks in advance for your help
 >
 > Yours sincerely
 >
 > Gerard
 >
 >
 > --
 > Gerard Pujadas
 > http://bioquimica.urv.cat/eng/fitxa.jsp?id=22
 > Nutrigenomics Research Group
 > phone +34 977 55 (9565)
 > Biochemistry and Biotechnology Department
 > Universitat Rovira i Virgili
 > Tarragona, Catalonia
 >
 --
 Zsolt Zsoldos