CCL: How to identify duplicates for the same molecule that correspond to
different tautomers or protonation states?
- From: Zsolt Zsoldos <zsolt|a|simbiosys.ca>
- Subject: CCL: How to identify duplicates for the same molecule that
correspond to different tautomers or protonation states?
- Date: Thu, 16 Dec 2010 11:58:49 -0500
Sent to CCL by: Zsolt Zsoldos [zsolt!^!simbiosys.ca]
Dear Gerard,
The convert executable in the eHiTS package (which you have, as far as I
know) transforms the input molecule to a generic protonation form
and uses the 'aromatic' bond type instead of alternating single/double
bonds. After conversion, all tautomeric and protonation forms therefore
become an identical (canonical) form, so a simple identity check is
sufficient at that point. To process a large library of ligands and find
the identical pairs, I would suggest using a Morgan hash code as a fast
solution.
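
As a rough illustration of the hashing idea (not the eHiTS implementation,
and not a substitute for a real canonicalizer), a Morgan-style hash can be
sketched in plain Python. Everything here is an assumption made for the
example: the names morgan_hash and find_duplicates, the input layout
(atoms as (element, hydrogen count, formal charge), bonds as index pairs),
and the crude normalization, which ignores bond orders and keeps only the
total hydrogen count minus net charge. That quantity does not change under
tautomerism or (de)protonation, but the scheme can also merge molecules
that are not tautomers (for example, double-bond positional isomers), so
hash matches should be confirmed with a proper identity check.

```python
import hashlib
from collections import defaultdict


def _digest(text):
    # Stable hash (unlike built-in hash(), which is salted per process).
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def morgan_hash(atoms, bonds, rounds=None):
    """Morgan-style hash of a heavy-atom graph (illustrative sketch).

    atoms: list of (element, n_hydrogens, formal_charge)
    bonds: list of (i, j) index pairs; bond orders are deliberately ignored
    """
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Initial invariant: element and heavy-atom degree only, so that
    # hydrogens and charges on individual atoms do not matter.
    inv = [_digest(f"{el}:{len(neighbors[k])}")
           for k, (el, _h, _q) in enumerate(atoms)]

    # Refinement: fold the sorted neighbor invariants into each atom.
    for _ in range(len(atoms) if rounds is None else rounds):
        inv = [_digest(inv[k] + "|" + ",".join(sorted(inv[n] for n in neighbors[k])))
               for k in range(len(atoms))]

    # Hydrogens minus net charge is unchanged by proton transfer.
    neutral_h = sum(h - q for _el, h, q in atoms)
    return _digest(";".join(sorted(inv)) + f"#H={neutral_h}")


def find_duplicates(library):
    """Group molecule names by hash; return groups with more than one name."""
    groups = defaultdict(list)
    for name, (atoms, bonds) in library.items():
        groups[morgan_hash(atoms, bonds)].append(name)
    return [names for names in groups.values() if len(names) > 1]


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
library = {
    # N, C2, C3, C4, C5, C6, O (hydroxyl on C2)
    "2-hydroxypyridine": (
        [("N", 0, 0), ("C", 0, 0), ("C", 1, 0), ("C", 1, 0),
         ("C", 1, 0), ("C", 1, 0), ("O", 1, 0)],
        ring + [(1, 6)],
    ),
    # Same skeleton, different atom order, proton moved from O to N
    "2-pyridone": (
        [("O", 0, 0), ("C", 0, 0), ("N", 1, 0), ("C", 1, 0),
         ("C", 1, 0), ("C", 1, 0), ("C", 1, 0)],
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)],
    ),
    "acetic acid": (
        [("C", 3, 0), ("C", 0, 0), ("O", 0, 0), ("O", 1, 0)],
        [(0, 1), (1, 2), (1, 3)],
    ),
    "acetate": (
        [("C", 3, 0), ("C", 0, 0), ("O", 0, 0), ("O", 0, -1)],
        [(0, 1), (1, 2), (1, 3)],
    ),
    "benzene": ([("C", 1, 0)] * 6, ring),
}

duplicates = find_duplicates(library)
for group in duplicates:
    print(" = ".join(group))
```

Because each molecule is hashed once and then grouped by hash, the cost
grows roughly linearly with library size instead of requiring all
pairwise comparisons.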
Best regards,
Zsolt
On Thu, Dec 16, 2010 at 10:19 AM, Gerard Pujadas
gerard.pujadas:gmail.com <owner-chemistry++ccl.net> wrote:
> Dear CCL subscribers,
>
> I am interested in building a chemical database from different chemical
> suppliers' databases. One of the things that I would like to avoid is the
> repetition of different structures in my database that correspond to the
> same molecule (for instance, if in the database of two different chemical
> suppliers the structure for the same molecule is in a different tautomer or
> protonation state, I would like to identify that they correspond to the
> same molecule). Any suggestion about how to achieve this?
>
> With many thanks in advance for your help
>
> Yours sincerely
>
> Gerard
>
>
> --
> Gerard Pujadas
> http://bioquimica.urv.cat/eng/fitxa.jsp?id=22
> Nutrigenomics Research Group
> phone +34 977 55 (9565)
> Biochemistry and Biotechnology Department
> Universitat Rovira i Virgili
> Tarragona, Catalonia
>
--
Zsolt Zsoldos