CCL: How to identify duplicates for the same molecule that correspond to different tautomers or protonation states?



 Sent to CCL by: Mikko Vainio [mikko.vainio .. abo.fi]
 On 12/16/2010 05:19 PM, Gerard Pujadas gerard.pujadas:gmail.com wrote:
 
 Dear CCL subscribers,
 
I am interested in building a chemical database from the databases of different chemical suppliers. One thing I would like to avoid is having several structures in my database that correspond to the same molecule. For instance, if two suppliers list the same molecule as different tautomers or in different protonation states, I would like to recognize that both entries are the same molecule. Any suggestions on how to achieve this?
 With many thanks in advance for your help
 Yours sincerely
 Gerard
 
 Dear Gerard,
 
The InChI program (http://www.iupac.org/inchi/) can be used to weed out duplicates across different databases. It normalizes tautomeric states (mobile hydrogens) when it generates the identifier, so tautomers that it recognizes receive the same string. Differences in charge and protonation are recorded in a separate "charge layer" of the identifier (the /q and /p segments), which you can easily ignore when doing comparisons. See also http://www.inchi-trust.org/
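
As an illustration of the comparison step, here is a minimal Python sketch. It assumes the InChI strings have already been generated with the InChI program and only shows how the charge (/q) and proton (/p) segments might be dropped before comparing. The function names and the supplier IDs are made up for the example, and non-standard InChIs with a fixed-H (/f) layer are not handled.

```python
from collections import defaultdict


def inchi_without_charge(inchi):
    """Return the InChI with its charge (/q) and proton (/p) segments removed."""
    prefix, *layers = inchi.strip().split("/")
    # layers[0] is the chemical formula; it has no prefix letter and is always kept.
    kept = layers[:1] + [l for l in layers[1:] if l[:1] not in ("q", "p")]
    return "/".join([prefix] + kept)


def group_duplicates(records):
    """Group (entry_id, inchi) pairs that share a charge-stripped InChI."""
    groups = defaultdict(list)
    for entry_id, inchi in records:
        groups[inchi_without_charge(inchi)].append(entry_id)
    # Report only keys that occur more than once, i.e. the duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical catalogue entries: acetic acid, acetate and ammonia.
records = [
    ("supplierA-001", "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"),
    ("supplierB-417", "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"),
    ("supplierB-009", "InChI=1S/H3N/h1H3"),
]

for key, ids in group_duplicates(records).items():
    print(key, ids)
```

With these three entries, acetic acid and acetate end up under the same key, and ammonia is left alone. Whether such entries should be merged or only flagged for review is a choice for the database design.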
 Best regards,
 Mikko