CCL: CAS number to Smile String



Don't forget that there are both simple and complex SMILES string formats. The latter allow for chirality, whereas the former do not. Dave

At 06:11 PM 10/09/2009, you wrote:
There is a series of ChemSpider web services available if you wish to use them. The services are listed online here: http://www.chemspider.com/Search.asmx
 
The quality of registry numbers found on the internet is a significant issue. For example, take a simple element like carbon. CAS's own website gives the CAS registry number as 7440-44-0: http://www.commonchemistry.org/search.aspx?terms=carbon
 
A search on PubChem for that number gives two hits, methane and progesterone: http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=7440-44-0
 
Some of these errors have propagated into databases that use PubChem as their seed set, so a search on the NCI database also gives methane: http://cactus.nci.nih.gov/chemical/structure/7440-44-0/file?format=sdf
 
There is no easy way to curate and validate CAS registry numbers, because validation requires access to the CAS Registry. Only investigative work and diligence can improve the situation, and the majority of databases are not resourced to curate the data. Using SciFinder to validate registry numbers is also not permitted, as stated here: http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Chemistry/CAS_validation
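
As a small aside on the validation point above: the check digit built into every CAS registry number can at least catch transcription errors before a lookup. The sketch below is only a format check. It cannot tell you whether a number is assigned to the right compound, which is the harder problem described above. The function name cas_check_digit_ok is made up for illustration.

```python
import re

# CAS numbers have the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
_CAS_RE = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def cas_check_digit_ok(cas):
    """Return True if `cas` is well formed and its check digit is consistent.

    Hypothetical helper: this only detects typos. It says nothing about
    whether the number belongs to the compound a database attaches it to.
    """
    match = _CAS_RE.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of
    # the body, sum the products, and compare the sum modulo 10.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check
```

For the carbon example, cas_check_digit_ok("7440-44-0") passes, which shows the limits of the check: the number is well formed even where a database attaches it to the wrong structure.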
 
Databases with the resources and mission to curate do get it right, and I recommend ChEBI as an example: http://www.ebi.ac.uk/ebisearch/search.ebi?db=smallMolecules&t=7440-44-0
 
Please note that many of the identifiers in ChemSpider, including CAS numbers, have been validated by curators and by the public (crowdsourced curation). While the database is far from perfect, we have put considerable effort into validating the data.
 
Feel free to contact us if you need help.
 
Antony Williams, VP Strategic Development
ChemSpider, Royal Society of Chemistry
US Office: 904 Tamaras Circle, Wake Forest, NC-27587
 
Phone: +1 (919) 201-1516
Fax: +1 (919) 300-5321
 
From: owner-chemistry+wdi==xemistry.com-,-ccl.net [ mailto:owner-chemistry+wdi==xemistry.com-,-ccl.net] On Behalf Of Rajarshi Guha rajarshi.guha+*+gmail.com
Sent: Thursday, September 10, 2009 7:30 PM
To: Ihlenfeldt, Wolf D
Subject: CCL: CAS number to Smile String
 
 
On Thu, Sep 10, 2009 at 12:27 PM, Chunhui Li baotogo2004#,#gmail.com <owner-chemistry]=cl.net > wrote:

Sent to CCL by: "Chunhui  Li" [baotogo2004_._gmail.com]
Dear all,

I want to generate SMILES strings for more than 100 chemicals. All I have is the CAS number for each chemical. Are there any tools I can use to do this automatically?

One way to do this is to use the PubChem synonym tables. This is obviously not a comprehensive solution, as PubChem doesn't contain the entire CAS registry.

There's a simple REST interface (based on a mirror of PubChem at IU). For example, given a CAS number for aspirin, visit

http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/69-46-5

and you get back the SMILES for it. In general, replace the CAS number at the end of the URL with whichever one you want to look up.
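
For a list of 100 or so CAS numbers, the URL substitution described above could be scripted roughly as follows. This is a sketch under assumptions rather than a tested client: it assumes the service is still reachable and returns the SMILES as plain text in the response body, and the helper names (cas_to_url, fetch_smiles, lookup_all) are invented for illustration.

```python
import re
import urllib.request

# Base URL taken from the example above; the service may no longer be online.
BASE_URL = "http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/"

# Basic shape of a CAS number: 2 to 7 digits, 2 digits, 1 check digit.
_CAS_RE = re.compile(r"[0-9]{2,7}-[0-9]{2}-[0-9]")


def cas_to_url(cas):
    """Build the lookup URL by appending the CAS number to the base URL."""
    cas = cas.strip()
    if _CAS_RE.fullmatch(cas) is None:
        raise ValueError("not a CAS-formatted string: %r" % cas)
    return BASE_URL + cas


def fetch_smiles(cas, timeout=10):
    """Fetch the SMILES for one CAS number.

    Assumption: the response body is the SMILES as plain text.
    """
    with urllib.request.urlopen(cas_to_url(cas), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def lookup_all(cas_numbers):
    """Map each CAS number to its SMILES, or to None if the lookup fails."""
    results = {}
    for cas in cas_numbers:
        try:
            results[cas] = fetch_smiles(cas)
        except (OSError, ValueError):
            # Network errors, HTTP errors, and malformed input all end up here.
            results[cas] = None
    return results
```

Calling lookup_all with the list of CAS numbers read from a file gives a dictionary that can be written straight back out. Entries left as None are the ones missing from the synonym tables (or otherwise unreachable) and need to be resolved by hand.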



--
Rajarshi Guha
NIH Chemical Genomics Center