CCL: CAS number to Smile String
- From: "David A. Mannock"
<dmannock[A]ualberta.ca>
- Subject: CCL: CAS number to Smile String
- Date: Fri, 11 Sep 2009 16:46:18 -0600
Don't forget that there are both simple and complex SMILES string formats.
The latter allow for chirality, whereas the former do not. Dave
At 06:11 PM 10/09/2009, you wrote:
There is a series of ChemSpider
web services you can use if you wish. The services
are listed online here:
http://www.chemspider.com/Search.asmx
In terms of the quality of internet-based data, there are significant
issues when it comes to registry numbers. For example, take a simple
element like carbon. CAS's own website declares the CAS registry number
as 7440-44-0:
http://www.commonchemistry.org/search.aspx?terms=carbon
A search on PubChem for that number gives two hits, methane and progesterone:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=7440-44-0
Some of these issues have propagated into databases that use PubChem
as the seed set, so a search on the NCI database also gives methane:
http://cactus.nci.nih.gov/chemical/structure/7440-44-0/file?format=sdf
There is no easy way to curate and validate CAS registry numbers, since
doing so requires access to the CAS Registry itself. Only investigative
work and diligence can improve this situation, and the majority of
databases are not resourced to curate the data. Using SciFinder to
validate Registry Numbers is also not permitted, as stated here:
http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Chemistry/CAS_validation
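One partial check needs no Registry access: a CAS number carries a check
digit, so typos and truncated entries can be caught locally. The sketch
below is illustrative only (the function name cas_checksum_ok is made up
here), and it tests internal consistency alone. It cannot tell whether a
well-formed number is attached to the right structure, which is the
problem described above.

```python
import re

# Up to seven digits, then two digits, then a single check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Return True if the string is well formed and its check digit matches.

    The check digit is the weighted sum of the other digits modulo 10,
    with weights 1, 2, 3, ... counted from the rightmost digit.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["7440-44-0", "7440-44-1", "carbon"]:
        print(candidate, cas_checksum_ok(candidate))
```

Note that 7440-44-0 passes this check, yet still resolves to methane in
some databases, so a passing checksum says nothing about curation quality.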
Those databases with the resources and mission to curate do get it right,
and I recommend ChEBI as an example:
http://www.ebi.ac.uk/ebisearch/search.ebi?db=smallMolecules&t=7440-44-0
Please note that many of the identifiers in ChemSpider, including CAS
numbers, have been curated and validated by curators and by the public
(crowdsourced curation). While the database is far from perfect, we have
put considerable effort into validating the data.
Feel free to contact us if you need help.
Antony Williams, VP Strategic Development
ChemSpider, Royal Society of Chemistry
US Office: 904 Tamaras Circle, Wake Forest, NC-27587
Phone: +1 (919) 201-1516
Fax: +1 (919) 300-5321
From: owner-chemistry+wdi==xemistry.com-,-ccl.net
[mailto:owner-chemistry+wdi==xemistry.com-,-ccl.net] On Behalf Of
Rajarshi Guha rajarshi.guha+*+gmail.com
Sent: Thursday, September 10, 2009 7:30 PM
To: Ihlenfeldt, Wolf D
Subject: CCL: CAS number to Smile String
On Thu, Sep 10, 2009 at 12:27 PM, Chunhui Li baotogo2004#,#gmail.com
<owner-chemistry]=cl.net> wrote:
Sent to CCL by: "Chunhui Li" [baotogo2004_._gmail.com]
Hi Dear All,
I want to generate SMILES strings for more than 100 chemicals. All I have
is the CAS number for each chemical. Are there any tools I can use to do
this automatically?
One way to get it is to use the PubChem synonym tables - this is
obviously not a comprehensive solution as it doesn't contain the entire
CAS registry.
There's a simple REST interface (based on a mirror of PubChem at IU). For
example, given a CAS for aspirin visit
http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/69-46-5
and you get back the SMILES for it. In general, replace the CAS at the
end of the URL with whatever you want.
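For a list of 100 or so CAS numbers, the URL substitution can be
scripted. The following is a minimal sketch, not a tested client: it
assumes the service is reachable and returns the SMILES as plain text
in the response body, which the message above does not spell out. The
function names are illustrative.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the message above; the CAS number is appended to it.
BASE_URL = "http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/"


def cas2smi_url(cas):
    """Build the lookup URL for one CAS number."""
    return BASE_URL + quote(cas.strip())


def fetch_smiles(cas, timeout=10.0):
    """Fetch the response body for one CAS number.

    Assumption: the body is the SMILES string as plain text.
    """
    with urlopen(cas2smi_url(cas), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def convert_all(cas_numbers, fetch=fetch_smiles):
    """Map each CAS number to its SMILES, or to None if the lookup fails."""
    results = {}
    for cas in cas_numbers:
        try:
            results[cas] = fetch(cas)
        except OSError:  # urllib's URLError and HTTPError derive from OSError
            results[cas] = None
    return results


if __name__ == "__main__":
    for cas, smiles in convert_all(["69-46-5"]).items():
        print(cas, smiles)
```

Entries that come back as None are CAS numbers the lookup could not
resolve, which is expected given that the PubChem synonym tables do not
cover the entire CAS registry.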
--
Rajarshi Guha
NIH Chemical Genomics Center