From owner-chemistry@ccl.net Thu Sep 10 01:01:00 2009
From: "Antonio Chana achana-*-iqfr.csic.es"
To: CCL
Subject: CCL: Molecular descriptor selection
Message-Id: <-40204-090909135502-11772-MlhDo5MgI87aOVuibu+VxA ~ server.ccl.net>
X-Original-From: Antonio Chana
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Date: Wed, 09 Sep 2009 18:49:52 +0200
MIME-Version: 1.0

Sent to CCL by: Antonio Chana [achana|*|iqfr.csic.es]
Hi Sangeetha,

Well, that's a question with no simple answer, since descriptor selection is a bit of an art. I daresay a bit of a pain as well. For a better understanding, take a look at the CODESSA web page that discusses this problem:
http://www.codessa-pro.com/methods/SD.htm
CODESSA itself obviously includes this option.

You might also want to try other approaches like PLS
http://www.vcclab.org/lab/pls/
or the UFS approach:
http://www.vcclab.org/lab/ufs/
Both can be used online, and they are based on well-known methods such as stepwise refinement, Partial Least Squares, genetic algorithms and the like.

You can also try clustering methods, using the usual approach based on descriptor intercorrelation, Principal Components Analysis, or even ANNs such as Kohonen networks, to find candidate descriptors for your model.

There is no guarantee that everything will go smoothly, but I believe the two online options are rather good starting points.

Two more pieces of advice:
1.- Filter the descriptors beforehand. Nearly constant descriptors, and descriptors that are very highly correlated with ones you already keep, are useless in modeling.
2.- Keep the number of descriptors in your final model as low as you can; otherwise you will enter the overfitting realm.

Good luck,
Antonio

Sangeetha Subramaniam srdshigella .. gmail.com wrote:
> Sent to CCL by: "Sangeetha Subramaniam" [srdshigella],[gmail.com]
> Hello everyone,
>
> I have a query and would be glad to hear all your suggestions.
>
> While building a model for a set of compounds, how does one make the choice of molecular descriptors. As more than thousands of them can be calculated, it looks good to use all of them. But will it make sense?
>
> What methods/packages can be used in deciding the right choice of descriptors?
> Can you please post relevant software that can be used here..
>
> Thanks
> Sangeetha.

From owner-chemistry@ccl.net Thu Sep 10 02:25:01 2009
From: "Kamalakar Jadhav kjadhav[a]vlifesciences.com"
To: CCL
Subject: CCL: Molecular descriptor selection
Message-Id: <-40205-090910022308-29899-JlOA4M4wJZnnt4F7XuOKdg() server.ccl.net>
X-Original-From: "Kamalakar Jadhav"
Date: Thu, 10 Sep 2009 02:23:04 -0400

Sent to CCL by: "Kamalakar Jadhav" [kjadhav%%vlifesciences.com]
Hi

You may use all the descriptors, provided you have a sufficient number of molecules for training. Alternatively, you can use a limited number of latent variables (from PLS or PCA) derived from all the descriptors. However, doing so always carries the risk of adding noise to your model (i.e. unnecessary descriptors that are not significant for explaining variation in the response).

There are many methods for automatic descriptor selection, such as stepwise forward, forward-backward and backward selection, genetic algorithms (GA) and simulated annealing (SA). You can also reduce the number of descriptors by first pruning intercorrelated descriptors or by removing descriptors on the basis of variance.
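The pre-filtering step mentioned above (dropping near-constant descriptors and pruning intercorrelated ones) can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure used by any package named in this thread: the function names, the toy descriptor table and the two thresholds (`min_variance`, `max_abs_r`) are all assumptions made for the example, and a variance cutoff is only meaningful once the descriptors are on comparable scales.

```python
import math


def variance(values):
    """Population variance of one descriptor column."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prefilter(descriptors, min_variance=1e-6, max_abs_r=0.95):
    """Return the names of the descriptors that survive both filters.

    descriptors: dict mapping a descriptor name to its values, one per molecule.
    Both thresholds are illustrative assumptions, not recommended defaults.
    """
    kept = []
    for name, column in descriptors.items():
        if variance(column) < min_variance:
            continue  # nearly constant: carries no information
        if any(abs(pearson(column, descriptors[k])) > max_abs_r for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept


# Hypothetical toy table: four molecules, four descriptors.
example = {
    "MW": [46.1, 60.1, 74.1, 88.2],
    "heavy_atoms": [3, 4, 5, 6],     # almost perfectly correlated with MW
    "n_rings": [0, 0, 0, 0],         # constant
    "logP": [-0.3, 0.3, 0.9, 0.2],
}
print(prefilter(example))  # ['MW', 'logP']
```

Note that which member of a correlated pair survives depends on the column order, so a filter like this reduces redundancy but does not choose the "best" descriptor; that is left to the selection methods listed above.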
All of the selection methods mentioned above are available in the QSARPro software from VLife. For more details you can visit www.vlifesciences.com

Best Regards,
Kamalakar

From owner-chemistry@ccl.net Thu Sep 10 05:18:00 2009
From: "andras.borosy__givaudan.com"
To: CCL
Subject: CCL: Molecular descriptor selection
Message-Id: <-40206-090910050409-2595-uwh+cj+KAH/Ye19N5KoMQw(-)server.ccl.net>
X-Original-From: andras.borosy**givaudan.com
Date: Thu, 10 Sep 2009 11:03:42 +0200
MIME-Version: 1.0

Sent to CCL by: andras.borosy^_^givaudan.com
Dear Sangeetha

0) Read this article:

Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure–Activity Relationships
ATLA 33, 1–19, 2005

1) You should know the average experimental error of your response (dependent) variable, because the Root Mean Square Error (RMSE) of any of your models on the external validation set should not be less than that.

2) Select some meaningful, conformation-invariant descriptors and a linear relationship, with cross-validation and an external validation set!

3) Use a variable selection algorithm! I would suggest the Genetic Algorithm (SVL) of Molecular Operating Environment (http://www.chemcomp.com/; it also has very good descriptors).

4) If your parsimonious linear model is not good enough, you may check linearity and/or use many more descriptors.

Have a good time!

Dr. András Péter Borosy
Scientific Modelling Expert

Fragrance Research
Givaudan Schweiz AG - Ueberlandstrasse 138 - CH-8600 - Dübendorf - Switzerland
T:+41-44-824 2164 - F:+41-44-8242926 - http://www.givaudan.com

From owner-chemistry@ccl.net Thu Sep 10 09:11:00 2009
From: "andras.borosy::givaudan.com"
To: CCL
Subject: CCL: Modelling in chemical industry
Message-Id: <-40207-090910090331-13641-LnQr5Pgk1qXtSiQI0Y5rVQ---server.ccl.net>
X-Original-From: andras.borosy=givaudan.com
Content-Type: multipart/alternative; boundary="=_alternative 0047B5CFC125762D_="
Date: Thu, 10 Sep 2009 15:03:15 +0200
MIME-Version: 1.0

Sent to CCL by: andras.borosy(_)givaudan.com
--=_alternative 0047B5CFC125762D_=
Content-Type: text/plain; charset="ISO-8859-1"

Dear Colleagues,

I am looking for a concise summary (survey, article or presentation) which shows why mathematical modelling is useful for the chemical industry.

Many thanks,

Dr. András Péter Borosy
Scientific Modelling Expert

Fragrance Research
Givaudan Schweiz AG - Ueberlandstrasse 138 - CH-8600 - Dübendorf - Switzerland
T:+41-44-824 2164 - F:+41-44-8242926 - http://www.givaudan.com
--=_alternative 0047B5CFC125762D_=--

From owner-chemistry@ccl.net Thu Sep 10 12:29:01 2009
From: "Chunhui Li baotogo2004#,#gmail.com"
To: CCL
Subject: CCL: CAS number to Smile String
Message-Id: <-40208-090910122729-5311-rrEaF90sNQl/QqND2fmxUg]![server.ccl.net>
X-Original-From: "Chunhui Li"
Date: Thu, 10 Sep 2009 12:27:25 -0400

Sent to CCL by: "Chunhui Li" [baotogo2004_._gmail.com]
Hi Dear All,

I want to generate SMILES strings for more than 100 chemicals. All I have is the CAS number for each chemical. Are there any tools I can use to do this automatically?

Thanks in advance!

From owner-chemistry@ccl.net Thu Sep 10 13:37:01 2009
From: "Rajarshi Guha rajarshi.guha+*+gmail.com"
To: CCL
Subject: CCL: CAS number to Smile String
Message-Id: <-40209-090910132958-1377-NWTgPPzPKqJwqAcZk7bN3g=server.ccl.net>
X-Original-From: Rajarshi Guha
Content-Type: multipart/alternative; boundary=001636284fd87d16df04733c8bbc
Date: Thu, 10 Sep 2009 13:29:42 -0400
MIME-Version: 1.0

Sent to CCL by: Rajarshi Guha [rajarshi.guha(~)gmail.com]
--001636284fd87d16df04733c8bbc
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Sep 10, 2009 at 12:27 PM, Chunhui Li baotogo2004#,#gmail.com <owner-chemistry~~ccl.net> wrote:
> Sent to CCL by: "Chunhui Li" [baotogo2004_._gmail.com]
> Hi Dear ALl,
>
> I want to generate smile strings for more than 100 chemicals. All I have is
> the CAS number for each chemical. Is there any tools I can use to do this
> automatically?

One way to get it is to use the PubChem synonym tables. This is obviously not a comprehensive solution, as PubChem doesn't contain the entire CAS registry.

There's a simple REST interface (based on a mirror of PubChem at IU). For example, given a CAS number for aspirin, visit

http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/69-46-5

and you get back the SMILES for it. In general, replace the CAS number at the end of the URL with whatever you want.

--
Rajarshi Guha
NIH Chemical Genomics Center
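Since the lookup described above is just a URL with the CAS number appended, a batch of a hundred or so identifiers can be handled with a short loop. The Python sketch below is an illustration under stated assumptions: the URL prefix is the one quoted in the message, but the helper names (`cas_to_url`, `fetch_smiles`) are made up for this example, the response format of the service is not documented in this thread, and nothing here checks whether the service is still reachable.

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL prefix copied from the message above (assumed unchanged).
BASE_URL = "http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/"


def cas_to_url(cas):
    """Build the lookup URL for one CAS number by appending it to the prefix."""
    return BASE_URL + quote(cas.strip())


def fetch_smiles(cas, timeout=10):
    """Return the raw response text for one CAS number, or None on any I/O error.

    The body is returned as-is because the exact response format is an unknown here.
    """
    try:
        with urlopen(cas_to_url(cas), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace").strip()
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return None


# Only URL construction is shown here; no network request is made.
for cas in ["69-46-5"]:
    print(cas, cas_to_url(cas))
```

To run the actual lookups one would call `fetch_smiles` for each entry and keep a separate list of the CAS numbers that came back as `None`, since the synonym tables are not a complete copy of the CAS registry.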
--001636284fd87d16df04733c8bbc--

From owner-chemistry@ccl.net Thu Sep 10 14:52:00 2009
From: "Christian Pilger christian.pilger]~[gmx.net"
To: CCL
Subject: CCL: exhaustive tautomer generation
Message-Id: <-40210-090910144957-5940-OEzXQysQi+6tTPYIvgvH5Q()server.ccl.net>
X-Original-From: "Christian Pilger"
Date: Thu, 10 Sep 2009 14:49:54 -0400

Sent to CCL by: "Christian Pilger" [christian.pilger[a]gmx.net]
Dear CCLers,

For one of my projects I need to generate all possible tautomers for a large set of structures.

Which software packages are capable of performing this task?

Any hints are very welcome.

Cheers,
Christian

From owner-chemistry@ccl.net Thu Sep 10 16:16:00 2009
From: "Steve Williams willsd|appstate.edu"
To: CCL
Subject: CCL:G: strange spin annihilation
Message-Id: <-40211-090910160012-24497-chKzFRpKWzla0G5wkLXr3g---server.ccl.net>
X-Original-From: Steve Williams
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Date: Thu, 10 Sep 2009 15:17:04 -0400
MIME-Version: 1.0

Sent to CCL by: Steve Williams [willsd-x-appstate.edu]
As part of a larger project I think I need to understand the electronic properties of a hypothetical molecule: C5 in pentagonal (D5h) symmetry. I know that the linear form of this molecule is known, and a C2v cyclic structure has been investigated as well, but I want the pentagonal-symmetry ring form. After failing to understand some more complex calculations, I decided to look at the basics: Hartree-Fock with the STO-3G basis. Stability calculations in G03 indicate that the RHF wavefunction is unstable with respect to a UHF wavefunction.

This input:

# hf/sto-3g pop=(nbo,savenlmo) scf=tight stable=opt test

zmat for D5h C5 structure rc is 1.2073 at uhf 6-31+g* singlet level

0 1
X 0.00000000 0.00000000 0.00000000
X 0.00000000 0.00000000 1.00000000
C 1.20730000 0.00000000 0.00000000
C 0.37307622 -1.14821053 0.00000000
C -0.97672622 -0.70963314 0.00000000
C -0.97672622 0.70963314 0.00000000
C 0.37307622 1.14821053 0.00000000

gets to the stability calculation (let's not worry about NBO here), finds an instability, and optimizes the wavefunction with UHF. Near the end of this calculation there is the following output:

SCF Done: E(UHF) = -186.598320132 A.U. after 30 cycles
Convg = 0.5189D-08 -V/T = 1.9996
S**2 = 3.6838
Annihilation of the first spin contaminant:
S**2 before annihilation 3.6838, after 12.0993

I have never before seen a case where removing a spin contaminant causes the spin to increase! What can this mean, if anything? There are many Lewis structures one can draw for this geometry, including things like two triple bonds and a carbene; 5 double bonds, 5 carbenes, four double bonds and a biradical..... Is there any way to interpret the spin value to get some hint about the bonding (assuming such a strange spin value CAN be interpreted)?

Thanks for any insight you may have and be willing to share,
Steve Williams

From owner-chemistry@ccl.net Thu Sep 10 16:53:00 2009
From: "Ronald Cook cookrl*o*tda.com"
To: CCL
Subject: CCL: CAS number to Smile String
Message-Id: <-40212-090910161906-28388-mJnfRCF6sIefoWrGDEdrMQ]~[server.ccl.net>
X-Original-From: "Ronald Cook"
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 10 Sep 2009 12:50:38 -0600
MIME-Version: 1.0

Sent to CCL by: "Ronald Cook" [cookrl~~tda.com]
Hi

Go to www.chemspider.com. Enter your CAS number into the search box, and the site returns the SMILES string along with other information about the compound.

Ron

Ronald Cook
Principal Scientist
TDA Research, Inc.
cookrl^tda.com
303-940-2302

From owner-chemistry@ccl.net Thu Sep 10 17:26:01 2009
From: "Kelly smilin_iis(-)yahoo.com"
To: CCL
Subject: CCL: CAS number to Smile String
Message-Id: <-40213-090910163717-4457-HMXrxJ2fZMORFvJNPL+Uyw ~ server.ccl.net>
X-Original-From: Kelly
Content-Type: multipart/alternative; boundary="0-720604125-1252615019=:53735"
Date: Thu, 10 Sep 2009 13:36:59 -0700 (PDT)
MIME-Version: 1.0

Sent to CCL by: Kelly [smilin_iis{}yahoo.com]
--0-720604125-1252615019=:53735
Content-Type: text/plain; charset=iso-8859-1

Mathematica has a new feature available in version 7, but how comprehensive it is I don't know!

From the Mathematica help files:
  * Chemicals can be specified by their common names such as "Water" or "AceticAcid", registry numbers such as "CID962" or "CAS732-18-5", IUPAC-like names such as "2Methylpropane" or structure strings.
  * ChemicalData[] gives a list of all available chemicals.
  * ChemicalData["Properties"] gives a list of all properties available for chemicals.

So if your compounds are common enough to be found there, you would type:

ChemicalData["CAS22839-47-0", "SMILES"]

yields: [output shown as an image in the original message; not preserved]

So to automate it, you would make a list of the CAS numbers, apply the function to the list, and capture the answers in a new list, which you could then manipulate however you like.

Let me know if it works - I tend to think of Mathematica's databases as frivolous.

It might restore my faith to see somebody actually use them!

-Kelly

"Most folks are about as happy as they make up their minds to be." - Abraham Lincoln
--0-720604125-1252615019=:53735--

From owner-chemistry@ccl.net Thu Sep 10 18:01:01 2009
From: "Wolf-D. Ihlenfeldt wdi::xemistry.com"
To: CCL
Subject: CCL: CAS number to Smile String
Message-Id: <-40214-090910152054-5448-9T+6zVdGjObpWNsfrNtXYg!=!server.ccl.net>
X-Original-From: "Wolf-D. Ihlenfeldt"
Content-Language: en-us
Content-Type: multipart/alternative; boundary="----=_NextPart_000_00AB_01CA325C.87812BD0"
Date: Thu, 10 Sep 2009 21:20:31 +0200
MIME-Version: 1.0

Sent to CCL by: "Wolf-D. Ihlenfeldt" [wdi=-=xemistry.com]
------=_NextPart_000_00AB_01CA325C.87812BD0
Content-Type: text/plain; charset="iso-8859-1"

Dear Chunhui,

In case you do not want to type in a hundred CAS numbers on a Web interface, you can use the Cactvs toolkit (free academic versions at www.xemistry.com/academic) and write a tiny script:

molfile loop "myfile.cas" eh {
    echo [ens get $eh E_SMILES]
}

Internally, this tries both the REST service at http://cactus.nci.nih.gov/cgi-bin/lookup/search (developed by Markus Sitzmann) and PubChem (the up-to-date original, not a mirror, via NCBI eutils).

myfile.cas is a text file with CAS numbers, one per line. Bulletproofing the script for CAS numbers which cannot be resolved etc. is left as an exercise to the user; please consult the extensive documentation of the toolkit.

W. D. Ihlenfeldt
Xemistry GmbH
wdi-#-xemistry.com
---
xemistry gmbh - Geschäftsführer/Managing Director: Dr. W. D. Ihlenfeldt
Address: Auf den Stieden 8, D-35094 Lahntal, Germany
HR Marburg B4713 : Ust/VAT ID DE215316329 : DUNS 34-400-1719
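One small piece of the "bulletproofing" mentioned above can be done before any lookup: rejecting malformed or mistyped CAS numbers. A CAS Registry Number ends in a check digit, equal to the sum of the other digits weighted 1, 2, 3, ... from right to left, taken modulo 10. The Python sketch below applies that rule to a one-per-line list such as myfile.cas. It is independent of the Cactvs toolkit, the function names are hypothetical, and passing the check only means a number is well formed, not that any service can resolve it.

```python
import re

# 2 to 7 digits, 2 digits, 1 check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if the string has CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def split_cas_list(lines):
    """Separate a one-per-line CAS list into (valid, rejected), skipping blank lines."""
    valid, rejected = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        (valid if is_valid_cas(entry) else rejected).append(entry)
    return valid, rejected


# 64-17-5 and 22839-47-0 appear elsewhere in this thread; 64-17-6 is a deliberate typo.
print(split_cas_list(["64-17-5\n", "\n", "64-17-6", "22839-47-0"]))
```

Entries that land in the rejected list can be corrected by hand before the batch is sent to a resolver, which keeps typing errors apart from genuinely unknown compounds.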

------=_NextPart_000_00AB_01CA325C.87812BD0-- From owner-chemistry@ccl.net Thu Sep 10 18:35:00 2009 From: "Markus Sitzmann sitzmann---helix.nih.gov" To: CCL Subject: CCL: CAS number to Smile String Message-Id: <-40215-090910141426-23302-kJWn53U4EtQKeaOM5hrX0A]~[server.ccl.net> X-Original-From: Markus Sitzmann Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1; format=flowed Date: Thu, 10 Sep 2009 13:40:32 -0400 MIME-Version: 1.0 Sent to CCL by: Markus Sitzmann [sitzmann**helix.nih.gov] Chunhui, you might try our NCI/CADD Chemical Identifier Resolver at http://cactus.nci.nih.gov/chemical/structure If the service knows the CAS number converting the CAS number to SMILES string is just: http://cactus.nci.nih.gov/chemical/structure/64-17-5/smiles or, converting it to SDF file: http://cactus.nci.nih.gov/chemical/structure/64-17-5/file?format=sdf Markus Chunhui Li baotogo2004#,#gmail.com wrote: > Sent to CCL by: "Chunhui Li" [baotogo2004_._gmail.com] > Hi Dear ALl, > > I want to generate smile strings for more than 100 chemicals. All I have is the CAS number for each chemical. Is there any tools I can use to do this automatically? > > Thanks in advance!> > -- Markus Sitzmann, Ph.D. 
Laboratory of Medicinal Chemistry Center for Cancer Research National Cancer Institute National Institutes of Health 376 Boyles St Frederick, MD 21702, USA 301-846-5974 (office) 301-846-6033 (fax) sitzmann(!)helix.nih.gov http://www.linkedin.com/pub/1/7b8/342 http://www.xing.com/profile/Markus_Sitzmann From owner-chemistry@ccl.net Thu Sep 10 19:35:01 2009 From: "Serguei Patchkovskii ps _ ned.sims.nrc.ca" To: CCL Subject: CCL:G: strange spin annihilation Message-Id: <-40216-090910182338-20306-PQFb5la+pQw498Z44gXhmg_-_server.ccl.net> X-Original-From: Serguei Patchkovskii Content-Type: TEXT/PLAIN; charset=US-ASCII Date: Thu, 10 Sep 2009 17:53:38 -0400 (EDT) MIME-Version: 1.0 Sent to CCL by: Serguei Patchkovskii [ps%ned.sims.nrc.ca] Steve, The calculation you are trying to do is meaningless. So the results you get are also meaningless - and you should not waste your time trying to interpret them in any way. The system you are looking at is intrinsically multi-configurational, and should be treated as such. One (very crude) way to try to understand it qualitatively is to treat it as a set of two weakly interacting Hueckel systems. One of the two corresponds to the familiar \pi system. It has five levels - a2", e1", and e2" (unless I misread the character tables) - and five electrons distributed in them. The ground electronic state of this subsystem would have doublet E1" symmetry[*] - degenerate both in spin and space. The second Hueckel system is formed by the five in-plane sp2-hybridized orbitals, pointing outwards (where hydrogens would have been in C5H5). The corresponding Hueckel MOs should be a1', e1', and e2' - so the ground state should be doublet E1' symmetry[+]. Now the interaction between these two subsystems should give you a set of six states - A1", A2", and E2", each coming in the singlet and the triplet flavour. 
Since the two Hueckel systems should interact only very weakly because of their spatial separation, these states will appear very close to each other, with the triplet-E2" state likely being the lowest one[#]. Depending on the starting guess, your Sz=0 UHF solution (which is incapable of reproducing either the spatial or the spin symmetry of the correct answer) could converge to an arbitrarily crazy linear combination of these eight sublevels. To make things even more complicated, even this model is already oversimplified. For a more reasonable description, one would have to consider two more interacting Hueckel pairs: four electrons in the five pi orbitals interacting with six electrons in the sigma ring, and the symmetric combination of four electrons in the sigma ring and six in the pi ring. The "true" ground-state solution may include contributions from these configurations as well. The bottom line is that the minimally sensible way of looking at this system is to use a multi-configurational ansatz, with the absolute minimum of six electrons distributed over the four orbitals corresponding to the spatially-degenerate HOMOs of the two (pi and sigma) Hueckel subsystems. It would also be possible to extract the state energies from a set of sufficiently cleverly chosen fractionally-occupied single-determinantal wavefunctions (the delta-SCF approach). Unless you are restricted to using a single-determinantal approach - e.g. because your system is too large, and can only be handled by DFT - this is not something to be recommended. Have fun, Serguei [*] could be E2" if I misread the tables [+] ditto [#] it is highly unlikely that the D5h structure is a local minimum on the PES for any of these states. On Thu, 10 Sep 2009, Steve Williams willsd|appstate.edu wrote: > > Sent to CCL by: Steve Williams [willsd-x-appstate.edu] > As part of a larger project I think I need to understand the electronic > properties of a hypothetical molecule: C5 in pentagonal (D5h) symmetry.
I > know that the linear form of this molecule is known, and a C2V cyclic > structure has been investigated as well, but I want the pentagonal symmetry > ring form. After not understanding some more complex calculations I decided > to look at the basics: Hartree-Fock with sto-3g basis. Stable calculations in > G03 indicate that the rhf wavefunction is unstable with respect to a uhf > wavefunction. [rest deleted] --- Dr. Serguei Patchkovskii Tel: +1-(613)-990-0945 Fax: +1-(613)-947-2838 Skype: Serguei.Patchkovskii E-mail: Serguei.Patchkovskii^-^nrc.ca Coordinator of Modelling Software Theory and Computation Group Steacie Institute for Molecular Sciences National Research Council Canada Room 2011, 100 Sussex Drive Ottawa, Ontario K1A 0R6 Canada
From owner-chemistry@ccl.net Thu Sep 10 20:10:00 2009 From: "Wolf-D.Ihlenfeldt wdi^^xemistry.com" To: CCL Subject: CCL: exhaustive tautomer generation Message-Id: <-40217-090910180539-3246-46SE5f8Q5QZTUXYuIMsitw .. server.ccl.net> X-Original-From: "Wolf-D.Ihlenfeldt" Content-Language: en-us Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="windows-1257" Date: Thu, 10 Sep 2009 22:51:19 +0200 MIME-Version: 1.0 Sent to CCL by: "Wolf-D.Ihlenfeldt" [wdi[*]xemistry.com] > Sent to CCL by: "Christian Pilger" [christian.pilger[a]gmx.net] > Dear CCLers, > > For one of my projects I need to generate all possible tautomers for a > large set of structures. Which software packages are capable of > performing this task? Any hints are very welcome. Cactvs. See for example: "The Impact of Tautomer Forms on Pharmacophore-Based Virtual Screening", Frank Oellien, Jörg Cramer, Carsten Beyer, Wolf-Dietrich Ihlenfeldt, and Paul M. Selzer. > Cheers, > > Christian
From owner-chemistry@ccl.net Thu Sep 10 20:51:01 2009 From: "Antony Williams tony*chemspider.com" To: CCL Subject: CCL: CAS number to Smile String Message-Id: <-40218-090910204835-19963-lAYADWMKX57yRq13EUle9A^^^server.ccl.net> X-Original-From: "Antony Williams" Content-class: urn:content-classes:message Content-Type: multipart/related; boundary="----_=_NextPart_001_01CA3274.5E597208"; type="multipart/alternative" Date: Thu, 10 Sep 2009 20:11:12 -0400 MIME-Version: 1.0 Sent to CCL by: "Antony Williams" [tony[]chemspider.com] This is a multi-part message in MIME format. ------_=_NextPart_001_01CA3274.5E597208 Content-Type: multipart/alternative; boundary="----_=_NextPart_002_01CA3274.5E597208" ------_=_NextPart_002_01CA3274.5E597208 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable There is a series of ChemSpider web services that you can use if you wish. The services are listed online here: http://www.chemspider.com/Search.asmx In terms of the quality of internet-based data, there are significant issues when it comes to registry numbers. For example, take a simple element like carbon.
CAS's own website gives its CAS registry number as 7440-44-0: http://www.commonchemistry.org/search.aspx?terms=carbon A search on PubChem will give two hits, methane and progesterone: http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=7440-44-0 Some of these issues have propagated into databases that use PubChem as their seed set, so a search on the NCI database also gives methane: http://cactus.nci.nih.gov/chemical/structure/7440-44-0/file?format=sdf There is no easy way to curate and validate CAS registry numbers, because validation requires access to the CAS Registry. Only investigative work and diligence can improve this situation, and the majority of databases are not resourced to curate the data. It is also not permitted to use SciFinder to validate Registry Numbers, as stated here: http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Chemistry/CAS_validation Those databases with the resources and mission to curate do get it right, and I recommend ChEBI as an example: http://www.ebi.ac.uk/ebisearch/search.ebi?db=smallMolecules&t=7440-44-0 Please note that many of the identifiers in ChemSpider, including CAS numbers, have been curated and validated by curators and by the public (crowdsourced curation). While the database is far from perfect, we have put considerable effort into validating the data. Feel free to contact us if you need help.
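As a rough sketch of how one might screen a list of CAS numbers before sending them to any of these services: every CAS Registry Number ends in a check digit, so typos can be caught locally. This is only a format check. It cannot tell whether a database has attached the number to the right structure, which is the problem described above. The URL pattern is the one quoted earlier in this thread for the NCI/CADD resolver; the function names and the sample list are made up for this example, and no request is actually sent.

```python
import re
from urllib.parse import quote

# Base URL of the NCI/CADD resolver as quoted in this thread.
RESOLVER_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cas_check_digit_ok(cas):
    """Return True if `cas` looks like NNNNNNN-NN-N with a correct check digit.

    The check digit is the weighted sum of the other digits modulo 10,
    with weights 1, 2, 3, ... counted from the rightmost of those digits.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def smiles_url(cas):
    """Build the resolver URL that returns a SMILES string for `cas`."""
    return f"{RESOLVER_BASE}/{quote(cas.strip())}/smiles"


# Hypothetical input list; the last entry has a deliberately wrong check digit.
cas_numbers = ["64-17-5", "7440-44-0", "64-17-6"]
for cas in cas_numbers:
    if cas_check_digit_ok(cas):
        print(cas, "->", smiles_url(cas))
    else:
        print(cas, "-> rejected: bad format or check digit")
```

Note that 7440-44-0 passes the check even though some databases return methane for it, so a structure that comes back for a valid number still needs to be compared against a curated source.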
Antony Williams, VP Strategic Development ChemSpider, Royal Society of Chemistry US Office: 904 Tamaras Circle, Wake Forest, NC-27587 Phone: +1 (919) 201-1516 Fax: +1 (919) 300-5321
> From: owner-chemistry+wdi==xemistry.com-,-ccl.net [mailto:owner-chemistry+wdi==xemistry.com-,-ccl.net] On Behalf Of Rajarshi Guha rajarshi.guha+*+gmail.com Sent: Thursday, September 10, 2009 7:30 PM To: Ihlenfeldt, Wolf D Subject: CCL: CAS number to Smile String On Thu, Sep 10, 2009 at 12:27 PM, Chunhui Li baotogo2004#,#gmail.com <owner-chemistry]=cl.net> wrote: Sent to CCL by: "Chunhui Li" [baotogo2004_._gmail.com] Hi Dear ALl, I want to generate smile strings for more than 100 chemicals. All I have is the CAS number for each chemical. Is there any tools I can use to do this automatically? One way to get it is to use the PubChem synonym tables - this is obviously not a comprehensive solution as it doesn't contain the entire CAS registry. There's a simple REST interface (based on a mirror of PubChem at IU). For example, given a CAS for aspirin visit http://toposome.chemistry.drexel.edu/~rguha/rest/db/pubchem/cas2smi/69-46-5 and you get back the SMILES for it. In general, replace the CAS at the end of the URL with whatever you want -- Rajarshi Guha NIH Chemical Genomics Center
------_=_NextPart_002_01CA3274.5E597208 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

------_=_NextPart_002_01CA3274.5E597208-- ------_=_NextPart_001_01CA3274.5E597208--
From owner-chemistry@ccl.net Thu Sep 10 21:25:01 2009 From: "Andrew Orry andy .. molsoft.com" To: CCL Subject: CCL: exhaustive tautomer generation Message-Id: <-40219-090910211703-8319-+m1ecpn6xlSAAH9JpGHqwA|-|server.ccl.net> X-Original-From: Andrew Orry Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1; format=flowed Date: Thu, 10 Sep 2009 18:13:03 -0700 MIME-Version: 1.0 Sent to CCL by: Andrew Orry [andy-x-molsoft.com] MolSoft's ICM-Chemist software can generate all possible tautomers interactively or in batch mode. See http://www.molsoft.com/icm-chemist.html for more information. Thanks, Andy -- Andrew Orry Ph.D. Senior Scientist MolSoft LLC 3366 North Torrey Pines Court Suite 300 La Jolla, CA 92037 U S A Phone: (858) 625-2000 (x108) Fax: (858) 625-2888 www.molsoft.com Latest ICM News: www.molsoft.com/news.html Christian Pilger christian.pilger]~[gmx.net wrote: > Sent to CCL by: "Christian Pilger" [christian.pilger[a]gmx.net] > Dear CCLers, > > For one of my projects I need to generate all possible tautomers for a large set of structures. Which software packages are capable of performing this task? Any hints are very welcome. > > Cheers, > > Christian