From chemistry-request /at\www.ccl.net Wed Sep 23 03:46:26 1998
From: "LOEFFLER,DR,GERALD FEM BENAT"
To: chemistry- at -www.ccl.net
Subject: RE: CASP vs. The Tower of Babel
Date: Wed, 23 Sep 1998 09:39:58 +0200

Hi!

So why is there no collaboration on the code-level between
computational scientists?

I don't want to over-simplify the problem, but one important reason
IMHO is the fact that traditionally computational scientists have been
_lousy_ programmers. While the technology to make interoperable
application components has been there for some time (I'm particularly
thinking of OO technology), computational scientists have been too
inexperienced in software engineering techniques to exploit this
technology.

As a concrete example: It would be almost trivial to design an OO
framework for Energy Minimization (or Molecular Dynamics or
Threading, ...) where the force-field would be just one component that
you "plug" into the framework. The same is true for such algorithmic
components as minimization strategies, potentials of mean force,
integration algorithms, Monte Carlo moves, Genetic Algorithms, etc.

But is the resulting code efficient? It's certainly a lot slower than
FORTRAN! (For language reasons, and because you lose opportunities for
source-level optimizations if components have to be truly independent
of each other.)
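
To make the "plug-in" idea above concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
names ForceField, Minimizer, HarmonicBond, SteepestDescent and
minimize_energy are hypothetical and are not taken from MMTK, TINKER or
any other existing package, and the one-dimensional harmonic spring is a
toy stand-in for a real force field.

```python
from typing import Protocol, Sequence


class ForceField(Protocol):
    """Anything with an energy and a gradient can be plugged in."""

    def energy(self, x: Sequence[float]) -> float: ...
    def gradient(self, x: Sequence[float]) -> list[float]: ...


class Minimizer(Protocol):
    """A minimization strategy that only sees the ForceField interface."""

    def minimize(self, ff: ForceField, x0: Sequence[float]) -> list[float]: ...


class HarmonicBond:
    """Toy force field: two particles on a line joined by one spring."""

    def __init__(self, k: float, r0: float) -> None:
        self.k = k    # force constant
        self.r0 = r0  # equilibrium separation

    def energy(self, x: Sequence[float]) -> float:
        d = x[1] - x[0] - self.r0
        return 0.5 * self.k * d * d

    def gradient(self, x: Sequence[float]) -> list[float]:
        d = x[1] - x[0] - self.r0
        return [-self.k * d, self.k * d]


class SteepestDescent:
    """Fixed-step steepest descent; stops when the gradient is small."""

    def __init__(self, step: float = 0.1, tol: float = 1e-8,
                 max_iter: int = 10_000) -> None:
        self.step = step
        self.tol = tol
        self.max_iter = max_iter

    def minimize(self, ff: ForceField, x0: Sequence[float]) -> list[float]:
        x = list(x0)
        for _ in range(self.max_iter):
            g = ff.gradient(x)
            if max(abs(gi) for gi in g) < self.tol:
                break
            x = [xi - self.step * gi for xi, gi in zip(x, g)]
        return x


def minimize_energy(ff: ForceField, strategy: Minimizer,
                    x0: Sequence[float]) -> list[float]:
    """The 'framework': it knows nothing about either component."""
    return strategy.minimize(ff, x0)


# Plug a force field and a strategy into the framework.
field = HarmonicBond(k=1.0, r0=1.5)
final = minimize_energy(field, SteepestDescent(), [0.0, 3.0])
print(final, field.energy(final))
```

Swapping in a different force field or a different minimization
strategy means passing a different object to minimize_energy; neither
component needs to know about the other. The price is visible too:
every call to ff.gradient goes through dynamic dispatch, which is
exactly the kind of overhead the efficiency question is about.
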
But I would argue that, firstly, speed of execution does not matter at
this level (we are not talking about production code here). And
secondly, of what use is speed of execution if you cannot get the
functionality you want (interoperability of algorithms, in this case)
with reasonable effort?

But things _are_ getting better: MMTK springs to mind, and also the
OMG effort to standardize interfaces for biological sequence analysis
(although this is not an academic effort). And of course there are
programs which are not designed with OO techniques and consequent
interoperability in mind, but which are simply well written (TINKER,
for instance).

(Regarding unwillingness to give away source code at all: With Java
you can even give away the "object code" (byte code), maybe after
running it through an obfuscator, and it will work on any Java
platform...)

cheers,
gerald

|--------------------------------------------------------------------|
Gerald Loeffler - Bioinformatics Scientist
Boehringer Ingelheim R&D Vienna
Email: Gerald.Loeffler;at;vienna.at
Phone: +43 676 3289588 (and +43 1 80105 634)
Fax:   +43 1 80105 683
Smail: Bender+Co, Dr. Boehringer-Gasse 5-11, A-1121 Vienna, Austria

> -----Original Message-----
> From: Gabriel Berriz [SMTP:berriz-: at :-potato.harvard.edu]
> Sent: Sunday, September 20, 1998 7:53 PM
> To: chemistry[ AT ]www.ccl.net
> Subject: CCL:CASP vs. The Tower of Babel
>
> I study the statistical mechanics of protein folding using minimal
> computer models. I find that my specialty has a Tower of Babel
> (Babble?) problem, and perhaps the same is true of the whole field of
> computational protein studies, or even of all of computational
> chemistry. It has to do with checking and building on the results of
> others. I was an experimentalist in cellular immunology for a few
> years before switching to my current field, and I recall that trading
> reagents, libraries, and strains was quite common in that field.
> I often unpacked little vials shipped in dry ice, and bearing some
> precious mutant; typically, after some quick tests, I was up and
> running with the new stuff. Not much was required from the source of
> the samples (a couple of concentrations, buffers used, maybe some
> growth conditions here and there...).
>
> *Nothing* like this happens in my current field. I'm not sure why,
> but I have a few guesses. For one thing, programs are a pain to port
> across systems (portability of code is not a criterion for
> publication). More important, in most cases I don't want the program
> just to use as a black box. On the contrary, my interest in the
> program is usually in how it implements a model; I want not only to
> reproduce published results, but also to tweak the conditions, and to
> extend the experiments. This invariably requires that I understand
> enough of the code to hack away in it, and here's where I hit the
> biggest wall. It takes me too long to understand the code written by
> my *labmates*, let alone that written by some unknown graduate student
> 5 years ago half a world away. (Again, clarity of code is, for the
> most part, not a criterion for publication). So, typically I conclude
> that either I re-implement the idea from scratch, which is usually
> something I can't afford, or else I drop the matter altogether.
> (Incidentally, in the few cases I've tried to get source code from
> other labs, I've received such unequivocal, resounding, unapologetic
> refusals, that I must conclude my request was deemed to be bad
> manners.)
>
> It is, in my opinion, a very serious problem; it reduces the field to
> a collection of largely independent efforts, deprived of one of the
> greatest strengths of the scientific method, namely, the ability to
> test and build upon the work of others. I wonder if others feel
> similarly.
>
> I think this frustrating situation was what ultimately gave rise to
> the biennial structure prediction competition CASP (Critical
> Assessment of techniques for protein Structure Prediction), in which
> participants put their structure prediction programs through the fire
> test of predicting some recently solved protein structures prior to
> their publication. This skips over the problem of understanding the
> programs and the models devised by others by focusing on "objective
> results". Faced with this clear prize, the field has naturally
> responded by adopting an increasingly heuristic attitude: whatever
> works, however ad hoc or poorly understood, throw it in there. If you
> lose, no one will care, and if you hit the CASP jackpot, then
> "there's no arguing with success!"
>
> Well, I guess that's *one* way to deal with our Tower of Babel
> problem, but I wonder where this leaves the science... I'm relatively
> new to this field, though, and I wonder what others with more
> experience feel about these issues.
>
> Best wishes,
>
> Gabriel Berriz
> Department of Chemistry and Chemical Biology
> Harvard University
> berriz -8 at 8- potato.harvard.edu
> For best results, replace the word potato by chasma in my address.
>
> ---
> Administrivia: This message is automatically appended by the mail exploder:
> CHEMISTRY- at -www.ccl.net: Everybody | CHEMISTRY-REQUEST- at -www.ccl.net: Coordinator
> MAILSERV ^%at%^ www.ccl.net: HELP CHEMISTRY or HELP SEARCH | Gopher: www.ccl.net 73
> Anon. ftp: www.ccl.net | CHEMISTRY-SEARCH {*at*} www.ccl.net -- archive search
> Web: http://www.ccl.net/chemistry.html
> ---