From DSMITH@uoft02.utoledo.edu Sat Jun 26 06:21:56 1993
Date: Sat, 26 Jun 1993 11:21:56 -0500 (EST)
From: "DR. DOUGLAS A. SMITH, UNIVERSITY OF TOLEDO"
Subject: full disclosure of methods?
To: chemistry@ccl.net
Message-Id: <01GZTY7NYTFM000SZH@UOFT02.UTOLEDO.EDU>

Mark Thompson recently wrote, regarding the SAM1 topic currently under discussion:

> Let me address the more fundamental issue that this topic
> brings forth. I share Graham Hurst's concerns. One of the
> basic tenets of good science is that of reproducibility
> and independent verification.

This is, I think, universally true and accepted. However, it is rarely followed. For example, there has been talk over the years that people who use molecular mechanics in their research should publish the parameters used for each study as part of the paper, or at least the differences between their parameters and the "standard" parameters of the force field used (e.g., MM2, AMBER). I believe this issue was raised in a paper by Peter Kollman a few years ago. It is a particular problem with MM2, which has been parameterized for special situations and molecules by many, many people in addition to Lou Allinger. It also arises in programs such as MacroModel, where parameters of all types and qualities are available through user-set switches. Similarly, other commercial codes such as HyperChem (with which I have experience) contain many parameters beyond the "standard" Allinger MM2 set. (I will not discuss other vendors' codes because my experience with them is much more limited.)

A similar problem occurred a few years ago when the MMX force field in PCMODEL was being developed and expanded. In-house testing showed us that some of the parameters, particularly for organometallic species, were not giving reasonable results. (I believe that the current MMX force field is much improved, and I do not mean to cast any doubts on it. My apologies to Kevin Gilbert.)
> If the results of a new method are published without
> sufficiently describing the method to fulfill the above
> criteria, then I personally could not take the results
> seriously. Furthermore, I would never have recommended
> such work for publication.

While this is a real problem and a good argument for standardization, it is, in my opinion, a utopian goal and most likely not practical. Part of the problem is the codes themselves and the proprietary nature of commercial software. Some of the problem is user naivete (i.e., the black-box problem). A question arises: is this the reason that results from commercial software are so rarely published in most fields? I almost never see modeling results based on BioGraf, HyperChem, etc. SYBYL results do appear, as do polymer modeling results from a wide variety of commercial codes, and even MacroModel results (mostly from the academic community). Or is the reason that academics cannot afford many commercial codes, and so neither use nor publish with them, while companies that purchase and use commercial codes keep their results in house and proprietary?

In addition, I do not agree that we should never recommend "such work" for publication. Often, as Andy Holder seems to be indicating, rapid communication of preliminary results with the promise of a later, more complete disclosure of the method is very reasonable. In the synthetic community this is common -- look at how little experimental detail is provided in a typical J. Am. Chem. Soc. or Tet. Lett. communication.

> I feel very strongly that when a new method is developed
> and implemented that it must pass the peer review process
> to gain legitimacy in the scientific community, regardless
> of whether most other scientists care to reimplement that
> method or not.

Again, in the specific case of SAM1, the method is publicly available in a Ph.D. dissertation from 1990 (if I remember Andy's posting correctly).
Besides, who ever said we had to reveal all our secrets and make them readily available and accessible? When software copyrights and patents really provide adequate protection, maybe I will agree with that attitude.

> Proprietary methods are fine, as long as it is openly
> known that they are proprietary. Results of proprietary
> methods do not belong in the open scientific literature.

Then where do they belong? Comparison of these results with "standard" and commonly available "academic" results is healthy and stimulating. And, not to tweak Mark Thompson, who freely distributes Argus, what about Gaussian? Many people no longer have access to G92 source code due to recent, commercially driven changes. Does that mean we cannot accept their results in the open literature -- or must we decide based on whether their results come from previously available pieces of the code rather than from newer, proprietary sections? And what about the difference between someone in industry who paid for the MacroModel source code and an academic, such as myself, who gets only binaries? Are my results less acceptable because I do not have the complete method available? Or are the industrial results less acceptable because they may be the product of tweaking the code?

There are many, many issues hidden in this beast. The scientific community is just realizing that this beast is a tiger, and that the tiger may have a tail. We still need to locate and identify the tail, grab it, and hang on while figuring out how to keep the tiger from biting us. My own conclusion is that keeping the tiger in a dark cage called censorship would be the worst thing we could do, and that limiting access to the scientific literature because someone's results came from what we thought might be a tiger, but had not proven to be one, is not the best course of action.

Doug

Douglas A. Smith
Assistant Professor of Chemistry
The University of Toledo
Toledo, OH 43606-3390
voice 419-537-2116
fax 419-537-4033
email dsmith@uoft02.utoledo.edu