full disclosure of methods?



 Mark Thompson recently wrote, regarding the SAM1 topic currently under
 discussion:
 >Let me address the more fundamental issue that this topic
 >brings forth.  I share Graham Hurst's concerns. One of the
 >basic tenets of good science is that of reproducibility,
 >and independent verification.
 This is, I think, universally true and accepted.  However, it is rarely
 followed.  For example, there has been talk over the years that people
 who use molecular mechanics for their research should publish the parameters
 used for each study as part of the paper (or at least they should publish
 the differences between their parameters and the "standard" parameters in
 the set used, e.g. MM2, AMBER, etc.).  I think that this issue was raised
 in a paper by Peter Kollman a few years ago.  This is a particular problem
 when people use MM2, which has been parameterized by many, many people in
 addition to Lou Allinger for special situations and molecules.  Another
 place this occurs is in programs such as MacroModel, where parameters of all
 types and qualities are available through user-set switches.  Similarly,
 in other commercial codes such as HyperChem (with which I have experience),
 many parameters other than the "standard" Allinger MM2 parameters exist.
 (I will not discuss other vendors' codes because my experience is much more
 limited with them.)  A similar problem occurred a few years ago when the
 MMX force field in PCMODEL was being developed and expanded.  In-house
 testing showed us that some of the parameters, particularly for organometallic
 species, were not giving reasonable results.  (I believe that the current
 MMX force field is much improved, and do not mean to cast any doubts on it.
 My apologies to Kevin Gilbert.)
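 As an aside, if such disclosure ever became standard practice, the
 "published differences" could be generated mechanically.  The sketch below
 is purely illustrative -- the file names and the whitespace-delimited
 parameter layout are my own invention, not MM2's or any vendor's actual
 format -- but it shows how small the supplementary material could be: a
 script that diffs a study's parameter table against the standard set and
 reports every addition, removal, or change.

    # Illustrative sketch only: diff a custom force-field parameter table
    # against a "standard" set so the deviations can be published alongside
    # the modeling results.  The file format here is hypothetical.

    def load_params(path):
        """Read a whitespace-delimited table: key columns, then a value,
        e.g. 'bond C-C k_stretch 4.40' (a made-up layout)."""
        params = {}
        with open(path) as fh:
            for line in fh:
                line = line.split("#")[0].strip()  # drop comments, blanks
                if not line:
                    continue
                *key, value = line.split()
                params[tuple(key)] = float(value)
        return params

    def diff_params(standard, custom, tol=1e-6):
        """Yield (key, standard_value, custom_value) for every parameter
        added, removed, or changed relative to the standard set."""
        for key in sorted(standard.keys() | custom.keys()):
            s, c = standard.get(key), custom.get(key)
            if s is None or c is None or abs(s - c) > tol:
                yield key, s, c

    if __name__ == "__main__":
        # hypothetical file names
        standard = load_params("mm2_standard.prm")
        custom = load_params("mm2_thisstudy.prm")
        for key, s, c in diff_params(standard, custom):
            print(f"{' '.join(key):40s} standard={s} this_study={c}")

 A table like that, printed into the supplementary material, would let a
 reader reproduce a calculation without the author having to republish the
 entire standard parameter set.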
 >If the results of a new method are published without
 >sufficiently describing the method to fulfill the above
 >criteria, then I personally could not take the results
 >seriously.  Furthermore, I would never have recommended
 >such work for publication.
 While this is a real problem and a good argument for standardization, it is,
 in my opinion, a utopian goal and most likely not practical.  Part of the
 problem lies in the codes themselves and the proprietary nature of
 commercial software.  Some of the problem is user naivete (i.e., the
 black-box problem).
 A question arises: is this the reason that results using commercial software
 are so rarely published in most fields?  I almost never see modeling results
 based on BioGraf, HyperChem, etc.  SYBYL results do appear, and so do
 polymer modeling results from a wide variety of commercial codes, and even
 MacroModel results (mostly from the academic community).  Or is the reason
 that academics cannot afford many commercial codes, and so do not use or
 publish with them, while companies that purchase and use commercial codes
 keep their results in-house and proprietary?
 In addition, I do not agree that we should never recommend "such work" for
 publication.  Often, as Andy Holder seems to be indicating, rapid
 communication of preliminary results with the promise of a more complete or
 full disclosure of a method is very reasonable.  In the synthetic community
 this is common -- look at how little experimental detail is provided in a
 typical J. Am. Chem. Soc. or Tet. Letters communication.
 >I feel very strongly that when a new method is developed
 >and implemented that it must pass the peer review process
 >to gain legitimacy in the scientific community, regardless
 >of whether most other scientists care to reimplement that
 >method or not.
 Again, in the specific case of SAM1, the method is publicly available in
 a Ph.D. dissertation from 1990 (if I remember Andy's posting correctly).
 Besides, who ever said we had to reveal all our secrets and make them
 readily available and accessible?  When software copyrights and patents
 really provide adequate protection, maybe I will agree with that attitude.
 >Proprietary methods are fine, as long as it is openly
 >known that they are proprietary.  Results of proprietary
 >methods do not belong in the open scientific literature.
 Then where do they belong?  Comparison of these results with "standard" and
 commonly available "academic" results is healthy and stimulating.  And, not
 to tweak Mark Thompson, who freely distributes Argus, what about Gaussian?
 Many people no longer have access to G92 source code due to recent and
 commercially driven changes.  Does that mean we cannot accept their results
 in the open literature -- or must we decide based on whether or not their
 results are from previously available pieces of the code rather than from
 newer, proprietary sections?  Or what about the difference between someone
 in industry who paid for the source code for MacroModel as compared to
 an academic, such as myself, who gets only binaries?  Are my results to
 be less acceptable because I do not have the exact method available?  Or
 are the industrial results less acceptable because they could be the result
 of tweaking the code?
 There are many, many issues hidden in this beast.  The scientific community
 is just realizing that this beast is a tiger and that the tiger may have
 a tail.  We still need to locate and identify the tail, grab it, and hang
 on while figuring out how to keep the tiger from biting us.  My own
 conclusion is that keeping the tiger in a dark cage called censorship would
 be the worst thing we could do, and that barring results from the scientific
 literature merely because we suspected, but had not proven, that they came
 from a tiger is not the best course of action.
 Doug
 Douglas A. Smith
 Assistant Professor of Chemistry
 The University of Toledo
 Toledo, OH  43606-3390
 voice    419-537-2116
 fax      419-537-4033
 email    dsmith@uoft02.utoledo.edu