From DSMITH@uoft02.utoledo.edu Sat Jun 26 06:21:56 1993
Date: Sat, 26 Jun 1993 11:21:56 -0500 (EST)
From: "DR. DOUGLAS A. SMITH, UNIVERSITY OF TOLEDO"
Subject: full disclosure of methods?
To: chemistry@ccl.net
Message-Id: <01GZTY7NYTFM000SZH@UOFT02.UTOLEDO.EDU>

Mark Thompson recently wrote, regarding the SAM1 topic currently under discussion:

> Let me address the more fundamental issue that this topic
> brings forth. I share Graham Hurst's concerns. One of the
> basic tenets of good science is that of reproducibility
> and independent verification.

This is, I think, universally true and accepted. However, it is rarely followed. For example, there has been talk over the years that people who use molecular mechanics in their research should publish the parameters used for each study as part of the paper, or at least the differences between their parameters and the "standard" parameters of the force field used (e.g., MM2, AMBER). I believe this issue was raised in a paper by Peter Kollman a few years ago. It is a particular problem with MM2, which has been parameterized for special situations and molecules by many, many people in addition to Lou Allinger. It also arises in programs such as MacroModel, where parameters of all types and qualities are available through user-set switches. Similarly, other commercial codes such as HyperChem (with which I have experience) contain many parameters beyond the "standard" Allinger MM2 set. (I will not discuss other vendors' codes because my experience with them is much more limited.)

A similar problem occurred a few years ago when the MMX force field in PCMODEL was being developed and expanded. In-house testing showed us that some of the parameters, particularly for organometallic species, were not giving reasonable results. (I believe that the current MMX force field is much improved, and I do not mean to cast any doubts on it. My apologies to Kevin Gilbert.)
> If the results of a new method are published without
> sufficiently describing the method to fulfill the above
> criteria, then I personally could not take the results
> seriously. Furthermore, I would never have recommended
> such work for publication.

While this is a real problem and a good argument for standardization, it is, in my opinion, a utopian goal and most likely not practical. Part of the problem is the codes themselves and the proprietary nature of commercial software. Some of the problem is user naivete (i.e., the black-box problem). A question arises: is this the reason that results from commercial software are so rarely published in most fields? I almost never see modeling results based on BioGraf, HyperChem, etc. SYBYL results do appear, as do polymer modeling results from a wide variety of commercial codes, and even MacroModel results (mostly from the academic community). Or is the reason that academics cannot afford many commercial codes, and so neither use nor publish with them, while companies that purchase and use commercial codes keep their results in house and proprietary?

In addition, I do not agree that we should never recommend "such work" for publication. Often, as Andy Holder seems to be indicating, rapid communication of preliminary results with the promise of a later, more complete disclosure of the method is very reasonable. In the synthetic community this is common -- look at how little experimental detail is provided in a typical J. Am. Chem. Soc. or Tet. Lett. communication.

> I feel very strongly that when a new method is developed
> and implemented that it must pass the peer review process
> to gain legitimacy in the scientific community, regardless
> of whether most other scientists care to reimplement that
> method or not.

Again, in the specific case of SAM1, the method is publicly available in a Ph.D. dissertation from 1990 (if I remember Andy's posting correctly).
Besides, who ever said we had to reveal all our secrets and make them readily available and accessible? When software copyrights and patents really provide adequate protection, maybe I will agree with that attitude.

> Proprietary methods are fine, as long as it is openly
> known that they are proprietary. Results of proprietary
> methods do not belong in the open scientific literature.

Then where do they belong? Comparison of these results with "standard" and commonly available "academic" results is healthy and stimulating. And, not to tweak Mark Thompson, who freely distributes Argus, what about Gaussian? Many people no longer have access to G92 source code due to recent, commercially driven changes. Does that mean we cannot accept their results in the open literature -- or must we decide based on whether their results come from previously available pieces of the code rather than from newer, proprietary sections? And what about the difference between someone in industry who paid for the MacroModel source code and an academic, such as myself, who gets only binaries? Are my results less acceptable because I do not have the complete method available? Or are the industrial results less acceptable because they may be the product of tweaking the code?

There are many, many issues hidden in this beast. The scientific community is just realizing that this beast is a tiger, and that the tiger may have a tail. We still need to locate and identify the tail, grab it, and hang on while figuring out how to keep the tiger from biting us. My own conclusion is that keeping the tiger in a dark cage called censorship would be the worst thing we could do, and that limiting access to the scientific literature because someone's results came from what we thought might be a tiger, but had not proven to be one, is not the best course of action.

Doug

Douglas A. Smith
Assistant Professor of Chemistry
The University of Toledo
Toledo, OH 43606-3390
voice 419-537-2116
fax 419-537-4033
email dsmith@uoft02.utoledo.edu