CCL: Science code manifesto

From: Mihaly Mezei <Mihaly.Mezei%mssm.edu>
Subject: CCL: Science code manifesto
Date: Wed, 19 Oct 2011 12:56:18 -0400

Sent to CCL by: Mihaly Mezei [Mihaly.Mezei_+_mssm.edu]
While I am putting most of the software (source code included) used in my
papers on my website, I think REQUESTING the source code to be published/made
available is too strong a requirement. This would be equivalent to requesting
the circuit-board diagrams of the various instruments and/or the source code of
the data acquisition programs used in experimental work.
What IS important is the specification of the algorithms and the parameter set
used. Examining the source code (assuming that it is feasible to pore through
the thousands of lines of code) is only warranted if either fraud or error is
suspected.
Suspecting fraud without specific reason is just plain wrong. The same way that
we assume that the graphs in an experimental paper properly represent the raw
data, we should trust that the code represents the authors' honest effort to
implement the algorithms in question. If there is solid reason to suspect fraud,
there are various avenues for pursuing such an investigation, and these would
certainly involve requesting the code.
As I said before, the default assumption should be trust. However, what COULD be
requested as part of the protocol is the description of the various tests the
author performed to demonstrate that the code indeed implements the algorithm
described.
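A minimal sketch of the kind of test such a description might report, assuming
the algorithm is a simple numerical integrator. The function name trapezoid and
the test case are illustrative only and are not taken from any code discussed
here. The sketch checks the result against a known analytic value and checks
that the error shrinks at the rate the algorithm predicts.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b], n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def square(x):
    return x * x


EXACT = 1.0 / 3.0  # analytic value of the integral of x^2 over [0, 1]

# Test 1: the numerical result is close to the analytic value.
error_100 = abs(trapezoid(square, 0.0, 1.0, 100) - EXACT)
error_200 = abs(trapezoid(square, 0.0, 1.0, 200) - EXACT)

# Test 2: the method is second order, so doubling n should cut the
# error by roughly a factor of four.
ratio = error_100 / error_200
```

Reporting both the reference value and the observed convergence rate lets a
reader judge the implementation without reading its source.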
Suspicion of error can come from two sources: (a) something 'does not make
sense' and (b) I repeated the calculation with another code and got different
results. In the first case, if you are a reviewer, you can ask the author for
clarification; if you read the paper after publication, you can ask the author
and, if the response is unsatisfactory, write a letter to the editor commenting
on the paper. One of these may lead to an acceptable explanation or to the
author discovering an error in the code. In the second case, the two groups
involved can work together to determine the source of the discrepancy - this is
much more efficient than one group examining the other group's code. Such a
cooperative error search can be done without examining the source code of the
other group, by comparing partial results - that could lead to pinpointing the
source of the discrepancy (I speak from experience here).
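A minimal sketch of such a comparison of partial results, assuming each group
can print the same named intermediate quantities in the same order. The function
first_divergence, the stage names and the numbers are all hypothetical and serve
only to illustrate the idea.

```python
import math


def first_divergence(stages_a, stages_b, rel_tol=1e-6):
    """Return the name of the first stage whose values disagree, or None."""
    for (name_a, value_a), (name_b, value_b) in zip(stages_a, stages_b):
        if name_a != name_b:
            raise ValueError(f"stage mismatch: {name_a!r} vs {name_b!r}")
        if not math.isclose(value_a, value_b, rel_tol=rel_tol):
            return name_a
    return None


# Made-up partial results from two independent codes, listed in the
# order in which each code computes them.
group_a = [("bonded", -12.5), ("electrostatic", -340.2), ("total", -352.7)]
group_b = [("bonded", -12.5), ("electrostatic", -338.9), ("total", -351.4)]

culprit = first_divergence(group_a, group_b)
```

Here culprit names the earliest quantity on which the two codes disagree, which
tells both groups where to look without either one reading the other's source.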
Mihaly Mezei
Department of Structural and Chemical Biology, Mount Sinai School of Medicine
Voice: (212) 659-5475 Fax: (212) 849-2456
WWW (MSSM home): http://www.mountsinai.org/Find%20A%20Faculty/profile.do?id=0000072500001497192632
WWW (Lab home - software, publications): http://inka.mssm.edu/~mezei
WWW (Department): http://atlas.physbio.mssm.edu