CCL:G: AW: Science code manifesto

From: "Georg Lefkidis" <lefkidis^^physik.uni-kl.de>
Subject: CCL:G: AW: Science code manifesto
Date: Thu, 20 Oct 2011 11:25:57 +0200

Sent to CCL by: "Georg Lefkidis" [lefkidis=physik.uni-kl.de]
Hello everyone,
I don't know if this was brought to the list's attention already, but there
is an additional component as well (much as most of us would not like to
admit it). Writing source code is often the most time-consuming part of
obtaining one's results, even if the mathematical analysis per se might be
done relatively quickly. This means that once a code is there, the author(s)
would like to use it over and over again (perhaps for different but similar
systems) and of course *publish* more. The algorithmic implementation of a
mathematical formula is part of the process. So I believe that most
scientist-programmers would not feel very comfortable sharing their
codes with *anonymous* referees, who in the end might even reject the
paper, and then seeing that work appear elsewhere for other systems. Let's not
forget that it is not only the systems, the analysis and the results, but also
the programming itself that is worth a publication. Everyone who has ever
programmed a data-mining algorithm for Gaussian or GAMESS output knows that
only too well.
Perhaps this is not of concern to great professors with huge groups and
codes that have been around for decades and are by now bug-free, but to a
common mortal (like myself) it is (I have seen it happen, although luckily
not to me yet).
Another issue is the quality of the different third-party programs used. I
read in a couple of posts below that the group's reputation is (or might be)
decisive. In fact there is more to it than that: being able to evaluate the
quality of the results (by comparing to experiment or assessing the quality
of derived properties, selection rules, symmetries etc.) also plays a very
big role. For me, a paper that interprets the importance of scaling a
parameter, even if done with a less-than-ideal code, is at least as important
as the best results produced by a code but left without interpretation. So I
see a potential danger if the code itself becomes more and more important.
Besides, good results will always get reproduced by other groups, even with
other codes or methods (for instance theory vs. experiment etc.).
I am not saying I am for or against these two arguments; I just want to
mention them as possible issues that need some thought.
Best regards
Georg
-----Original Message-----
From: owner-chemistry+lefkidis==physik.uni-kl.de|,|ccl.net
[mailto:owner-chemistry+lefkidis==physik.uni-kl.de|,|ccl.net] On behalf of
Andrew Dalke dalke a dalkescientific.com
Sent: Wednesday, 19 October 2011 11:39
To: Lefkidis, Georg
Subject: CCL: Science code manifesto
Sent to CCL by: Andrew Dalke [dalke^^^dalkescientific.com]
On Oct 18, 2011, at 4:17 PM, Adrià Cereto Massagué adrian.cereto,+,gmail.com wrote:
> I don't think the manifesto is at odds with FSF. GPL'd software can be
> sold at any price, but its source code must be available for those who own
> the software at no further cost. And someone who has bought some GPL
> software is allowed to redistribute it for free, so researchers using it for
> a paper would be able to provide the software to reviewers and readers of
> the paper at no cost.
Abstract: How much can the paper authors ask for access to the source code?
How much can the curators charge? What should the curator do if the curated
software contains a license violation?
If I write a paper which depends on software for its analysis, and others
should have access to the software as part of effective peer review, then
how much can I charge others to get access to the software? US $1 billion?
The FSF says I can charge as much as I want, and that freedom is one of
the core freedoms of free software.
The philosophy that others need access to my source code to provide good
peer review has the implicit assumption that I will provide the software at
a non-prohibitive cost.
There is clearly a tension between these two viewpoints. This manifesto says
nothing of what that cost might be, nor even that it might be an issue.
What should be the cost to get access to source code from the author, or
from the curator? Does the curator get no-cost access to it as a condition
of publication? Doesn't any limit on cost curtail what the FSF says is my
freedom to charge as much as I want?
Remember, the FSF encourages software freedom. I argue that scientific
communication has overlapping but different goals.
Scientific communication needs to have a low cost so that many people can get
access to it. The FSF is only concerned about what happens *after* someone
gets access to software.
This is of course similar to (most) scientific papers. There the author
gives the curator the right to redistribute the paper without paying
royalties, and the curator can charge effectively any price for it. Most
paper publishers want to maximize revenue, and therefore set high but not
prohibitive prices. The software author may have other concerns.
Interestingly, the software curator takes on a more difficult challenge than
a paper curator. The authors of a paper are (with a few exceptions, usually
well covered by fair use) the only copyright holders of the paper. More
often than not, though, the accompanying software has many more copyright
holders. That can lead to problems.
Consider the CDK chemistry toolkit. The package has many copyright
holders, including those of the third-party libraries which it incorporates. A
few years ago the CDK was in minor violation of the LGPL requirements of some
of those libraries. (It omitted the credit required by those licenses.) This
was quickly fixed once pointed out. I can easily imagine cases where a
violation can't be easily fixed.
The curator takes on the risk that someone else, who is a copyright holder
to the software in question but not a paper author, may challenge the right
of the curator to distribute the software. How does the curator resolve the
violation, especially if the original author doesn't want to be involved?
Does the curator remove the software in question?
If so, and if you insist that the software must be available in order to do
correct peer review, then should the corresponding paper also be withdrawn?
As I said before, these issues are solvable. I bring them up because I encourage
people to distribute their source code along with the paper, and to be aware
that it's not a simple, clear issue.
Andrew
dalke:+:dalkescientific.com