[Gepsc] responses to reviewers
karen.cranston at gmail.com
Wed Feb 15 11:26:43 CST 2017
(trying this again, now that I have finally realized that my primary email
address was silently failing on this list)
Here is a revised version of the response to reviewers. Given that we don't
have permission to share the reviews publicly, I didn't put this in the
I think the only two things left to do are:
1) Thumbs up or down on the attached response to reviewers
-------------- next part --------------
Dear Dr. Ouellette and reviewers,
Thank you for the very useful reviews of our manuscript "Good Enough Practices in Scientific Computing", PCOMPBIOL-D-16-01660. We have revised the manuscript in light of your comments. Below, we address your comments individually:
1/ "As a result, they take days or weeks to do things that could be
done in minutes or hours, are often unable to reproduce their own
work (much less the work of others), and have no idea how reliable
their computational results are" -- this will need to be rewritten,
from scratch. It's not only offensive, but would not motivate any
reasonable reader to continue. Defies purpose of 'Perspective'.
We have re-written this section to emphasize the benefit to the researcher and change the tone.
2/ In some sections, Software, Collaboration etc. it would also be
good to maintain a more balanced view, so alternatives must also
be provided along with their merits (their disadvantages are plainly
obvious). Authors do not even attempt to give credit to any other
option, and in that regard the whole perspective has a feel of a
somewhat dogmatic position -- while this is not true, obviously.
In fact, a good example of discussing alternatives is already given
in the Data management section ("fold all data management...").
We have provided alternatives in the "Tracking changes" and "Manuscripts" sections, where we are discussing whole workflows. In "Software" and "Collaboration", we are instead listing individual small recommendations, where there is no obvious alternative overall workflow, and the only reasonable alternative to each recommendation is to simply not do it (e.g. not giving functions meaningful names, or not making licenses explicit).
3/ The section 'Software' might be renamed? Up to the authors.
Programming, scripting etc? Software is a very general term and
means too many other things not covered here. Another qualifier
must be that this section is NOT original as there are thousands
of books and millions of articles on the subject, these are simple
guidelines for the uninitiated. Lack of any references is weird.
We retain "Software", as we meant this to be quite general and neither "programming" nor "scripting" captures what we want. You are quite right about the novelty, though, and we have noted this explicitly and added several references.
4/ Contributions in the Collaboration section: make clear that
your underlying assumption refers to open source projects that
invite co-developers to contribute.
We aren't assuming that all projects are public and open source, and have made some changes to reflect this. In the introduction to this paragraph, we say that collaborators might be new lab members (i.e. have access to otherwise private information), or a future you. We have also removed "public" from the to-do list item.
5/ Figure is fine, but readers will expect something more. Can
you think of another display, perhaps a visual guide of some
of your recommendations, in a specific section?
Figures are challenging for this type of paper, but we've added two more to help clarify certain points.
2b) Minor comments
1/ in abstract: "can and should consider adopting".
2/ "regardless of their current level of technical skill" - lose
'technical' -- there are many technical skills that the authors do
not possess. Better replace it with 'computational', or more
Replaced with "computational".
3/ Ditto for "collaborating with colleagues, organizing projects,
tracking work, and writing manuscripts" -- perhaps stay within
your expertise: organizing software projects, tracking versions,
reporting (not writing manuscripts -- others might know better).
We did not change this text in the Author Summary, as it is an accurate summary of the manuscript, which does describe collaboration, organization, tracking changes and writing manuscripts.
4/ page 2, line 9: should adopt. See Minor #1.
5/ "beyond emailing themselves a spreadsheet named" -- please
consider rephrasing. Abrasive, with no reason. Not funny either.
6/ page 3, line 59: "pertinent info," -- information.
Unchanged - we aren't sure why pertinent is not ok, we are recommending information important to retrieving the data, not just any information.
7/ page 3, line 76: NAs -> the "NA" value.
Changed to simply "NA".
8/ page 4, line 99: "re-use data preparation logic in the future",
Changed to "data preparation steps". Hope that is clearer.
9/ page 5, line 153: what about -h ? Help not recommended?
Added -h option.
10/ page 6, line 207: github, give URL for consistency with
others (provided later in the manuscript of course).
11/ page 10, line 399: dropbox, give URL -- as above.
12 / page 11, line 437: "less than a latte", not everyone will get
this, esp. people who have never ordered a latte.
Changed to "less than $5".
13/ page 13, line 497: there's got to be some advantages!
Added "allows co-authors to use familiar tools".
14/ page 13, line 523: what are the disadvantages of Google
Docs or similar?
Added some text in summary.
15/ Need to include ® for Microsoft Word, I believe.
Added after all Microsoft instances.
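As an illustration of minor comment 9 (a -h option), here is a minimal sketch, assuming a Python script. The program name count_lines, its arguments, and the helper functions are hypothetical and are not taken from the manuscript. Python's argparse module supplies -h/--help automatically once a parser is defined.

```python
import argparse


def build_parser():
    # argparse adds -h/--help on its own; the description and the
    # per-argument help strings below are what it prints.
    parser = argparse.ArgumentParser(
        prog="count_lines",  # hypothetical program name
        description="Count the lines in a text file.",
    )
    parser.add_argument("infile", help="path to the input text file")
    parser.add_argument(
        "-n", "--skip-blank", action="store_true",
        help="ignore blank lines when counting",
    )
    return parser


def count_lines(path, skip_blank=False):
    # Read the whole file and count its lines, optionally dropping blanks.
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return len(lines)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(count_lines(args.infile, args.skip_blank))
```

Running the script with -h prints the usage line and the help strings and then exits, so a new user can find out what the program expects without reading the source.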
Although there is some discussion on how and which data to back
up, scientific computing often deals with large data. Backing up
such data "over the wire" can be extremely slow. Some discussion
of incremental backups (eg rsync), storage options (eg AWS
Glacier), and data retention timelines may be warranted.
We have added some new text under "Record all the steps used to process data" and in the summary paragraphs of the Data Management section that pertains to big data. We have also included a new paragraph on backup / storage under "Save the raw data".
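To illustrate the incremental idea the reviewer raises, here is a rough Python sketch that copies only files that are new or appear changed, judged by size and modification time. It is not rsync and it is not text from the manuscript; the function name incremental_backup and the change test are assumptions made for illustration, and a real backup of large data would normally rely on a dedicated tool.

```python
import shutil
from pathlib import Path


def incremental_backup(source, destination):
    """Copy only files that are new or look changed since the last run.

    Returns the relative paths of the files that were copied.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            # Assumed change test: same size and not modified more recently
            # than the backup copy means the file is treated as unchanged.
            unchanged = (
                src_stat.st_size == dst_stat.st_size
                and src_stat.st_mtime <= dst_stat.st_mtime
            )
            if unchanged:
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the timestamp
        copied.append(src_file.relative_to(source))
    return copied
```

A second run over an untouched directory copies nothing, which is the property that makes repeated backups of large data sets cheaper than a full copy each time.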
2. Expand "3. Collaboration" to include communication strategies
Many large, distributed collaborative projects suffer from poor
communication. It is critical to define how participants will
communicate early in a project. Will you use mailing lists? Who
will host them? Are they public? What about things like Slack?
Will there be teleconference calls? Who will be able to attend
these calls? Who will take minutes and where will they be stored?
Where will you keep documentation? Establishing the technical
architecture and cultural norms early will avoid serious pain
Added new item on communication, using many of the excellent recommendations suggested by the reviewer.
3. Add some discussion on data subdirectories (section 4.3)
A brief description of how to create and name data subdirectories
would be useful. These are often an afterthought and a complete
mess. Should they be managed by your workflow? Created
manually? Naming scheme? All points noted elsewhere in the
manuscript, this is just an opportunity to reinforce earlier
Added a short note about subdirectories in data directory under 4.3.
1. "Scripts vs Programs" and the bin/ vs src/ directory
I found the distinction contradictory. Personally, I place anything
that can be executed directly in bin/ and anything that requires
compilation in src/. Restricting bin/ to external scripts and
executables means that any directly interpreted scripts or .sql that
are a part of the project would be executed from src/. That's
confusing and makes things inconsistent.
Good point. We've re-written some of this text, and added a blockquote about external scripts. The distinction we are trying to make is about consistency (and to a lesser extent, editability) rather than executability.
2. As an Hg refugee, I'm not certain that it's friendlier than git,
particularly with the wealth of git tutorials now available. I'd
consider removing this recommendation.
Removed (noting that we still mention Mercurial as an option for version control).
3. Like Turner and Raj, I was surprised to see a recommendation for
the use of things like LaTex backed with VC for preparing a
manuscript. In this day of ubiquitous Internet coupled with the
power of collaborative editing via Google Docs, and distributed
authors from diverse backgrounds, I simply cannot imagine
writing a MS with LaTex.
Kept the option of text-based files under VC for manuscripts - we agree that the reviewer's preferred option of online docs is the much simpler solution for many projects (which is why this is the first option presented), but given that we recommend version control in other parts of the manuscript, we leave the option for text-based manuscripts under VC. (We also note to the reviewer that LaTeX + VCS is how we collaboratively wrote the ms).
There is also no mention of what happens when you move
from Google Docs. Some submission processes, for example,
require Word and generating conformant files from Google Docs
can be painful. When do you freeze writing? How do you handle
versioning at that point? The irony is not lost on me that I've
prepared this review in MS Word.
Noted that downloading is required in this workflow, at which point the disadvantages of desktop approaches all apply.
4. I commend the authors for embedding some humorous
anecdotes in this Perspective.