Methods-section-driven reproducibility

A cornerstone of the scientific method has always been the ability to draw the same conclusions from different executions of an experiment. I would very much like to say that there is a consensus in the scientific community on what to call such a process, but unfortunately that doesn’t seem to be the case. The terms “reproducibility”, “replicability” and “robustness” are often used interchangeably, and different people might rank them differently depending on how they interpret them. Luckily, a recent paper cleverly proposed to stick to “reproducibility” to describe the process as a whole and to name its different flavors by adding a prefix. In short, Goodman et al. indicate the following kinds of reproducibility in science (the short summaries are mine):

  • Methods reproducibility: giving sufficient details about the experimental procedure and the processing of the data so that the “exact same” results can be obtained
  • Results reproducibility: carrying out an independent study with a “very similar” procedure and obtaining “close enough” results to those of the original study
  • Inferential reproducibility: drawing “qualitatively” similar conclusions from independent studies or a reanalysis of the same data

In the specific area of computational biology, the requirements to meet these three objectives can be more precisely defined:

  • Methods reproducibility: providing “machine code” that gives exactly the same output given the same input
  • Results reproducibility: providing all the relevant details about the algorithms used so that they can be re-run/reimplemented and give quantitatively similar results on the same or different data
  • Inferential reproducibility: providing an interpretation of the results of an experiment so that it can be qualitatively compared with another study
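As a toy illustration of the first two definitions, the difference can be phrased as two different checks: an exact rerun must be byte-identical, while an independent reimplementation only needs to be within tolerance. This is a minimal sketch (the function names and example values are mine, not from any real pipeline):

```python
import hashlib
import math

def methods_reproducible(output_a: bytes, output_b: bytes) -> bool:
    """Methods reproducibility: re-running the exact same code on the
    same input should yield a byte-identical output."""
    return hashlib.sha256(output_a).digest() == hashlib.sha256(output_b).digest()

def results_reproducible(values_a, values_b, rel_tol=1e-2) -> bool:
    """Results reproducibility: an independent run or reimplementation
    only needs to be quantitatively similar, so compare within a tolerance."""
    return len(values_a) == len(values_b) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(values_a, values_b)
    )

# Exact rerun of the same code: identical bytes.
print(methods_reproducible(b"p=0.0421", b"p=0.0421"))        # True
# Independent reimplementation: slightly different numbers, same result.
print(results_reproducible([0.0421, 1.97], [0.0423, 1.96]))  # True
```

The point of the sketch is that the two checks are met by very different kinds of effort: the first by freezing an environment, the second by describing an algorithm well enough to be rebuilt.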

It’s easy to see how the last flavor of reproducibility is the most valuable, as reaching the same conclusions using different data or even completely different experimental strategies can sometimes provide further support by itself. Needless to say, it is also the one that requires the most work and resources to achieve.

Methods reproducibility has become pretty fashionable in computational biology; many journals explicitly request that authors deposit all computer code as supplementary material. In the extreme case, authors provide VMs or so-called containers to ensure that the specific computing environment does not alter the final result, leading to perfect methods reproducibility. This is an important thing to aspire to, especially to avoid scientific fraud (or bona fide errors), and many people have proposed technologies to make it relatively easy to achieve.

Despite all this, I believe that in many cases the emphasis should be on achieving better results reproducibility over perfect methods reproducibility. This usually comes in the form of none other than the good old methods section of a paper[1]. If the algorithms used in an experiment are explained in sufficient detail, it will be (relatively) trivial to reimplement them to produce very similar results on different data, thus reproducing (in the “results” sense) the original paper. What’s more, writing an implementation of an algorithm from scratch is a great exercise and a great way to properly understand how a method works, not to mention the possibility of improving it. In fact, I recently had to reimplement some algorithms that were very well described in other papers’ methods sections (part of this, and the whole of this with some help)[2]. In the process I understood the algorithms better, and I ended up making improvements and extensions. It has also convinced me that reimplementing an algorithm from a paper could be an interesting part of a computational biology class. All of this is simply not possible through methods reproducibility alone, unless a thorough inspection of the source code is made, which in many cases can be a true nightmare.
Even the most advanced container technology or programming language will eventually fade, but a couple of well-written paragraphs will live on for a long time.
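To make the reimplementation exercise concrete, here is a hedged sketch: the one-sentence “methods description”, the reference numbers and the function name below are all invented for illustration, not taken from any actual paper. The idea is that a sufficiently precise sentence is enough to rebuild the method and verify results reproducibility against the reported output:

```python
import math

# Hypothetical sentence from a methods section:
# "Each gene's expression values were centered and scaled to unit
#  standard deviation (z-scores) across samples."

def zscore(values):
    """A from-scratch reimplementation based only on that sentence,
    using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Output reported by the (imaginary) original pipeline, possibly
# computed with a different library or language.
reference = [-1.2247, 0.0, 1.2247]
reimplemented = zscore([1.0, 2.0, 3.0])

# Results reproducibility: quantitatively similar, not byte-identical.
assert all(math.isclose(a, b, abs_tol=1e-3)
           for a, b in zip(reimplemented, reference))
```

Note that the sentence had to pin down one detail (population vs. sample standard deviation) for the check to pass; spotting exactly these ambiguities is what makes reimplementation such a useful exercise.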

[1] or the documentation of your software package, or a chapter of a book, or an interactive blog post

[2] our former colleague Omar was particularly good at reimplementing existing methods to make them more user-friendly and extensible, like SIFT or motif-x