Reproducible Research

Open-Discussion

Ellis Franklin
Elise Billoir

February 6, 2025

Introduction to RR

Have you ever encountered challenges when trying to reproduce data analyses?


What should be kept the same
when reproducing a study? 🤔


What does reproducibility mean actually?



  • Difficult question to answer…
  • “Lack of reproducibility”
    ➡️ understood but broad expression
  • No clear definition
    ➡️ varies between the disciplines

Gundersen, O. E. The fundamental principles of reproducibility. Phil. Trans. R. Soc. A 379, 20200210 (2021). DOI: 10.1098/rsta.2020.0210.

What does reproducibility mean actually?


  • Direct reproducibility (also called experimental reproducibility)
  • Analytical reproducibility (also called computational reproducibility)


Other related terms to distinguish:

  • Replicability: Different data | Same analysis methods
  • Robustness: Same data | Different analysis methods
  • Generalisability: Different data | Different analysis methods
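
The related terms above amount to a two-by-two lookup on whether the data and the analysis methods stay the same. A minimal Python sketch, assuming that reproducibility itself sits in the remaining "same data, same analysis methods" cell (the function name `classify_study` is hypothetical):

```python
def classify_study(same_data: bool, same_methods: bool) -> str:
    """Name the property being tested when a study is repeated.

    Assumption: 'reproducibility' is the same-data, same-methods case;
    the other three cells follow the list of related terms.
    """
    labels = {
        (True, True): "reproducibility",
        (False, True): "replicability",
        (True, False): "robustness",
        (False, False): "generalisability",
    }
    return labels[(same_data, same_methods)]


if __name__ == "__main__":
    # Print the full two-by-two table.
    for same_data in (True, False):
        for same_methods in (True, False):
            print(same_data, same_methods, classify_study(same_data, same_methods))
```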

In other words…



Data

Code

Documentation

Reproducible Research

Why are we talking about this now?

Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016). DOI: 10.1038/533452a.


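  • More than 70% of surveyed researchers have tried and failed to reproduce another scientist’s experiments

  • More than half have failed to reproduce their own experiments

  • Irreproducible preclinical research costs an estimated US$28 billion per year in the United States (Freedman et al. 2015)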

How important is it? Well, quite important



  • Anyone (with a similar level of skills) should be able to reproduce your research and benefit from it
  • Funders (e.g., ANR) and journals now require data and code accessibility, which aligns with the growing open-science movement
  • If I keep my data and code, I have a competitive advantage
  • If I share my data and code, my work will gain more visibility (and citations 😎)

Advantages



  • It obliges you to verify your work (by sharing documentation, data and code)
  • Your future self will thank you (you’ll be much more productive)
  • Your collaborators too
  • By ensuring reproducibility, you reinforce your credibility and reputation
  • Reproducibility fosters trust in scientific progress and accelerates it

Sources of irreproducibility

What sources of irreproducibility
can you think of? 🤔

Factors decreasing reproducibility




  • Failures of reproducibility cannot be traced to a single cause
  • Nearly every aspect of a study needs to be considered

Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016). DOI: 10.1038/533452a.

Example: a drawing protocol

You really want to draw something but don’t know how. You’ve found this protocol in a very prestigious journal and want to reproduce it. Grab your pen and paper 📝!

  1. Draw a large circle in the center of your page.
  2. Inside the circle, slightly above the middle, draw two medium-sized ovals next to each other. Inside each oval, draw a smaller filled-in oval.
  3. Below the ovals, in the center of the circle, draw a small upside-down triangle.
  4. On top of the circle, draw two small triangles with the base of each triangle touching the edge of the circle and the points facing outward.
  5. From the bottom point of the triangle, draw two curved lines that extend outward, one curving to the left and one to the right, forming a “W” shape.
  6. On each side of the curved lines, draw three short straight lines extending outward.

Example: a drawing protocol

My attempt

 

The result in the study

How did you do?

Example: a drawing protocol


Timothée Poisot (Sep 8 2015) ‘Step 2 — do the rest of the fucking analysis’ [Medium], Page 1, accessed 10 Jan 2025.

 

“Unavailable protocol”

 

“Ambiguous protocol”

How can we be a bit more reproducible?


  • Document practices (e.g., lab notebooks, ELNs with eLabFTW)

  • Organise code, data and files (e.g., separate folders for different data, clear filenames, tidy principles)

 

“Piled Higher and Deeper” by Jorge Cham (www.phdcomics.com)

 

Illustrations from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst

 

  • Share data and outputs

 

 

  • Track changes (= versioning)

 

 

  • Learn to code (even a little)
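
As one way to act on the "organise" point, the sketch below creates a small project skeleton with separate folders for raw data, processed data, code, outputs and documentation. The folder names and the `create_project` helper are illustrative assumptions, not a standard; adapt them to your own lab's conventions.

```python
from pathlib import Path

# Hypothetical layout: one folder per kind of file.
FOLDERS = ["data/raw", "data/processed", "code", "outputs", "docs"]


def create_project(root: str) -> list[Path]:
    """Create the folder skeleton under `root` and return the folder paths."""
    root_path = Path(root)
    created = []
    for name in FOLDERS:
        folder = root_path / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run again
        created.append(folder)
    # A README is the minimum documentation: say what lives where.
    readme = root_path / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Project\n\nDescribe the data, code and outputs here.\n",
            encoding="utf-8",
        )
    return created
```

Calling `create_project("my_analysis")` would create the five folders and a placeholder README under `my_analysis/`. Keeping raw data in its own folder makes it easier to leave it untouched while processed files are regenerated by code.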

Artwork by Allison Horst

Every little helps:
Literate Programming with Quarto

What is Literate programming?

  • Introduced by Donald Knuth in 1984

  • Idea of combining source code and text

  • Method for writing computer programs as literary essays

  • Many tools exist for various languages:

    • RMarkdown (R)

    • Jupyter (Python)

    • Quarto (multi-language)

Knuth, D. E. Literate Programming. The Computer Journal 27(2), 97–111 (1984).

“The best way to communicate from one human being to another is through story”
Donald Knuth

What is Quarto?


  • Quarto is an open-source, scientific and technical publishing system
  • It implements the concept of literate programming
  • It provides a unified authoring framework for data science, combining your code, its results, and your prose (text)


Illustration by Alison Hill and Allison Horst, for RStudio

Advantages of Quarto


  • Quarto documents are fully reproducible and dynamic
  • They automatically include the latest results of an analysis
  • They let you avoid copy and paste, reduce accidental errors and save time
  • Dozens of output formats are available: web pages, PDFs, Word files, websites, books, and more
  • Quarto is open source, so anyone can use it
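
To make the "no copy and paste" point concrete, here is a minimal sketch that writes a small Quarto document from Python. The `write_report` helper, the file name and the example values are hypothetical; the generated file uses a YAML header, ordinary prose and one executable Python chunk with the `echo: false` option so that only the result appears in the output.

```python
from pathlib import Path

# Built at run time so the backticks do not clash with this listing.
FENCE = "`" * 3

QMD_TEMPLATE = """---
title: "{title}"
format: html
---

## Results

The summary below is recomputed on every render, so it never goes stale.

{fence}{{python}}
#| echo: false
values = {values}
print(f"Mean of {{len(values)}} measurements: {{sum(values) / len(values):.2f}}")
{fence}
"""


def write_report(path: str, title: str, values: list[float]) -> Path:
    """Write a minimal Quarto document mixing prose and a Python chunk."""
    text = QMD_TEMPLATE.format(title=title, values=values, fence=FENCE)
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

After something like `write_report("report.qmd", "Demo", [1.0, 2.0, 3.0])`, running `quarto render report.qmd` should produce an HTML page, assuming Quarto and a Python Jupyter kernel are installed. Changing the values and rendering again updates the reported mean without any manual editing of the text.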

Illustration by Alison Hill and Allison Horst, for RStudio

Main use cases


Three use cases stand out:


  1. For communicating with decision-makers, your supervisors or a more general audience, who want to focus on the conclusions, not the code behind the analysis
  2. For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code)
  3. As an environment in which to do data science, as a modern-day lab notebook where you can capture not only what you did, but also what you were thinking

To advance, we must code a little…

Artwork by Allison Horst