Reproducible Research

A Talk on How to Do the Same Thing More Than Once

Aaron Peikert

Max Planck Institute for Human Development

Reproducibility

same thing in, same thing out

As a student:

Conceptually

SEXY

Technologically

SEXY

At the beginning of my PhD:

Conceptually

SEXY

Technologically

SEXY

At the end of my PhD:

Conceptually

SEXY

Technologically

SEXY

Now:

Conceptually

SEXY

Technologically

SEXY

Accurate, unbiased, complete, and insightful reporting of the analytic treatment of data (be it quantitative or qualitative) must be a component of all research reports.

— Publication manual of the American Psychological Association, 7th ed.

Conceptual


What is the purpose of reproducibility?


1. Collaboration

2. Error checking

3. Prediction

Are you a good Human / Researcher?

Perhaps.

But it does not hurt to want reproducibility out of selfish reasons.

1. Collaboration

Nous connaissons la vérité non seulement par la raison mais encore par le coeur […]

We know the truth, not only by the reason, but also by the heart.

Pascal, B. (1670). Pensées. (Lines 110–282).

→ Practical Part

2. Error checking

  1. Computational level → Goal
  2. Algorithmic level → Method
  3. Implementational level → Means
  4. Hardware level → Platform
  5. Data level → Reality
  6. Person level → Experience

1.–3. borrowed from David Marr's 1982/2010 book Vision

2. Error checking

Redoing activities try to rule out mistakes in these domains, each by swapping one component:

  • Redoing swaps nothing but Time
  • Reproduction swaps Person and Hardware
  • Direct Replication swaps Data
  • Reimplementation swaps Implementation
  • Robustness checks swap Algorithm
  • Conceptual Replications swap Goal

3. Prediction

Same data → Same Black box → Same results

Reproducible?

Statistical Models

fit

data

Statistical Models

overfit

data


Reproducibility and Overfit

By how much?

  • \(\mathit{R}^2_{\text{adj.}}\)
  • \(C_p\)
  • \(AIC\)
  • Cross Validation

\[ \mathit{R}^2_{\text{adj.}} = \mathit{R}^2 - (1-\mathit{R}^2)\frac{p}{n - p - 1} \]

\[ C_p = \frac{\sum_{i=1}^n(\hat{y}_i- y_i)^2}{\sigma^2_e}-n+2p \]
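For concreteness, the two corrections look like this in code (a Python sketch, not part of the talk's materials; the error variance \(\sigma^2_e\) is treated as known, as \(C_p\) requires):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3            # observations, predictors (intercept counted separately)
sigma_e = 1.0            # true error s.d., assumed known for C_p

X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.0])
y = X @ beta + rng.normal(scale=sigma_e, size=n)

# OLS fit with intercept
Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta_hat

sse = np.sum((y_hat - y) ** 2)
r2 = 1 - sse / np.sum((y - y.mean()) ** 2)

# adjusted R^2, with p = number of predictors (excluding the intercept)
r2_adj = r2 - (1 - r2) * p / (n - p - 1)

# Mallows' C_p, with p + 1 fitted parameters (including the intercept)
cp = sse / sigma_e**2 - n + 2 * (p + 1)
```

Both corrections are cheap to compute; the catch, as the next slides argue, is what the symbol \(p\) (or more generally df) hides.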

Reproducibility?

\[ \text{df}(\hat{y}) = \frac{\sum^n_{i = 1}{\text{Cov}(\hat{y}_i, y_i)}}{\sigma^2_e} \]

This covariance requires:

  • formal derivation, or
  • repeated computation on the same or perturbed data

Reproducibility.
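The second route can be sketched in a few lines of Python (illustrative, not from the talk): hold the design fixed, redraw the noise many times, and estimate each \(\text{Cov}(\hat{y}_i, y_i)\) across replications. For OLS the sum should land near the trace of the hat matrix, i.e. the number of fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
sigma_e = 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
mu = X @ np.array([1.0, 2.0, -1.0])     # true mean, fixed across replications

reps = 2000
Y = mu + rng.normal(scale=sigma_e, size=(reps, n))   # perturbed data, one row per rerun
H = X @ np.linalg.solve(X.T @ X, X.T)                # hat matrix of the OLS fit
Yhat = Y @ H.T                                       # fitted values for every replication

# df(yhat) = sum_i Cov(yhat_i, y_i) / sigma_e^2, estimated over replications
cov = np.mean((Yhat - Yhat.mean(0)) * (Y - Y.mean(0)), axis=0)
df_hat = cov.sum() / sigma_e**2        # should be close to trace(H) = p + 1 = 3
```

The point of the sketch: `df_hat` only exists because the whole fitting step can be re-run thousands of times, which is exactly the reproducibility requirement.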

\[ \mathit{R}^2_{\text{adj.}} = \mathit{R}^2 - (1 - \mathit{R}^2)\frac{\text{df}}{n - \text{df} - 1} \]

\[ C_p = \frac{\sum_{i=1}^n(\hat{y}_i - y_i)^2}{\sigma^2_e} - n + 2\,\text{df} \]

Information Criteria

Require the Hessian around the solution for their “overfit” correction.

Computed via:

  • a byproduct of optimization
  • finite differences
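A minimal finite-difference sketch in Python (the Normal model, step size, and helper names are illustrative; in practice the Hessian an optimizer returns as a byproduct is preferred):

```python
import numpy as np

def nll(theta, y):
    """Negative log-likelihood of a Normal(mu, sigma) model."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return np.sum(0.5 * np.log(2 * np.pi) + log_sigma
                  + (y - mu) ** 2 / (2 * sigma**2))

def hessian_fd(f, theta, eps=1e-4):
    """Hessian of f at theta via central finite differences."""
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            def shift(di, dj):
                s = np.array(theta, dtype=float)
                s[i] += di * eps
                s[j] += dj * eps
                return f(s)
            H[i, j] = (shift(1, 1) - shift(1, -1)
                       - shift(-1, 1) + shift(-1, -1)) / (4 * eps**2)
    return H

rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=2.0, size=500)
theta_hat = np.array([y.mean(), np.log(y.std())])     # MLE of (mu, log sigma)
H = hessian_fd(lambda th: nll(th, y), theta_hat)      # curvature around the solution
```

At a proper optimum the Hessian is positive definite; whether two machines agree on it, to how many digits, is again a reproducibility question.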


Reproducibility

Cross Validation

is

Reproduction

on subsamples.
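Seen this way, every fold forces a full re-run of the analysis pipeline, which is why cross-validation presupposes reproducibility. A from-scratch k-fold sketch in Python (illustrative):

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """k-fold cross-validation: reproduce the whole fit on each subsample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # refit from scratch
        errors.append(np.mean((X[fold] @ beta - y[fold]) ** 2))     # score on held-out fold
    return float(np.mean(errors))

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)
mse = kfold_mse(X, y)
```

Calling `kfold_mse` twice with the same seed returns the identical estimate: each fold's fit is a small reproduction.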

Extended Definition

First, computational reproducibility must ensure that the same data lead to the same results.


Second, computational reproducibility must make the inductive process repeatable on similar data.

Summary

%%{init: {'theme':'forest'}}%%
flowchart LR

  R(Reproducibility)
  R --> X(Replication)
  R --> Y(Preregistration)
  T(Trustworthy)

  X --> T
  Y --> T
  R --> T

Technological

Technological

For reproducibility, it really needs to be reproducible and checkable by a stranger with little time or energy to spare, because even the author will soon enough be that stranger.

Gwern Branwen

Four Problems with Reproducibility

  1. versioning
  2. copy&paste errors
  3. software dependencies
  4. linking everything together

Four Solutions for Reproducibility

  1. version control
  2. dynamic documents
  3. software management
  4. workflow orchestration

ACRISS

Association of Car Rental Industry Systems Standards

| Category | Type | Trans / Driven wheels | Fuel / air-con |
|---|---|---|---|
| E: Economy | F: SUV | A: Auto (drive unspecified) | R: Unspecified fuel, with AC |
| I: Intermediate | T: Convertible | B: Auto 4WD | D: Diesel, with AC |
| S: Standard | C: 2/4 Door | D: Auto AWD | H: Hybrid, with AC |
| F: Fullsize | D: 4-5 Door | M: Manual (drive unspecified) | E: Electric, with AC |
| P: Premium | S: Sport | | V: Petrol, with AC |

EDMR

(E: Economy, D: 4-5 Door, M: Manual with drive unspecified, R: unspecified fuel with AC)

The Rental Car Model

Ride it like you stole it

Four Solutions for Reproducibility

  1. version control
  2. dynamic documents
  3. software management
  4. workflow orchestration

R + RMarkdown + Docker + Make + Git

https://github.com/aaronpeikert/reproducible-research

Julia + RMarkdown + Pkg.jl + GitHub Actions + Git

https://github.com/formal-methods-mpi/pkgmanuscript/blob/main/Dockerfile

Lua + Quarto + GitHub Actions + GitHub Actions + Git

https://github.com/aaronpeikert/repro-talk

Python + Quarto + Docker + GitHub Actions + Git

https://github.com/formal-methods-mpi/projects/pull/41/files

R + RMarkdown + Docker + Make + Git

https://github.com/aaronpeikert/bayes-prereg/pull/97

Summary

%%{init: {'theme':'forest'}}%%
flowchart LR

  R(Reproducibility)
  R --> C(Collaboration)
  R --> X(Replication)
  R --> Y(Preregistration)
  T(Trustworthy)

  X --> T
  Y --> T
  C --> Cu(Cumulative)
  T --> Cu
  C --> T

Time to get active

  • Find a scientific document where you did the data analysis, e.g. a master's thesis, a conference talk/poster, etc. (5 min)
  • Could you reproduce it in a day? Write down what helps and what does not: https://pad.gwdg.de/s/fR6cTmR3K (10 min)
  • Find an article/preprint you like and try to get the materials necessary to reproduce it (30 min)
  • Discussion