Chapter 1 Theoretical Considerations

Claerbout & Karrenbach (1992) define reproducibility as the ability to obtain the same results from the same dataset. Conversely, they call a result replicable if one draws the same conclusion from a new dataset. This thesis is concerned with the former goal: providing researchers with an accessible yet modular workflow that is virtually guaranteed to reproduce results across time and computers.

The scientific community agrees that, ideally, its work should be reproducible. Indeed, it may be hard to find a researcher who distrusts a result because it is reproducible; on the contrary, many argue that ensuring reproducibility is “good scientific practice” (“Reducing Our Irreproducibility,” 2013; Deutsche Forschungsgemeinschaft, 2019; Epskamp, 2019). Several reasons, practical and meta-scientific, justify this consensus on reproducibility as a minimal standard of Science.

Reproducibility makes researchers’ lives more productive in two ways: at the most basic level, the act of reproduction provides an opportunity for the researcher to spot errors. At the same time, other researchers may benefit from reusing materials from an analysis they have reproduced.

Beyond these two purely pragmatic reasons, reproduction is crucial, depending on the philosophical view of Science one subscribes to, because it allows independent validation and enables replication. Philosophers of Science characterise Science mainly as a shared method of determining whether or not a statement about the world is “true” (Andersen & Hepburn, 2016) or, more broadly, of evaluating a statement’s verisimilitude (Gilbert, 1991; Meehl, 1990; Popper, 1962; Tichý, 1976). If this method consists of experts agreeing on the assumptions and deducing “truth”, reproducibility is hardly necessary. It does, however, gain importance if one induces facts by carefully observing the world. The decisive difference between these approaches is that the former gains credibility from the authority of the experts, while the latter is trustworthy because anyone may empirically verify it.

Accepting induction as a scientific method hence hinges on its verifiability by others. Some have even argued that such democratisation of Science is what fuelled the so-called scientific revolution (Heilbron, 2004, Scientific Revolution). The scientific revolution established the experiment as an agreed-upon method of observing reality; a much later revolution provided statistical modelling (Rodgers, 2010) as a means of induction. This consensus about how to observe and how to induce forms the foundation of modern empirical Science. Two reasons justify why we must adopt reproducibility as a scientific standard if we accept induction as a scientific method: first, it allows independent verification of the process of induction, and second, it enables replication as a means of verifying the induced truths.

However, neither verification of the induction nor replication of the induced results is strictly enabled by the definition of reproducibility by Claerbout & Karrenbach (1992) given above. A simple thought experiment illustrates this shortcoming: imagine a binary, and therefore only machine-readable, program that is perfectly reproducible; given the same dataset, it produces a scientific manuscript with the identical numbers in the right places. Let us further assume that this hypothetical program breaks down whenever the dataset changes. Does the predicate “reproducible” in this situation reduce the number of mistakes or enable reuse? Unlikely. Could one audit it or use it in a replication? Hardly. This admittedly constructed case of a reproducible black box shows that we are not interested in reproducibility per se but rather in its secondary effects. Because it is a binary program, it does not enhance understanding, and because it does not apply to other datasets, it does not facilitate productivity. In fact, such a program does not grant the researcher any practical or meta-scientific advantages over non-reproducible research products.

At the cost of its elegant simplicity, I therefore extend the definition by Claerbout & Karrenbach (1992) to address this issue, demanding further that reproducibility must allow criticism and facilitate replication:

A reproducible scientific product allows one to obtain the same results from the same dataset in a way that enables substantive criticism and therefore facilitates replication.

Thus, transparency should enable reproducibility: ensuring a clear link between the data and the results derived from it promotes both replicability and reproducibility. Comprehension is a necessary precondition for substantive criticism, which in turn motivates the iterative scientific process. In line with this general notion, scientific publications are required to provide enough detail about the research process so that others may contribute constructive criticism. I am convinced that we should apply the same standard to everything that is published, including code.

Consequently, reproducibility is no longer an all-or-nothing property; there are shades, because a research product can promote replication and comprehension to varying degrees. Also note that a scientific result can facilitate replication without anyone ever attempting to replicate it, e.g. by educating other researchers about the method of analysis, by being openly accessible, and by providing reusable components.

Hence, reproducibility has a technical aspect, which is ensuring identical results, and a non-technical aspect, which is facilitating understanding and progress through cumulative Science. The former relates to the practical advantages, while the latter serves the meta-scientific purposes of reproducibility. An important requirement of the technical aspect is that generating the same results from the same data should always be possible, regardless of time and computer. As such, a reproducible analysis should be (a minimal technical sketch follows this list):

  1. understandable by other researchers,
  2. transferable across computers,
  3. preserved over time.
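
To make the technical aspect concrete, the following minimal sketch shows one possible way to obtain identical results on every rerun of a toy analysis in R and to record the computational environment for later preservation. It uses only base R; the commented-out call to renv::snapshot() is an assumption about one convenient, widely used tool and is not part of the workflow discussed later.

```r
# Minimal sketch: pin down the sources of variation in a toy analysis
# so that rerunning it yields identical results.

set.seed(1234)                           # fix the random number generator
simulated <- rnorm(100, mean = 0, sd = 1)
estimate <- mean(simulated)              # identical on every rerun given the seed

# Record the computational environment (R version, platform, loaded packages)
# so that others can recreate it later.
writeLines(capture.output(sessionInfo()), "session-info.txt")

# Optionally snapshot exact package versions (assumes the renv package is installed):
# renv::snapshot()
```

Fixing the seed addresses identical results, and recording the session information supports preservation over time; transferability across computers, however, requires the heavier machinery of containerisation discussed in the context of the workflow below.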

This extended notion and demanding standard of reproducibility is justified by two recent developments in the social sciences in general and psychology in particular: the emergence of a “replication crisis” (Ioannidis, 2005) and the rise of “machine learning” (Jordan & Mitchell, 2015) as a scientific tool. Both trends link to the use of statistical modelling, on which the social sciences have become reliant for testing and developing their theories (Gigerenzer et al., 2004; Meehl, 1978). It turns out that, if one fits the very same statistical model as published to newly gathered data, one fails more often than one succeeds to obtain results consistent with what was published (Open Science Collaboration, 2015).

Such failure to replicate findings that were believed to be robust has grown to a level that some social scientists call a crisis (Pashler & Wagenmakers, 2012). Various causes of and remedies for this crisis have been put forth, and most remedies share a common motif: transparency. Some call for Bayesian statistics (Maxwell et al., 2015), as it makes assumptions more explicit, or demand preregistration (Nosek et al., 2018) as a means of specifying how the data will be analysed before they are collected. Others require researchers to publish their data (Boulton et al., 2012). Similar calls for transparency in response to the replication crisis have formed the open science movement, which stresses the necessity of six principles (Kraker et al., 2011):

  • Open Access,
  • Open Data,
  • Open Source,
  • Open Methodology,
  • Open Peer Review and
  • Open Educational Resources.

I argue that a research product resting on the first four pillars facilitates replication optimally and hence satisfies the highest standard of reproducibility. The last two pillars are then consequences of reproducibility. If everyone has access to a scientific product, its data, and its source code, everyone has the possibility of understanding the underlying methodology, which enables them to criticise the results and educate themselves. Having done so, they are in the best position to replicate. Hence, anyone’s ability to reproduce such a result gives a tangible affirmation of its usefulness to the scientific community.

While establishing reproducibility is no hurdle if one can perform the required calculations with a pocket calculator, the increasingly frequent use of computer-intensive methods renders this expectation questionable. The use of machine learning techniques, once enabled by computers taking over strenuous work such as estimating and comparing thousands of models, now impedes our quest for reproducibility. Larger amounts of more complicated computer code than ever before create room for errors and misunderstandings, leading the machine learning community to believe that it faces a reproducibility crisis of its own (Hutson, 2018). At the same time, machine learning is becoming increasingly popular in psychological research (Brandmaier et al., 2013; Jacobucci et al., 2019; Yarkoni & Westfall, 2017). Therefore, I am far from calling for abstinence from machine learning just because it complicates reproduction; rather, I want to emphasise the need for solutions that allow anyone in any field to reproduce even the most sophisticated analysis. Such a possibility enables cumulative Science and allows researchers to build a more complete and accurate understanding of their field’s subject matter.

Peikert & Brandmaier (2019) put forth an analysis workflow that offers this convenience to everyone wishing to reproduce any kind of analysis. However, it does not offer the same level of convenience to the researcher who creates the analysis in the first place. Setting up the workflow consumes a considerable amount of the researcher’s time because it requires a level of technical sophistication that cannot be expected across all disciplines. Researchers should rather spend this time on advancing their research. This additional effort offsets the increase in productivity promised by reproducibility, which I regard as the most significant factor in the workflow’s adoption. Persuading researchers who find the meta-scientific argument noble but impractical, who do not care about it, or who even oppose it requires concrete, practical benefits. Luckily, most of this setup process can be automated, letting researchers enjoy the workflow’s advantages while decreasing the effort necessary to achieve them. Providing a version of the analysis workflow by Peikert & Brandmaier (2019) that is easier to use and more accessible is the goal of this thesis and of the repro package for the R programming language presented herein (Peikert et al., 2020).
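
To give a flavour of what this automation can look like in practice, the following sketch shows how a researcher might set up the workflow with the repro package. The function names check_git(), check_make(), check_docker(), and automate() reflect my reading of the package at the time of writing and should be treated as assumptions about its interface rather than a definitive description.

```r
# Sketch of an automated workflow setup with the repro package.
# Function names are assumptions about the package interface; the exact
# calls may differ between versions.

# install.packages("remotes")
# remotes::install_github("aaronpeikert/repro")
library(repro)

# Check whether the system dependencies of the workflow are available
# and receive guidance on installing whatever is missing.
check_git()
check_make()
check_docker()

# Generate the Makefile and Dockerfile needed to reproduce the analysis
# from the project's R Markdown files, instead of writing them by hand.
automate()
```

In this way, the technical burden of the setup shifts from the researcher to the package, which is precisely the gain in convenience this thesis aims for.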

References

Andersen, H., & Hepburn, B. (2016). Scientific method. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Summer 2016). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2016/entries/scientific-method/

Announcement: Reducing Our Irreproducibility. (2013). Nature, 496(7446), 398. https://doi.org/10.1038/496398a

Boulton, G., Campbell, P., Collins, B., Elias, P., Hall, W., Laurie, G., O’Neill, O., Rawlins, M., Thornton, J., & Vallance, P. (2012). Science as an open enterprise. The Royal Society.

Brandmaier, A. M., von Oertzen, T., McArdle, J. J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86. https://doi.org/10.1037/a0030001

Claerbout, J. F., & Karrenbach, M. (1992). Electronic documents give reproducible research a new meaning. SEG Technical Program Expanded Abstracts 1992, 601–604. https://doi.org/10.1190/1.1822162

Deutsche Forschungsgemeinschaft. (2019). Leitlinien zur Sicherung guter wissenschaftlicher Praxis. https://www.dfg.de/download/pdf/foerderung/rechtliche_rahmenbedingungen/gute_wissenschaftliche_praxis/kodex_gwp.pdf

Epskamp, S. (2019). Reproducibility and replicability in a fast-paced methodological world. Advances in Methods and Practices in Psychological Science, 2(2), 145–155. https://doi.org/10.1177/2515245919847421

Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The Null Ritual: What You Always Wanted to Know About Significance Testing but Were Afraid to Ask. In D. Kaplan, The SAGE Handbook of Quantitative Methodology for the Social Sciences (pp. 392–409). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311.n21

Gilbert, S. W. (1991). Model building and a definition of science. Journal of Research in Science Teaching, 28(1), 73–79. https://doi.org/10.1002/tea.3660280107

Heilbron, J. L. (Ed.). (2004). The Oxford Companion to the History of Modern Science. Oxford University Press.

Hutson, M. (2018). Artificial intelligence faces reproducibility crisis. Science, 359(6377), 725–726. https://doi.org/10.1126/science.359.6377.725

Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Jacobucci, R., Brandmaier, A. M., & Kievit, R. A. (2019). A Practical Guide to Variable Selection in Structural Equation Modeling by Using Regularized Multiple-Indicators, Multiple-Causes Models. Advances in Methods and Practices in Psychological Science, 2(1), 55–76. https://doi.org/10.1177/2515245919826527

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415

Kraker, P., Leony, D., Reinhardt, W., & Beham, G. (2011). The case for an open science in technology enhanced learning. International Journal of Technology Enhanced Learning, 3(6), 643. https://doi.org/10.1504/IJTEL.2011.045454

Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? American Psychologist, 70(6), 487.

Meehl, P. E. (1990). Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles that Warrant It. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Pashler, H., & Wagenmakers, E. (2012). Editors’ Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence? Perspectives on Psychological Science, 7(6), 528–530. https://doi.org/10.1177/1745691612465253

Peikert, A., & Brandmaier, A. M. (2019). A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/8xzqy

Peikert, A., Brandmaier, A. M., & van Lissa, C. J. (2020). Repro: Automated setup of reproducible workflows and their dependencies. https://github.com/aaronpeikert/repro

Popper, K. R. (1962). Some comments on truth and the growth of knowledge. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress (Vol. 155). Stanford University Press.

Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65(1), 1–12. https://doi.org/10.1037/a0018326

Tichý, P. (1976). Verisimilitude redefined. The British Journal for the Philosophy of Science, 27(1), 25–42.

Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393