class: center, middle, inverse, title-slide .title[ # Reproducible Research in R ] .subtitle[ ## How to do the same thing more than once ] .author[ ### Hannes Diemerling, Aaron Peikert, & Maximilian Ernst ] .date[ ### Oxford Summer School | 2025-01-15 ] --- class: middle, center ## In a nutshell .left[ 1. aaronpeikert.github.io/repro-workshop/self-paced 2. Do the .todo[green stuff]. 3. Take your time. ] <img src="images/qr_slides.svg" width="30%" style="float:center" /> --- class: inverse, center, middle # Why should we work reproducibly? --- class: inverse, center, middle # ~~Why~~ How should we work reproducibly? --- class: center, middle # Reproducible Research # = # same data + same analysis # = # same results --- class: center, middle <img src="images/frustrating.gif" width="30%" /> .tiny[No idea how to attribute self-made memes.] --- class: center, middle <img src="images/satisfying.jpeg" width="50%" /> .tiny[No idea how to attribute self-made memes.] --- class: middle .right[ ## Why am I talking about reproducible research? ] .right[ ## Glad you asked! ] --- class: middle .left[ Let me present to you today: ] .center[ # Reproducible Research: The Bleeding Edge ] .right[ An action movie. ] --- ## Properties we are after: * fully automated * independent of hardware * independent of users * continually verified --- ## Problems we want to solve: 1. copy&paste mistakes 2. inconsistent versions of code or data 3. missing or incompatible software 4. complicated or ambiguous procedure for reproduction --- ## Software we will use: 1. R Markdown 2. Git 3. Docker 4. Make --- ## Services we will use: * Posit Cloud (= RStudio Cloud) * GitHub (≠ Git) * GitHub Actions (∈ GitHub) --- class: middle, center ## TL;DR: This is almost impossible. ## Don't freak out. -- Ok, maybe a bit. --- class: center, middle # Today: ### I push you to the brink of what is possible† -- † in the form of a cozy precooked meal. --- .center[ # Alone in the wild: ] -- ## 1. do the necessary, -- ## 2. than the possible, -- ## 3. and you reach the impossible. --- class: middle .center.pull-left[ # Goal for today: ### Understanding what is possible! ] -- .center.pull-left[ # Not today: ### Reaching a deep technical understanding of what is possible. ] --- ## How to behave in this workshop? 1. Be nice and help each other. 2. Take your time, there is no schedule. 3. Don't be shy: Answer and ask questions, even stupid ones. 4. .todo[Do things in this color (green).] 5. .optional[Try things in this color (blue). They are completely optional.] -- ## Let's try it. --- ## Materials <img src="images/qr_slides.svg" width="30%" style="float:right; padding:10px" /> .todo[Tell your neighbors your name and how familiar you are with R.] -- Which color means: "this task is optional"? -- Do you have a question? -- .todo[Keep the slides handy for copy & pasting stuff:] [aaronpeikert.github.io/repro-workshop/self-paced/](https://aaronpeikert.github.io/repro-workshop/self-paced/) -- .optional[Scan the QR code (→) to get the materials:] ??? 1. Be nice and help each other. 2. Don't be shy: Answer and ask questions, even stupid once. 3. .todo[Do things in this color.] 4. .optional[Try things in this color (they are completely optional).] Workshop within the workshop. --- ## Takeaways 1. What are the most important aspects learned? 2. How will you try to ensure reproducibility in the future? 3. What was your biggest challenge about reproducibility that was addressed? --- ## Questions?! Comments? Notes? Write them down: https://pad.gwdg.de/s/Yl6_LbMEV --- class: inverse, center, middle .left[Level 1:] # The necessary --- class: middle .center[ .large[code + data] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- ## A Note on Computing Infrastructure Usually you work on a private laptop. -- **Not today.** For ease of setup we work in the Cloud. But confusingly we than ask another cloud provider to reproduce everything again somewhere else. An incomplete list of computational infrastructures: - laptops - virtual machines - single servers - cloud computers - high performance computers --- ## The Boring Stuff .pull-left[ ### Sign In #### GitHub <https://github.com/login> #### RStudio Cloud *Use Option: "Log In with GitHub"* <https://login.rstudio.cloud/login> ] .pull-left[ ### Sign Up #### GitHub <https://github.com/signup> #### RStudio Cloud *Use Option: "Sign Up with GitHub"* <https://login.rstudio.cloud/register> ]  --- # Posit / RStudio Cloud Access the project: https://posit.cloud/content/9261850 --- # Where are you? Your computer → Posit Cloud --- # A single script .todo[Look around, become comfortable. Then, locate the files pane.] -- .todo[Open the R folder.] .todo[Click on the file `R/prepare_games.R`] *** .optional[Take a look at `R/prepare_inflation.R` (same thing but more complicated)] --- # An incomplete list of best practices 1. list requirements early 2. use relative locations 3. document relevant information --- # A few principles of documentation .center[ document relevant information = lot of writing ?] -- ## No -- , not necessarily. -- 1. What is *standard* does not have to be documented. 2. What is *easy* needs only little documentation. 3. What is *consistent* only has to be documented once. --- class: middle, center .left[Congratulations, you have completed:] # Necessary --- class: inverse, center, middle # Expanded Goals --- background-color: #F0F8F8 .center[ <img src="https://raw.githubusercontent.com/aaronpeikert/repro-tutorial/main/images/nutshell.svg" width="90%" /> ] --- class: middle .center[ .large[code + data + text + history + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- class: middle .center[ .large[**code + data** + text + history + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- .center[ # Todo list ] .pull-left[ ## Problems 1. Copy&paste mistakes 2. Inconsistent versions of code or data 3. Missing or incompatible software 4. Complicated or ambiguous procedure for reproduction ] .pull-right[ ## Solutions 1. RMarkdown 2. Git 3. Docker 4. Make ] --- ## Software used: * R ## Services used: * Posit Cloud ## You are here: Your computer → Posit Cloud → R --- class: inverse, center, middle .left[Level 2:] # Reproducibility: Current practice --- class: inverse, center, middle .left[Tool to learn:] # RMarkdown --- class: middle .center[ .large[code + data + **text** + history + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- ## A first taste of RMarkdown .todo[Open the file `inflation.Rmd`.] .todo[Take a minute to skim the document.] .todo[Click on `knit`.] *** .optional[Admire some examples at the [RMarkdown Gallery](https://rmarkdown.rstudio.com/gallery.html)]. --- name: desiree class: inverse, full, middle background-size: contain background-color: #F0F8F8 background-image: url(https://rstudio-education.github.io/sharing-short-notice/images/screenshots/Single-rmd0.png) .pull-right.vertical.small[Created by [Desirée De Leon](https://github.com/dcossyleon), unchanged, [CC-BY-NC](https://creativecommons.org/licenses/by-nc/4.0/).] --- ## YAML Metadata .todo[Change `html_document` to `pdf_document`.] .todo[Knit again.] *** .optional[Change the author or date field.] .optional[Try the [`tufte`-format (click me)](https://rstudio.github.io/tufte/).] --- template: desiree background-image: url(https://rstudio-education.github.io/sharing-short-notice/images/screenshots/Single-rmd4.png) --- ## Markdown | Text .todo[Make something **bold** and something else *italic*]: ``` This is **bold**, while this is *italic*. ``` *** .optional[Go to `Help → Markdown Quick Reference` and try something out.] --- template: desiree background-image: url(https://rstudio-education.github.io/sharing-short-notice/images/screenshots/Single-rmd5.png) --- ## R & Python & Octave & Julia | Code -- .todo[Add a code chunk (Ctrl + Alt + I) and inline code:] ```` A code chunk is for longer code/output: ```{r} with(mtcars, plot(hp, mpg)) ``` Inline code is for single numbers/short text: We have `r nrow(mtcars)` cars. ```` .optional[Include all the code in output with: `knitr::opts_chunk$set(echo = TRUE)`] .optional[Try using python:] ```` ```{python} print("Hi! Python here, do you miss R already?") ``` ```` --- class: middle, center .left[Congratulations, you have completed:] # Reproducibility: Current practice --- background-color: #F0F8F8 .center[ <img src="https://raw.githubusercontent.com/aaronpeikert/repro-tutorial/main/images/nutshell.svg" width="90%" /> ] --- class: middle .center[ .large[**code + data + text** + history + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- .center[ # Todo list ] .pull-left[ ## Problems 1. ~~copy&paste mistakes~~ 2. inconsistent versions of code or data 3. missing or incompatible software 4. complicated or ambiguous procedure for reproduction ] .pull-right[ ## Solutions 1. **RMarkdown** 2. Git 3. Docker 4. Make ] --- ## Software used: * R, RMarkdown ## Services used: * Posit Cloud ## You are here: Your computer → Posit Cloud → RMarkdown → R --- class: center, middle # Break? --- class: inverse, center, middle .left[Level 3:] # Reproducibility: best practice --- class: inverse, center, middle .left[Tool to learn:] # Git/GitHub --- class: middle .center[ .large[code + data + text + **history** + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- class: center ## A wholly inadequate explantation of version control -- ### Version 1 → Version 2 → Version 3 → Current -- ### + ### Time travel -- ### + ### Change detection --- ## Introduce yourself to Git .todo[Modify, then run:] ```r # use a function without loading the package # package::function usethis::use_git_config( user.name = "Jane Doe", user.email = "jane@example.org", init.defaultBranch = "main") # <-- not necessary but kinder than 'master' ``` *Done once per computing environment.* --- ## Activate Git .todo[Run:] ```r usethis::use_git() ``` *Done once per project.* --- class: center ## Current status ### Version 1 → Current ### + ### nowhere to time travel ### + ### no changes --- ## Make history Git has just started to record, so there is only a single entry. Lets add some changes: .todo[The plot is ~~ugly~~ very functional. Beautify it a bit. Some suggestions:] ``` theme_minimal() + ylab("subjective inflation in %-points") + labs(color = "") + theme(legend.position = c(.1, .9)) + ``` *** .optional[Plot the the two or five year expectation.] .small[[Hint: Swapping the variable `E1y_all` in `R/prepare_inflation.R` should to the trick.]] --- class: center ## Current status ### Version 1 → Current ###+ ### time travel to undo changes ###+ ### changes detected --- ## Record history Git commits are like checkpoints in a computer game. They save the current state of the project and you can always come back to them. .todo[Create a commit:] Git pane → Click checkbox of changed files → Commit → Message → Commit *** .optional[What can you do when you delete a file by accident?] .optional[Can Git help when you loose your computer / access to Posit Cloud?] --- class: center ## Current status ### Version 1 → Version 2 → Current ### + ### time travel to version 1/2 ### + ### nothing changed compared to version 2 (but vs. version 1) --- class: center # Current status .pull-left[ ## Local (Posit Cloud) ### Version 1 → Version 2 → Current ] .pull-right[ ## Remote (GitHub) `nothing` ] --- ## Introduce yourself to GitHub .todo[To get a GitHub pat/token run:] ```r usethis::create_github_token(description = "Token for Repro Workshop") ``` .todo[Activate scope `write:packages`.] .todo[Modify expiration. Today is enough.] .todo[Copy token.] *Done once per computing environment.* --- ## Store token .todo[Set token:] ```r gitcreds::gitcreds_set() # <-- Token must *not* go into brackets, paste when asked ``` *** .optional[Verify that everything is in order:] ```r usethis::gh_token_help() ``` *Done once once per computing environment.* --- class: center # Current status .pull-left[ ## Local (Posit Cloud) ### Version 1 → Version 2 → Current ] .pull-right[ ## Remote (GitHub) ### `nothing` ] We only have authenticated ourselves, nothing has left Posit Cloud. --- ## Note GitHub might find it funky that 20 people push the same code from the same IP range (Posit Cloud). If pushing code fails or asks for the password, we have triggered spam detection. In this case, we will have to repeat the GitHub handshake. Generally the "handshake" only needs to be done once per machine (per token validity period). --- ## Activate GitHub .todo[To activate GitHub and upload your files to the public web:] ```r usethis::use_github() ``` *** .optional[Private alternative/ upload to non-public web (don't use now):] ```r usethis::use_github(private = TRUE) ``` .optional[Can you simply use code from others that you find on GitHub?] .optional[Try `usethis::use_mit_license()`] .optional[Up for a challenge? Try `usethis::use_readme_rmd()`] --- class: center # Current status .pull-left[ ## Local (Posit Cloud) ### Version 1 → Version 2 → Current ] .pull-right[ ## Remote (GitHub) ### Version 1 → Version 2 → Current ] --- class: middle, center .left[Congratulations, you have completed:] # Reproducibility: best practice --- background-color: #F0F8F8 .center[ <img src="https://raw.githubusercontent.com/aaronpeikert/repro-tutorial/main/images/nutshell.svg" width="90%" /> ] --- class: middle .center[ .large[**code + data + text + history** + software + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- .center[ # Todo list ] .pull-left[ ## Problems 1. ~~copy&paste mistakes~~ 2. ~~inconsistent versions of code or data~~ 3. missing or incompatible software 4. complicated or ambiguous procedure for reproduction ] .pull-right[ ## Solutions 1. **RMarkdown** 2. **Git** 3. Docker 4. Make ] --- ## Software used: * R, RMarkdown, Git ## Services used: * Posit Cloud, GitHub ## You are here: Your computer → Posit Cloud → GitHub --- class: inverse, center, middle .left[Level 4:] # Reproducibility: longterm perspektive --- class: inverse, center, middle .left[Tool to learn:] # Docker --- # What is Docker? No slide here. TBD. Sorry! --- ## A note on workflow automation This workshop uses a lot of automation to automate reproducibility, e.g., to configure software, fill templates, interact with Git/GitHub etc. Among them: * usethis * gitcreds * gh And one of my packages: * `repro` These packages are "training wheels" and not required. --- ## `repro`-metadata `repro` uses metadata in Rmds to infer (among other things) the software environment required. .todo[Trigger this mechanism using: `repro::automate()`] .todo[Take a look at the newly generated file `Dockerfile`.] .todo[Remove all, if any, python chunks.] .todo[Commit changes. (Do not push.)] .optional[Add any package of your liking to any Rmd and run `repro::automate()`] .optional[Take a look at the file `.repro/Dockerfile_base` and friends. How would you change the R version used?] --- ## Build the image GitHub action (GHA) is a cloud service that runs software when certain events trigger it. .todo[Add a GitHub Action to build the required Docker image with `repro::use_gha_docker()`] .todo[To trigger the action, commit and push your changes.] .todo[Browse through the actions: `usethis::browse_github_actions()`] *** .optional[Take a look at `.github/workflows/push-container.yml`. When is the action triggered?] --- class: middle, center .left[Congratulations, you have completed:] # Reproducibility: longterm perspektive --- background-color: #F0F8F8 .center[ <img src="https://raw.githubusercontent.com/aaronpeikert/repro-tutorial/main/images/nutshell.svg" width="90%" /> ] --- class: middle .center[ .large[**code + data + text + history + software** + workflow] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- .center[ # Todo list ] .pull-left[ ## Problems 1. ~~copy&paste mistakes~~ 2. ~~inconsistent versions of code or data~~ 3. ~~missing or incompatible software~~ 4. complicated or ambiguous procedure for reproduction ] .pull-right[ ## Solutions 1. **RMarkdown** 2. **Git** 3. **Docker** 4. Make ] --- ## Software used: * R, RMarkdown, Git, Docker ## Services used: * Posit Cloud, GitHub, GitHub Package Registry ## You are here: Your computer → Posit Cloud → GitHub → Dockerimage --- class: center, middle # Break? --- class: inverse, center, middle .left[Level 5:] # Reproducibility: ultimate --- class: inverse, center, middle .left[Tool to learn:] # Make --- class: middle .center[ .large[code + data + text + history + software + **workflow**] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- class: middle ## Whats cooking? .pull-left[ > Arrabbiata sauce, or sugo all'arrabbiata in Italian, is a spicy sauce for pasta made from garlic, tomatoes, and dried red chili peppers cooked in olive oil. .small["Arrabbiata sauce" from Wikipedia under [CC BY-SA 3.0](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License)] ] --- class: middle ## Whats cooking? .pull-left[ > Arrabbiata sauce, or sugo all'arrabbiata in Italian, is a spicy **sauce** for **pasta** made from **garlic**, **tomatoes**, and dried red **chili** peppers cooked in olive oil. .small["Arrabbiata sauce" from Wikipedia under [CC BY-SA 3.0](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License), emphasis added] ] -- .pull-right[ .large[ ``` arrabiata.pdf: arrabiata.Rmd sauce.csv R/pasta.R Rscript -e "rmarkdown::render('arrabiata.Rmd')" sauce.csv: R/cook.R tomatoes.zip aromatics.yaml Rscript -e "source(R/cook.R)" aromatics.yaml: R/sizzle.py garlic.txt chili.json python sizzle.py garlic.txt python sizzle.py chili.json ``` ] ] --- ### Key features Missing ingredients will be generated, .small[*e.g., if the cleaned data is missing, the raw data is first cleaned.*] Newer ingredients trigger updates, .small[*e.g., new data leads to recreation of the whole manuscript.*] Always the same "button" that triggers reproduction, .small[*e.g., regardless of programming language, file format, and intermediate steps.*] -- .center.large[ `repro::automate()` auto-generates recipes for Rmds ] -- .center[ only deeper nested dependencies must be added manually. ] --- ## Make in action .todo[Go to the terminal (`Alt + Shift + M`) and type `make inflation.pdf`] .todo[Delete `inflation.pdf`.] .todo[Add `inflation.pdf` to the target `all` (`all: inflation.pdf`).] .todo[Go to the terminal (`Alt + Shift + M`) and type `make`] *** .optional[Run: `make -B --dry-run`] `-B` means rebuild everything. `--dry-run` means do not actually run the commands. --- ## Make in the cloud .todo[Run `repro::use_make_publish()`] .todo[Paste the following into the `Makefile`:] ``` publish/: inflation.pdf include .repro/Makefile_publish ``` .todo[Run `repro::use_gha_publish()`] .todo[Commit and push.] *** .optional[Inspect `.repro/Makefile_publish`.] --- class: middle, center <img src="https://publicdomainvectors.org/download.php?file=johnny_automatic_hourglass.svg" alt="" width="20%" style="filter: invert(1);"> --- class: middle, center .left[Congratulations, you have completed:] # Reproducibility: ultimate† -- †To present this accomplishment to the world, one further step is needed: publishing. --- background-color: #F0F8F8 .center[ <img src="https://raw.githubusercontent.com/aaronpeikert/repro-tutorial/main/images/nutshell.svg" width="90%" /> ] --- class: middle .center[ .large[**code + data + text + history + software + workflow**] <img src="https://publicdomainvectors.org/download.php?file=CARTON01.svg" width="30%" /> ] --- .center[ # Todo list ] .pull-left[ ## Problems 1. ~~copy&paste mistakes~~ 2. ~~inconsistent versions of code or data~~ 3. ~~missing or incompatible software~~ 4. ~~complicated or ambiguous procedure for reproduction~~ ] .pull-right[ ## Solutions 1. **RMarkdown** 2. **Git** 3. **Docker** 4. **Make** ] --- ## Software used: * R, RMarkdown, Git, Docker, Make ## Services used: * Posit Cloud, GitHub, GitHub Package Registry, GitHub Actions ## You are here: * Posit Cloud → GitHub → GitHub Actions → GitHub Package Registry → Dockerimage → Make → Rscript → R Markdown --- class: inverse, center, middle # Last Step: Publish --- ## GitHub Pages .todo[Go to Settings → Pages and change `none` to `gh-pages`.] .todo[Go to actions and wait for the new action to finish.] .todo[Inspect the published PDF online (change username):] `yourusername.github.io/project/inflation.pdf` *** .optional[Change something, e.g., make the plot ugly again, then commit and push. Takes ~5min or so.] --- class: inverse, center, middle # Done. --- class: center, middle # Thanks! --- class: inverse, center, middle # Questions? .tiny[I'll tell you how to continue the journey in a minute.] --- class: center, middle # How to procede --- ## Recommended: 1. RMarkdown: Skim [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/), then try to write a manuscript/homework/report 2. Git/GitHub: Find a friend, visit a workshop, then collaborate on a mini project --- ## Advanced: 3. Read: Peikert, A., van Lissa, C. J., & Brandmaier, A. M. (2021). Reproducible Research in R: A Tutorial on How to Do the Same Thing More Than Once. *Psych*, *3*(4), 836–867. https://doi.org/10.3390/psych3040053 4. Adept the project we created today, so that it becomes one of your projects (with your data, your analyses). --- ## Digging really deep: 3. Make: Read [Minimal Make](https://kbroman.org/minimal_make/), than simplify a complicated pipeline 4. Docker: get Docker installed, not fun 5. Latex: really not fun -- 6. Enjoy perfect reproducibility!