Software Management

What now?

And on to another chapter! What’s next?

Like last time, let’s look at our components. Which ones are we examining now?

code + data + text + history + software + workflow

Software? Maybe a look at our problem and software solution list will help us again.

Problem list:

  1. copy&paste mistakes
  2. inconsistent versions of code or data
  3. missing or incompatible software
  4. complicated or ambiguous procedure for reproduction

Software solution list:

  1. RMarkdown
  2. Git
  3. Docker
  4. Make

So does that mean we use Docker to avoid the problem of missing or incompatible software? This sounds like a program I would have needed a while ago. Then let’s jump right in!

Docker?

Let me introduce Docker: your software superhero! It saves you from compatibility chaos and missing dependencies. Docker images make all required software available on any computer that has Docker installed. Docker operates on the principle of containerization: your software is bundled together with everything it needs to run, including the operating system libraries it depends on, into an image, and a running instance of that image is called a container. This provides a consistent environment that is isolated from the rest of your system, so your software runs the same way no matter where it is deployed.

A short note from the management: this workshop relies on a lot of automation to make reproducibility easier, e.g., to configure software, fill in templates, and interact with Git/GitHub.

Several R packages help with this, among them:

  • usethis
  • gitcreds
  • gh

and a package from Aaron:

  • repro

These packages are “training wheels” and are not strictly required; you could configure everything manually.

Nevertheless, using repro prevents headaches by automating many steps. Besides, doing every step by hand would go well beyond the scope of this workshop.

repro!

Let’s take a deep dive into the repro package together.

repro uses the metadata in your RMarkdown files (Rmds) to infer, among other things, the required software environment.

Trigger this mechanism using: repro::automate()
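
To make the idea of metadata more concrete, here is a minimal sketch of how a package list could be read from an Rmd header. It assumes the header layout shown in the repro README (a top-level repro entry with a packages list). The example document, the variable names, and the use of the yaml package are illustrative assumptions; this is not how repro itself is implemented.

```r
# Minimal sketch: read a package list from the YAML header of an Rmd.
# The document content below is made up for illustration.
library(yaml)

rmd_lines <- c(
  "---",
  "title: \"Example analysis\"",
  "repro:",
  "  packages:",
  "    - ggplot2",
  "    - dplyr",
  "---",
  "",
  "Some text."
)

# The YAML header sits between the first two '---' delimiter lines.
delims <- which(rmd_lines == "---")
header <- rmd_lines[(delims[1] + 1):(delims[2] - 1)]

# Parse the header into a nested R list.
meta <- yaml::yaml.load(paste(header, collapse = "\n"))

# Packages declared for this document
print(meta$repro$packages)
```

Declaring packages in the header keeps the list of requirements next to the code that needs them, so a tool can derive the software environment from the documents themselves.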

Take a look at the newly generated file Dockerfile.

Remove any Python chunks, if present.

Commit changes. (Do not push.)

Add a package of your choice to any Rmd and run repro::automate() again.

Take a look at the file .repro/Dockerfile_base and friends. How would you change the R version used?
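
When you compare R versions, it helps to know which one your current session is running. A small base-R sketch (the version string "4.0.0" is an arbitrary example value, not the version used in this workshop):

```r
# R version of the current session, as a comparable version object
current_r <- getRversion()
print(current_r)

# Version objects can be compared directly with version strings.
# "4.0.0" is an arbitrary example value.
is_recent <- current_r >= "4.0.0"
print(is_recent)
```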

Now we will use GitHub Actions (GHA). GitHub Actions is a cloud service that runs software when certain events trigger it (for example, pushing commits to GitHub).

Add a GitHub Actions workflow that builds the required Docker image with repro::use_gha_docker().

To trigger the action, commit and push your changes.

Browse through the actions: usethis::browse_github_actions()

Take a look at .github/workflows/push-container.yml. When is the action triggered?

That’s it. But what have we actually just done?

We created Dockerfiles using the repro package and then automated the building of Docker images by setting up a GitHub Actions workflow. These Docker images now ensure a reproducible software environment! So far, we have not used the newly built image; that follows in the next step.

And now?

Congratulations, we are approaching the finish line!

Before we continue, let’s take another quick look together at what we have just done. We now have one more component in our toolbox:

code + data + text + history + software + workflow

And with that, we solved our third problem on the list:

  1. copy&paste mistakes
  2. inconsistent versions of code or data
  3. missing or incompatible software
  4. complicated or ambiguous procedure for reproduction

And which software did we just use for this?

  1. RMarkdown
  2. Git
  3. Docker
  4. Make

Final Step!

Now please go through what we have just done and all the software we used.

You are currently at your computer using Posit Cloud, which hosts an R environment. There you used the repro package and GitHub Actions to build a Docker image that ensures a reproducible software environment.

That ends this section. Should we pause for a short break or proceed without interruption?

You are now ready for the next chapter.
