Getting Started
What is reproducibility?
Computational reproducibility is the ability to obtain identical results from the same data with the same computer code. It is a building block for transparent and cumulative science: it enables the originator and other researchers, on other computers and at later times, to reproduce results and thus understand how they came about. It also helps avoid a variety of errors that can lead to incorrect reporting of statistical and computational results. Tools like the ‘repro’ package in R help automate this process, making it easier for researchers to create reproducible workflows. In essence, reproducibility is a cornerstone of robust, reliable, high-quality scientific research.
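To make "identical results from the same data with the same code" concrete, here is a minimal sketch in Python (the workshop itself uses R). The functions run_analysis and fingerprint and the example numbers are hypothetical and not part of the workshop material. The sketch runs a toy analysis twice with a fixed random seed and compares a checksum of both results.

```python
import hashlib
import json
import random
import statistics


def run_analysis(data, seed):
    """Toy analysis (hypothetical): bootstrap the mean of `data` with a fixed seed."""
    rng = random.Random(seed)  # private, seeded generator: no hidden global state
    boot_means = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(statistics.fmean(resample))
    return {
        "mean": statistics.fmean(data),
        "bootstrap_se": statistics.stdev(boot_means),
    }


def fingerprint(result):
    """Hash a result dictionary so that two runs can be compared exactly."""
    # sort_keys gives a stable serialization that does not depend on key order
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3]  # made-up example values
first = fingerprint(run_analysis(data, seed=42))
second = fingerprint(run_analysis(data, seed=42))
print("identical results:", first == second)
```

Without the fixed seed, the two bootstrap runs would generally differ, which is one small example of how a result can fail to reproduce even when data and code look unchanged.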
What problems must we solve?
- copy-and-paste mistakes
- inconsistent versions of code or data
- missing or incompatible software
- complicated or ambiguous procedures for reproduction
What concepts must we know about?
- Dynamic documents
- Version control
- Software management
- Workflow orchestration
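As a rough illustration of the last concept, workflow orchestration, the sketch below mimics a simplified version of the timestamp rule that tools such as Make apply: an output is rebuilt when it is missing or older than any of its inputs. It is written in Python for illustration only; the function needs_rebuild and the file names are hypothetical, and real orchestration tools do considerably more.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "data.csv")       # hypothetical input file
    report = os.path.join(workdir, "report.html")  # hypothetical output file

    with open(data, "w") as fh:
        fh.write("x\n1\n")
    missing_target = needs_rebuild(report, [data])  # no report exists yet

    with open(report, "w") as fh:
        fh.write("<p>report</p>")
    os.utime(data, (1_000, 1_000))    # pretend the data is old
    os.utime(report, (2_000, 2_000))  # and the report was built later
    up_to_date = not needs_rebuild(report, [data])

    os.utime(data, (3_000, 3_000))    # the data changed after the report was built
    stale = needs_rebuild(report, [data])

print(missing_target, up_to_date, stale)
```

The point of such a rule is that the procedure for reproduction is written down once and checked by the computer, rather than remembered and repeated by hand.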
What software must we know about?
- RMarkdown
- Git
- Docker
- Make
There are alternatives to these tools, e.g., Quarto, Jupyter, Singularity, virtualenv, Snakemake, and many more. However, we had to decide on one set of tools. What matters more is that you understand the purpose each tool serves; then you can replace it with whatever you favor.
What services will we use?
You usually work on your local computer. In workshops, however, this often leads to problems with missing, outdated, or broken software setups. Therefore, we will use a prepared software environment that we provide through Posit Cloud (= RStudio Cloud).
Additionally, we will make heavy use of GitHub (≠ Git) and GitHub Actions (∈ GitHub).
What does self-paced mean?
Self-paced learning is an educational approach that allows you to control the speed and intensity of your learning. This means you can learn at your own rhythm, pausing, reviewing, and progressing when it suits you best. You are encouraged to actively engage with the material, explore the content, and try to solve problems on your own before seeking help. In this course, there’s no rigid schedule to follow. You’re free to come and go and move through the content as you please. Remember, while it’s self-guided, you’re not alone: you can use the Infoboxes (explained in the next section), and the course leaders are available to assist if you encounter difficulties you can’t resolve on your own. It is a flexible and personal way of learning designed to support you on your learning journey.
What are infoboxes?
There are two types of Infoboxes in this course. The first type is the Additional Infobox. These boxes appear throughout the course and let you delve deeper into a topic. Clicking on one of them will take you to more in-depth material.
The second type is the ReproDude box. You will find at least one in each main chapter. If you have a question or get stuck, click on the relevant box in the chapter, and you will be redirected to ReproDude, which is already informed about the corresponding topic. Ask what you want to know and have a conversation. To use ReproDude, you will need a ChatGPT account; a link for signing up can be found below.
Additional info
I am an additional infobox that provides you with additional material to educate you more deeply. Click on my sister boxes for more information.
ReproDude
I am ReproDude, so if you have a question or are stuck on a topic, click on the appropriate ReproDude box and ask there. You will need an account with ChatGPT to use me.
What else should I know before I start?
In this workshop, we will go through many things, many of which are complicated. The goal today is to understand what is possible, not to achieve a complete, deep technical understanding. So be kind to others and yourself and try not to get frustrated. ReproDude and the workshop leaders are there to help you if necessary!
Just as you set your own pace in this course, you can also decide which tasks you do in this course. There are two types of tasks:
1. The tasks marked in green are required; the workshop will not work if you skip any of them or do them in the wrong order.
2. The tasks marked in blue are completely optional and are meant for trying out and experimenting. You can skip these tasks as you like.
Questions? Comments? Notes?
If you have any questions, comments, or notes regarding the workshop in general, you can open an issue on GitHub.
Setup!
If you already have an account, sign in; otherwise, sign up.
Posit/RStudio Cloud
If you have not used Posit/RStudio Cloud before:
Use Option: “Sign Up with GitHub”
If you already have a Posit/RStudio account:
ChatGPT
If you have not used ChatGPT before:
Use Option: “Sign up”
If you already have a ChatGPT account:
Use Option: “Log in”
Error Handling
In this workshop, hiccups may pop up that aren’t exactly your fault. When faced with such situations, here’s a cheat sheet to get you back on track:
- Ensure you’re in Posit/RStudio Cloud with the project open (it’ll look like your everyday R environment).
- Up top-right, your name should be visible. To the left, you’ll find your profile picture (or perhaps your initial if you’re camera shy), a trio of dots in a circle, and a gear icon.
- Click on the circle with the three dots.
- Now, click on ‘Relaunch Project’ (it’s the one at the bottom).
- A wild pop-up appears! Click ‘OK’.
- Wait for a moment…
- Give the step that produced the problem another go!
- If the problem persists, do not hesitate to reach out to us!
Feel free to reach out anytime, and remember, every coder has faced a pesky bug or two… or three hundred!
Final Step
Now, please open the project:
If you have already logged into Posit/RStudio Cloud during the previous section, you should now see a copy of the project in front of you. Now click on “Save a Permanent Copy” in the top right-hand corner of the task bar. This ensures you have access to the project even after logging off.
Congratulations! You are now ready to start the workshop; please go to the next chapter.