Reproducible research with mizer, GitHub, RPubs and binder

Share your code in 5 easy steps, so that others can easily reproduce your results and build on your work.

Gustav Delius true
08-14-2021

Introduction

There is much benefit in sharing the code that allows others to reproduce your research. Not only does it allow others to validate your results, it also allow them to build on what you have already done. This has benefits for them, but also for you because it increases engagement with your research and advances the entire field.

The fact that you are using mizer rather than coding everything from scratch of course makes your research code much shorter and therefore much easier for others to understand and extend.

In this blog post I will share my amazement at how easy it has become to share your code. There are five easy steps:

  1. Put your code into an R notebook and add a bit of explanation.

  2. Make your R notebook together with any data files publicly available in a GitHub repository.

  3. Publish your R notebook on RPubs so people can quickly view it, nicely rendered.

  4. Launch it on mybinder.org so that people can run your code in their browser without having to install anything.

  5. Let the world know.

Steps 2, 3, 4 and 5 will only take a couple of minutes each, if one knows how to, so after reading this blog post there will be no excuse not to take those steps. Obviously step 1 can take as long as you like, depending on how nicely you like your code to be documented.

I’ll talk about each step now. As an example I’ll use code for reproducing the results and figures of (Canales, Delius, and Law 2020)

Put your code into an R notebook

You probably have a jumble of R script files with the code for setting up your model and running your analysis. Now that you are done, it makes sense to organise this code a bit. A nice way of doing that is to open a new R notebook (the second entry on the “New File” menu in RStudio) and pasting your code into R chunks (you create R chunks with Ctrl-Alt-I). Then in between the R chunks you can put explanations of what the code is for. The result will look something like my example

In the example I modified the header to include a table of contents and to limit the height of the figures:

---
title: "Regulation of fish stocks without ..."
output: 
  html_notebook:
    toc: true
fig_height: 4
---

You will want to load all the libraries you need in a setup chunk like

knitr::opts_chunk$set(echo = TRUE)
library(mizer)
library(magrittr)
library(plotly)

The first line ensures that your code in the R chunks will be included when the notebook is typeset.

The explanatory text between the R chunks is using Markdown syntax. So for example you create section headers with #, headers of subsections with ## and so on. You can include LaTeX equations with the usual syntax. So \(x^2\) produces the inline expression \(x^2\) and \[x^2\] produces the displayed equation \[x^2\] For more information see the R Markdown Reference Guide and the R Markdown Cheat Sheet, both of which are accessible via ’Help -> Cheat Sheets` in RStudio.

If you have a rather large chunk of code, it may be a good idea to split it into smaller chunks, with more explanation in between. Just put your cursor at the place where you want to split the chunk and hit ‘Ctrl-Alt-I.’

The advice is to not be too perfectionist. Just put in enough explanations so that you yourself will still be able to remember in a year’s time what you were doing. You can always add more explanations for others later if there are requests. The point is that just by having your code out there, people will be more encouraged to engage with you if they are interested.

Put your notebook on GitHub

I think it is fair to say that GitHub has become the most popular place to share research code. So if you do not have an account there yet, sign up for free. Then create a new repository there for sharing your notebook.

If you have not used Git and GitHub before, it will really pay off in the long run for you to put in some time familiarising yourself with them and setting things up nicely. For a short introduction I recommend the chapter on Git and GitHub in the “R packages” book. Even though the book is about developing R packages, that chapter is not restricted to that use case. For a longer introduction, I recommend Happy Git and GitHub for the useR.

But if now you are in a hurry, you can also cheat and simply upload your R notebook and your data files using the “Add file” button on your new GitHub repository.

Publish your R notebook on RPubs

Now that your notebook is on GitHub, it is accessible to others, but GitHub does not display the typeset version of the notebook. For that you can use RPubs, which is a free hosting site for R notebooks.

RStudio makes publishing on RPubs very easy: When you click on “Preview,” RStudio will open a new window with the preview of your rendered notebook, and on that window at the top right there is a “Publish” button. Click that button and then make sure to choose “RPubs” (rather than RStudio Connect). You’ll be guided through the process.

Doing that with my example notebook leads to this. Note the “Code” button at the top right of the notebook on RPubs. It allows people to conveniently download your code.

Make it executable on mybinder.org

Now this step I think is amazing. It allows people to play with your code without having to install anything. Take a look at what this looks like for my example. Be a bit patient — after a few seconds you will see RStudio open in your browser. Click on “plankton-anchovy.nb.html” in the File pane and select “Open in Editor.” You will now be able to execute the code chunks as well as modify them at will. In fact, you can do anything that you can do in your local RStudio.

Screenshot

You have to only do three things to make this magic possible:

  1. Add a file to you repository called “install.R” which contains only an install.packages() command for installing all the packages that your notebook needs. See my example. You will want to include at least
install.packages(c("mizer", "knitr", "rmarkdown"))

but extend the list with any other packages that you load with library() in your notebook. You can create this file straight on GitHub via the “Add file” button or you can do it locally and then push to GitHub.

  1. Add a file to your repository called “runtime.txt” with yesterday’s date in the format r-yyyy-mm-dd. See this example This will have the effect of setting up the environment with the current version of all packages. That means that if the packages change in the future, this will not break your notebook. The reason I recommend using yesterday’s date rather than today’s is that this feature uses the MRAN snap shots and the snapshot for today may not yet be available.

  2. Create the URL for your binder. It will have the form https://mybinder.org/v2/gh/your-github-username/your-repository-name/HEAD?urlpath=rstudio where you need to replace your-github-username/your-repository-name with your GitHub user name and repository name. For my example the URL is (https://mybinder.org/v2/gh/sizespectrum/plankton-anchovy/HEAD?urlpath=rstudio).

The first time you visit your binder URL, mybinder will take a long time to create a Docker image. When it is done, you will see an RStudio session running in your browser, with the files from your GitHub repository available in the Files pane. When people visit the URL after you they will not have to wait so long because mybinder.org will be able to use the Docker image to start the server more quickly.

Each time you make a change to your GitHub repository, mybinder will rebuild the Docker image the next time someone visits the URL. To save the first visitor from a long wait, you may want to visit the URL yourself each time you push a change to your GitHub repository.

Let the world know

You will probably want to put the URLs to your notebook on RPubs and to your binder into the README.md file of your GitHub repository. You will want to put the link to your GitHub repository into your published paper.

And then you will want to let your social networks know. If you twitter about it, include @mizer_model in your post. Also, consider writing a summary of your work for this blog.

Last but not least, please email . We’ll include your publication in the list of publications using mizer.

Canales, T. Mariella, Gustav W. Delius, and Richard Law. 2020. “Regulation of Fish Stocks Without Stock–Recruitment Relationships: The Case of Small Pelagic Fish.” Fish and Fisheries 21 (5): 857–71. https://doi.org/10.1111/faf.12465.

References

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/sizespectrum/MizerBlog/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Delius (2021, Aug. 14). mizer blog: Reproducible research with mizer, GitHub, RPubs and binder. Retrieved from https://blog.mizer.sizespectrum.org/posts/2021-08-14-reproducible-research-with-mizer-and-binder/

BibTeX citation

@misc{delius2021reproducible,
  author = {Delius, Gustav},
  title = {mizer blog: Reproducible research with mizer, GitHub, RPubs and binder},
  url = {https://blog.mizer.sizespectrum.org/posts/2021-08-14-reproducible-research-with-mizer-and-binder/},
  year = {2021}
}