Week 4 - Reproducible and FAIR data

Learning goals

  • Understand the implications of data management and organization for reproducible and FAIR (Findable, Accessible, Interoperable, and Reusable) data science.
  • Operationalize reproducibility and the FAIR principles by adopting good and responsible data management, code, and workflow documentation practices in your daily work.
  • Apply strategies to mitigate issues that could prevent reproducibility.

Slides

slides-04.pptx

In-class exercise

renv for R (graded)

In RStudio, open the week 4 project example from the class data GitHub repository.

Take a quick look at the files and documentation. What opportunities for improvement can you spot in this project (README, file naming, and organization)?

  • Let’s look at the scripts together. Do any issues come up when you try to run them?
  • Create an renv.lock file for the project to record the package versions it uses
  • Reorganize the files so the project is easier to navigate and rerun (optional)

venv for Python (optional)

  • Let’s use the VS Code terminal on the tsosie server to set up a virtual environment for Python
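
As a rough companion to the terminal walkthrough, here is a minimal sketch of the same idea using Python's standard-library venv module. The directory name .venv, the helper name create_env, and the temporary project folder are assumptions made for illustration; they are not taken from the class repository. In the terminal, the usual equivalent is `python3 -m venv .venv` followed by `source .venv/bin/activate`.

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir: Path, name: str = ".venv") -> Path:
    """Create a virtual environment inside project_dir and return its path.

    create_env and the default name ".venv" are hypothetical choices
    for this sketch, not part of the course materials.
    """
    env_dir = project_dir / name
    # with_pip=False keeps the sketch fast and offline; the command-line
    # form (python3 -m venv) installs pip into the environment by default.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_dir)
    return env_dir


# A throwaway folder stands in for a real project directory.
project = Path(tempfile.mkdtemp())
env_dir = create_env(project)

# pyvenv.cfg marks the folder as a virtual environment and records
# which base Python installation it was created from.
cfg = (env_dir / "pyvenv.cfg").read_text()
print(env_dir)
print(cfg)
```

Keeping the environment inside the project folder, and leaving that folder out of version control, is one common convention; only the list of required packages then needs to be shared with collaborators.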

Binder (optional)

  • Create a project on GitHub using the data and code from the project example (give it a good name!)
  • “Binderize” your example repo
  • Share the link to your repo

Virtual environments notes

Homework

Project organization and documentation