EDS 213: Lab 5


Documenting Your Work

This Week’s Goal

You’ve found some datasets, cleaned your data, built a database, written queries, and started analyzing your data. Now make sure someone else (or future you) can understand and reproduce your work.

  • This week you will document your project so it meets the course rubric
  • Documentation covers two things: your code and your repository
  • Remember: Others should be able to reproduce your results without asking you anything!

Comment Your Code

Every script and notebook should be self-explanatory. Add comments that explain why, not just what:

# Drop rows where species_id is NULL —
# these represent failed sensor readings, not real observations
df = df.dropna(subset=["species_id"])

-- Aggregate by habitat type to compare total observations
-- across ecosystem categories rather than individual sites
SELECT habitat_type, SUM(count) AS total
FROM observations o
JOIN sites s ON o.site_id = s.site_id
GROUP BY habitat_type;

A future reader should understand every non-obvious decision without asking you.

Make It Reproducible

Someone should be able to clone your repo and recreate your results. The one exception is data that must be downloaded separately, and the instructions for getting it should be extremely clear! That means:

  • All necessary files are included or clearly referenced
  • Your .sql file, cleaning script, and visualization script (both scripts as .ipynb or .qmd) are present
  • Dependencies are listed so the environment can be recreated
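
One way to check the list above before you submit is a short script that reports which expected files are absent. This is a minimal sketch, not a course requirement: the file names in REQUIRED_FILES are hypothetical placeholders, and the function name missing_files is made up for illustration.

```python
from pathlib import Path

# Hypothetical file names (assumption); replace them with the files in your own repo.
REQUIRED_FILES = ["README.md", "queries.sql", "clean_data.ipynb", "visualize.ipynb"]
# Only one of these needs to exist.
DEPENDENCY_FILES = ["requirements.txt", "environment.yml"]


def missing_files(repo_dir, required=REQUIRED_FILES, dependency_options=DEPENDENCY_FILES):
    """Return the names of expected files that are absent from repo_dir."""
    repo = Path(repo_dir)
    missing = [name for name in required if not (repo / name).is_file()]
    # A single dependency file is enough, so report them as one combined entry.
    if not any((repo / name).is_file() for name in dependency_options):
        missing.append(" or ".join(dependency_options))
    return missing
```

Calling missing_files(".") from the top of your repo returns an empty list when everything is in place.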

The Dependencies File

Include a plain-text file in your repo (requirements.txt or environment.yml) that lists your environment requirements.

For Python, export from conda or pip:

# conda
conda env export > environment.yml

# pip
pip freeze > requirements.txt

For R, document your package versions:

# run in your R script or notebook
sessionInfo()
# or save that output to a file:
writeLines(capture.output(sessionInfo()), "requirements.txt")

Name it requirements.txt or environment.yml.
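
If a full pip freeze lists far more than your project uses, an alternative is to pin only the packages you actually import. The sketch below uses the standard-library importlib.metadata module; the function name pinned_requirements is an assumption for illustration, and you supply the package names yourself.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' lines for the given installed package names."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag the gap as a comment line instead of silently skipping it.
            lines.append(f"# {name}: not installed in this environment")
    return lines
```

For example, writing "\n".join(pinned_requirements(["pandas", "duckdb"])) to requirements.txt would record just those two packages (the names here are only examples). Use the distribution name you installed with pip, which can differ from the import name.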

Repository README

The README is the first thing anyone sees when they visit your repository. It should answer: “What is this, and how do I use it?”

Required sections:

  • Short, descriptive title
  • Purpose — what the project is about
  • Repository structure — what files exist and what they do
  • Data access — where the data lives and how to get it
  • References & acknowledgements
  • Any other info that is necessary to reproduce the analysis

README: What Each Section Should Cover

Section               What to include
Title                 Short and descriptive
Purpose               Brief explanation of the repo’s goal (paragraphs or bullets)
Repository structure  File organization: what’s in each folder/file
Data access           Where data lives, how to access it to run the code
References            Course, datasets, and any other sources, in a consistent format with links
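
A quick self-check of the README against the sections above could look like the following. It is a rough sketch that assumes the README is written in Markdown with "#" headings; the section keywords and the function name missing_sections are assumptions for illustration, and the check only looks at headings, not at whether the content is adequate.

```python
# Keywords taken from the section list above (the title is not checked here).
REQUIRED_SECTIONS = ["Purpose", "Repository structure", "Data access", "References"]


def missing_sections(readme_text, required=REQUIRED_SECTIONS):
    """Return required section names that do not appear in any Markdown heading."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    ]
    return [
        section
        for section in required
        if not any(section.lower() in heading for heading in headings)
    ]
```

Pass it the contents of your README, for example missing_sections(open("README.md").read()), and add any section it reports.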

Do not commit hidden files (especially .DS_Store). The only exception is .gitignore.
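
To see whether any hidden files are sitting in your project folder, a small script along these lines can list them. It is a sketch under stated assumptions: the function name unwanted_hidden_files is hypothetical, it treats anything whose name starts with a dot as hidden, and it skips Git's own .git directory. It scans the working folder, so it also reports files that Git is already ignoring.

```python
from pathlib import Path

# Hidden files that are allowed to stay in the repo.
ALLOWED_HIDDEN = {".gitignore"}


def unwanted_hidden_files(repo_dir, allowed=ALLOWED_HIDDEN):
    """Return sorted relative paths of hidden files/folders other than those allowed."""
    repo = Path(repo_dir)
    found = []
    for path in repo.rglob("*"):
        rel = path.relative_to(repo)
        # Skip Git's internal metadata directory and everything inside it.
        if ".git" in rel.parts:
            continue
        if path.name.startswith(".") and path.name not in allowed:
            found.append(rel.as_posix())
    return sorted(found)
```

Run unwanted_hidden_files(".") from the top of your repo. Anything it returns should be deleted or added to .gitignore so it is not committed.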

This Week’s Task

Complete the documentation for your project.

In your code:

In your repository: