Week 2 - Data Cleaning

Please use Canvas to return the assignments: https://ucsb.instructure.com/courses/26293/assignments/357220

We cleaned the Snow_cover column during class. Inspiring yourself from the steps we followed, do the following in a quarto document:

  1. Clean the Water_cover column to transform it into the correct data type and respect expectations for a percentage

  2. Clean the Land_cover column to transform it into the correct data type and respect expectations for a percentage

  3. Use the relationship between the three cover columns (Snow, Water, Land) to infer missing values where possible and recompute the Total_cover column as needed

Add comments to your quarto document about your decisions and assumptions, this will be a large part of the grading (see below).

Setup

You should start a new quarto document named eds213_data_cleaning_assign_YOURNAME.qmd on your fork of the GitHub repository bren-meds213-data-cleaning. Don’t forget to use the metadata present in the data folder

The expectations are:

  • The Quarto document eds213_data_cleaning_assign_YOURNAME.qmd should run if your repo is cloned locally (5 pts; Reproducibility)
  • The code should output a csv file named all_cover_fixed_YOURNAME.csv in the data/processed folder (5 pts; Code)
  • Comment your code well (10 pts; Documentation)
  • Your quarto document should provide all the necessary explanations about your decisions and discuss any assumptions you made (don’t forget to look at the metadata!) (30 pts; Documentation)
  • The code should perform the 3 data cleaning steps describe above to enable ingestion into a database (50 pts; Technical concepts)

Note: We recommend starting by importing the csv file with the corrected Snow_cover column (data/processed/snow_cover.csv) we generated during class (my code here)

Finally, remember there are many correct ways of doing this, thus the importance of capturing and deiscussing your decision process.

How to submit your assignment to Canvas:

  • Zip your repository as a .zip file and upload it to canvas

Credit: 100 pts


This work is licensed under CC BY 4.0

UCSB logo