Week 2 - Data Cleaning
Please use Canvas to return the assignments: https://ucsb.instructure.com/courses/26293/assignments/357220
We cleaned the Snow_cover column during class. Following the same steps, do the following in a Quarto document:
- Clean the Water_cover column to transform it into the correct data type and respect expectations for a percentage
- Clean the Land_cover column to transform it into the correct data type and respect expectations for a percentage
- Use the relationship between the three cover columns (Snow, Water, Land) to infer missing values where possible and recompute the Total_cover column as needed
Add comments to your Quarto document about your decisions and assumptions; this will be a large part of the grading (see below).
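The three steps above can be sketched in plain Python as a starting point. This is only an illustration, not the method used in class: the list of missing-value placeholders is hypothetical, turning out-of-range values into missing values is one possible decision among several, and the rule that snow, water, and land cover add up to 100% is an assumption you should confirm against the metadata.

```python
from typing import Optional, Tuple

# Hypothetical placeholders treated as missing; replace them with what the
# metadata and the raw columns actually contain.
MISSING_TOKENS = {"", "-", ".", "n/a", "na", "unk"}


def clean_percent(raw: Optional[str]) -> Optional[float]:
    """Return the cell as a percentage in [0, 100], or None if unusable."""
    if raw is None:
        return None
    text = str(raw).strip().lower()
    if text in MISSING_TOKENS:
        return None
    try:
        value = float(text)
    except ValueError:
        return None  # non-numeric text such as "<1"
    # Assumed decision: values outside 0-100 (and NaN) become missing.
    if not (0.0 <= value <= 100.0):
        return None
    return value


def fill_missing_cover(
    snow: Optional[float],
    water: Optional[float],
    land: Optional[float],
    expected_total: float = 100.0,  # assumption: the three covers sum to 100
) -> Tuple[Optional[float], Optional[float], Optional[float], Optional[float]]:
    """Infer a single missing cover value and recompute the total."""
    covers = {"snow": snow, "water": water, "land": land}
    missing = [name for name, value in covers.items() if value is None]
    if len(missing) == 1:
        known = sum(value for value in covers.values() if value is not None)
        remainder = expected_total - known
        # Only fill when the inferred value is itself a valid percentage.
        if 0.0 <= remainder <= 100.0:
            covers[missing[0]] = remainder
    values = list(covers.values())
    total = sum(values) if all(v is not None for v in values) else None
    return covers["snow"], covers["water"], covers["land"], total
```

With these definitions, `fill_missing_cover(20.0, None, 30.0)` fills the water value with the remainder, while a row with two missing covers is left untouched and gets no total. Whichever rules you choose, explain them in the document.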
Setup
You should start a new Quarto document named eds213_data_cleaning_assign_YOURNAME.qmd on your fork of the GitHub repository bren-meds213-data-cleaning. Don’t forget to use the metadata present in the data folder.
The expectations are:
- The Quarto document eds213_data_cleaning_assign_YOURNAME.qmd should run if your repo is cloned locally (5 pts; Reproducibility)
- The code should output a CSV file named all_cover_fixed_YOURNAME.csv in the data/processed folder (5 pts; Code)
- Comment your code well (10 pts; Documentation)
- Your Quarto document should provide all the necessary explanations about your decisions and discuss any assumptions you made (don’t forget to look at the metadata!) (30 pts; Documentation)
- The code should perform the 3 data cleaning steps described above to enable ingestion into a database (50 pts; Technical concepts)
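For the output expectation, here is a minimal sketch of writing cleaned rows to the data/processed folder with Python's standard csv module. The function name `write_cover_csv` and the row layout are hypothetical; only the file name and folder come from the expectations above. Missing values (None) are written as empty fields, which many database import tools can read as NULL, but check the behavior of the tool you use.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def write_cover_csv(rows: List[Dict[str, Optional[object]]], out_path: Path) -> None:
    """Write cleaned rows to out_path, creating parent folders if needed."""
    if not rows:
        raise ValueError("no rows to write")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys())  # column order taken from the first row
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None is written as an empty field
```

Using a path relative to the repository root, such as `Path("data/processed/all_cover_fixed_YOURNAME.csv")`, rather than an absolute path on your machine, helps the document run when the repository is cloned elsewhere.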
Note: We recommend starting by importing the csv file with the corrected Snow_cover column (data/processed/snow_cover.csv) we generated during class (my code here)
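Before deciding how to clean a column, it can help to list the values that are not plain numbers. The sketch below, again in standard-library Python, tallies them for one column; the helper name `non_numeric_values` is hypothetical and this is not the code used in class.

```python
import csv
from collections import Counter
from pathlib import Path


def non_numeric_values(csv_path: Path, column: str) -> Counter:
    """Count the values in `column` that Python cannot parse as a float."""
    counts: Counter = Counter()
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            raw = (row.get(column) or "").strip()
            try:
                float(raw)
            except ValueError:
                counts[raw] += 1  # empty cells are tallied under ""
    return counts
```

Calling `non_numeric_values(Path("data/processed/snow_cover.csv"), "Water_cover")` from the repository root would return the placeholder strings and their frequencies, which you can then compare with the metadata to decide how each one should be handled.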
Finally, remember that there are many correct ways of doing this, hence the importance of capturing and discussing your decision process.
How to submit your assignment to Canvas:
- Zip your repository as a .zip file and upload it to Canvas
Credit: 100 pts