Week 2 - Analyzing & cleaning the bird dataset (from CSV)
Cleaning the raw data
The shorebird dataset, compiled over many years by diverse researchers, is likely to contain data quality inconsistencies. Therefore, prior to importing the CSV files into our database, we must perform data cleaning to ensure:
- The data is in a long format.
- The data is normalized.
- No information is lost during the import due to low-quality data that violates the database constraints.
And in any case, the "garbage in, garbage out" motto often used in machine learning applies here as well!
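To make these cleaning goals concrete, here is a minimal sketch in pandas of reshaping wide-format records into long format and dropping rows that would violate a NOT NULL constraint on import. The column names (`Site`, `Species`, `Egg1_Length`, etc.) are hypothetical placeholders, not the actual columns of the shorebird CSV files:

```python
import pandas as pd

# Hypothetical wide-format nest records; the real CSV columns will differ.
wide = pd.DataFrame({
    "Site": ["barr", "eaba"],
    "Species": ["sesa", "wrsa"],
    "Egg1_Length": [31.2, 30.1],
    "Egg2_Length": [30.8, None],
})

# Reshape to long format: one measurement per row instead of one per column.
long_fmt = wide.melt(
    id_vars=["Site", "Species"],
    value_vars=["Egg1_Length", "Egg2_Length"],
    var_name="Egg", value_name="Length",
)

# Drop rows whose missing values would violate a NOT NULL
# constraint when the table is loaded into the database.
clean = long_fmt.dropna(subset=["Length"])
print(clean)
```

The same idea carries over to other constraints: filter or repair offending rows before the import so the database never silently rejects them.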
Here is the repository where we are going to practice our data-wrangling skills. PLEASE CREATE A FORK and clone your fork to your machine:
https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-data-cleaning
Analyzing the data from the csv files
Now that we have cleaned some of the tables, let’s conduct some data analyses to start exploring the dataset. PLEASE CREATE A FORK and clone your fork to your machine:
https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-data-analysis
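As a taste of the kind of exploration we will do, here is a small pandas sketch of a first summary question on cleaned records. The data and column names (`Species`, `Site`) are invented for illustration and do not come from the actual repository:

```python
import pandas as pd

# Hypothetical cleaned nest records; real column names may differ.
nests = pd.DataFrame({
    "Species": ["sesa", "sesa", "wrsa", "dunl"],
    "Site": ["barr", "barr", "eaba", "barr"],
})

# A first exploratory question: how many nests were recorded per species?
counts = nests.groupby("Species").size().sort_values(ascending=False)
print(counts)
```

Once the data lives in a database, the same question becomes a simple `GROUP BY` query.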
