Week 2 - Analyzing & cleaning the bird dataset (from CSV)

Cleaning the raw data

The shorebird dataset, compiled over many years by diverse researchers, is likely to contain data quality inconsistencies. Therefore, prior to importing the CSV files into our database, we must perform data cleaning to ensure:

  1. The data is in a long format.
  2. The data is normalized.
  3. No information is lost during the import due to low-quality data that violates the database constraints.

And in any case, the "garbage in, garbage out" motto often used in machine learning applies here as well!
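To make the three goals above concrete, here is a minimal sketch (not the course's actual code) using pandas, with hypothetical column and species names: it reshapes a wide table to long format and flags rows that would violate typical database constraints (missing or negative counts) so no information is silently lost during import.

```python
import pandas as pd

# Hypothetical wide-format survey table: one count column per species
wide = pd.DataFrame({
    "site": ["A", "A", "B"],
    "year": [2019, 2020, 2019],
    "count_sandpiper": [12, -1, 7],   # -1 stands in for a bad entry
    "count_plover": [3, 5, None],     # None stands in for a missing value
})

# Goal 1: reshape to long format -- one observation per row
long = wide.melt(
    id_vars=["site", "year"],
    var_name="species",
    value_name="count",
)
long["species"] = long["species"].str.removeprefix("count_")

# Goal 3: flag rows that would violate database constraints
# (e.g. NOT NULL, count >= 0) instead of losing them on import
bad = long["count"].isna() | (long["count"] < 0)
clean, rejected = long[~bad], long[bad]

print(len(clean), len(rejected))  # 4 rows kept, 2 set aside for review
```

Keeping the rejected rows in a separate table (rather than dropping them) lets you review and repair them later, which is the spirit of goal 3.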

Here is the repository where we are going to practice our data-wrangling skills. PLEASE CREATE A FORK and clone your fork to your machine:

https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-data-cleaning

Analyzing the data from the CSV files

Now that we have cleaned some of the tables, let's conduct some analyses to start exploring the dataset. PLEASE CREATE A FORK and clone your fork to your machine:

https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-data-analysis


This work is licensed under CC BY 4.0
