EDS 213 Lab 1

Spring Quarter

Lab section overview

The goal of our weekly lab sections will be to answer an analytical question of your choosing using a database. You will create this database yourself, and then use it to answer your question. In week 7, we will have a database showcase where you get to teach your classmates about the database you created!

Schedule
Week 1: Select a dataset and understand the relationships among the different tables to define a schema.
Week 2: Clean the data.
Week 3: Ingest the data into a database and explore your newly created database.
Week 4: With your cleaned data in a database, come up with an analytical question to answer with it. Develop the SQL query or queries that answer this question.
Week 5: Write R or Python code to create a data visualization or analysis from your database.
Week 6: Document your work.
Week 7: Database showcase!
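
As a rough illustration of how weeks 1 through 4 fit together, the sketch below defines a small two-table schema, cleans a few raw rows, ingests them, and answers a question with SQL. Everything in it is an assumption made for illustration: this page does not prescribe SQLite or Python's `sqlite3` module, and the `site` and `observation` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw data: inconsistent case, stray whitespace, one missing count.
RAW_SITES = [(1, "Lagoon"), (2, "Marsh")]
RAW_OBSERVATIONS = [
    (1, 1, " Mallard ", "4"),
    (2, 1, "mallard", ""),
    (3, 2, "Heron", "2"),
]

# Week 1: a schema with primary keys, a foreign key, and explicit data types.
SCHEMA = """
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE observation (
    obs_id        INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL REFERENCES site(site_id),
    species       TEXT NOT NULL,
    n_individuals INTEGER
);
"""


def clean_observation(row):
    """Week 2: standardize the species name and turn an empty count into NULL."""
    obs_id, site_id, species, n = row
    species = species.strip().lower()
    n = int(n) if n.strip() else None
    return (obs_id, site_id, species, n)


def run_demo():
    """Weeks 3 and 4: ingest the cleaned rows, then query them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO site VALUES (?, ?)", RAW_SITES)
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [clean_observation(r) for r in RAW_OBSERVATIONS],
    )
    conn.commit()
    # Example question: how many individuals were counted at each site?
    rows = conn.execute(
        """
        SELECT s.site_name, SUM(o.n_individuals) AS total
        FROM site AS s
        JOIN observation AS o ON o.site_id = s.site_id
        GROUP BY s.site_name
        ORDER BY s.site_name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for site_name, total in run_demo():
        print(site_name, total)
```

The rows returned by `run_demo()` are plain tuples, so in week 5 they could be passed to whichever plotting or analysis library you choose.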

Project Rubric

Your lab project is graded Pass/Fail across five criteria:

Database Schema
  • Pass: Appropriate dataset; correct entities & relationships; complete schema diagram; proper primary/foreign keys & data types. (Submit schema to Canvas by 4/15)
  • Fail: Dataset too simple; missing critical relationships; incomplete or incorrect schema diagram.

Code
  • Pass: Appropriate data transformations; missing values handled; formats standardized; data integrity preserved.
  • Fail: No evidence of cleaning; major data quality issues remain; missing values mishandled; lack of standardization.

Documentation
  • Pass: Clear project purpose; well-documented process; commented code; decisions explained; results discussed; GitHub repo follows repository rubric.
  • Fail: Minimal/unclear docs; omitted steps; code lacks comments; results presented without context; insufficient GitHub repo.

Reproducibility
  • Pass: Project reproducible from docs; all files included (.sql + .ipynb/.qmd); steps clearly ordered; dependencies listed.
  • Fail: Critical files/steps missing; process cannot be reproduced; dependencies not specified; setup requirements missing.

Presentation
  • Pass: Covers dataset overview, analytical question, and SQL query; includes a data visualization; stays within 2.5 minutes; reflects on challenges or surprises.
  • Fail: Missing key components (dataset context, query, or visualization); significantly over/under time; little to no reflection on the process.

GitHub Repository Rubric

Your README should include:

  • A short, descriptive title
  • Markdown headers separating each section
  • A brief explanation of the repository’s purpose
  • A concise description of repository structure / file organization
  • No hidden files other than .gitignore (no .DS_Store!)
  • Details on data access — where the data lives and how to access it
  • References & acknowledgements in a consistent format (include course, data sources, and any other resources)
  • No typos, grammatical errors, or formatting mistakes
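
One way to check the hidden-files item before submitting is a short script like the sketch below. It is only an illustration, not a course-provided tool: the function name `find_unwanted_hidden_files` is hypothetical, and the script assumes that Git's own `.git` directory should be skipped, since it is never part of the pushed file tree.

```python
import os

# Hidden files the rubric explicitly allows.
ALLOWED_HIDDEN = {".gitignore"}


def find_unwanted_hidden_files(repo_root):
    """Return relative paths of hidden files other than the allowed ones."""
    unwanted = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Do not descend into Git's internal directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if name.startswith(".") and name not in ALLOWED_HIDDEN:
                full_path = os.path.join(dirpath, name)
                unwanted.append(os.path.relpath(full_path, repo_root))
    return sorted(unwanted)


if __name__ == "__main__":
    for path in find_unwanted_hidden_files("."):
        print("unwanted hidden file:", path)
```

Run from the repository root, it prints any stray files such as .DS_Store. Adding those names to .gitignore keeps them from being committed again.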

Required Repository Files

Your repo must contain:

  • Your database (if applicable)
  • A .sql file with your query
  • A .qmd / .ipynb file with your data analysis & visualization
  • Your data cleaning script
  • A README
  • A .txt file listing dependencies and environment requirements for your R/Python analysis
  • Any other files necessary to reproduce your analysis
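
For a Python analysis, the dependencies file could be produced with a short script such as the sketch below. It is one possible approach rather than a required format: the helper name `dependency_lines` and the package names in the usage example are hypothetical, and running `pip freeze` is a common alternative. The standard-library `importlib.metadata` module looks up each installed version.

```python
import sys
from importlib import metadata


def dependency_lines(package_names):
    """Build lines for a dependencies .txt file from the current environment."""
    lines = [f"# Python {sys.version_info.major}.{sys.version_info.minor}"]
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag the gap instead of guessing a version.
            lines.append(f"# {name}: not installed in this environment")
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with what your analysis imports.
    with open("requirements.txt", "w") as fh:
        fh.write("\n".join(dependency_lines(["pandas", "matplotlib"])) + "\n")
```

Listing only the packages your analysis actually imports keeps the file short and easier for a reader to reproduce.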

Week 7 Presentation Rubric

You’ll have 2.5 minutes to present your database to the class. Your presentation must cover:

  • Dataset overview
    • How did you find it? Was it messy? Well documented?
    • Any challenges ingesting it into a database?
  • Your question & how you answered it
    • Include your SQL query
  • The data visualization created from your query
  • Any other comments: challenges, surprises, lessons learned!