eds213 – week1

EDS 213 Lab 1

Spring Quarter

Lab section overview

The goal of our weekly lab sections will be to answer an analytical question of your choosing using a database. You will create this database yourself, and then use it to answer your question. In week 7, we will have a database showcase where you get to teach your classmates about the database you created!

Schedule

Week	Objective
1	Select a dataset and understand the relationships among the different tables to define a schema.
2	Clean the data.
3	Ingest the data into a database and explore your newly created database.
4	With your cleaned data in a database, come up with an analytical question your database can answer, and develop the SQL query (or queries) that answers it.
5	Write R or Python code that analyzes and visualizes data from your database.
6	Document your work.
7	Database showcase!

Project Rubric

Your lab project is graded Pass/Fail across five criteria:

Criteria	Pass	Fail
Database Schema	Appropriate dataset; correct entities & relationships; complete schema diagram; proper primary/foreign keys & data types. (Submit schema to Canvas by 4/15)	Dataset too simple; missing critical relationships; incomplete or incorrect schema diagram.
Code	Appropriate data transformations; missing values handled; formats standardized; data integrity preserved.	No evidence of cleaning; major data quality issues remain; missing values mishandled; lack of standardization.
Documentation	Clear project purpose; well-documented process; commented code; decisions explained; results discussed; GitHub repo follows repository rubric.	Minimal/unclear docs; omitted steps; code lacks comments; results presented without context; insufficient GitHub repo.
Reproducibility	Project reproducible from docs; all files included (`.sql` + `.ipynb`/`.qmd`); steps clearly ordered; dependencies listed.	Critical files/steps missing; process cannot be reproduced; dependencies not specified; setup requirements missing.
Presentation	Covers dataset overview, analytical question, and SQL query; includes a data visualization; stays within 2.5 minutes; reflects on challenges or surprises.	Missing key components (dataset context, query, or visualization); significantly over/under time; little to no reflection on the process.
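The Database Schema criterion asks for proper primary/foreign keys and data types. The minimal sketch below shows what that can look like in practice. It uses SQLite through Python's built-in `sqlite3` module purely for illustration. The two tables (`sites` and `observations`), their columns, and the sample rows are hypothetical, and your own dataset, schema, and database system will differ. The sketch defines a primary key on each table and a foreign key linking each observation to a site. It also shows the foreign key rejecting a row that points to a nonexistent site, and it runs a join query like the ones you will write in week 4.

```python
import sqlite3

# Hypothetical two-table schema: each observation belongs to exactly one site.
SCHEMA = """
CREATE TABLE sites (
    site_id   TEXT PRIMARY KEY,      -- primary key: one row per site
    site_name TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);

CREATE TABLE observations (
    obs_id        INTEGER PRIMARY KEY,  -- primary key: one row per observation
    site_id       TEXT NOT NULL REFERENCES sites (site_id),  -- foreign key
    obs_date      TEXT NOT NULL,        -- dates stored as ISO 8601 strings
    n_individuals INTEGER
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on (set per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?, ?, ?)",
        [("A", "North Beach", 34.41, -119.85), ("B", "South Marsh", 34.40, -119.84)],
    )
    conn.executemany(
        "INSERT INTO observations (site_id, obs_date, n_individuals) VALUES (?, ?, ?)",
        [("A", "2024-04-01", 12), ("A", "2024-04-08", 7), ("B", "2024-04-01", 3)],
    )
    conn.commit()
    return conn


def totals_by_site(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Join the two tables on the foreign key and total individuals per site."""
    query = """
        SELECT s.site_name, SUM(o.n_individuals) AS total
        FROM observations AS o
        JOIN sites AS s ON s.site_id = o.site_id
        GROUP BY s.site_name
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # An observation for a site that does not exist violates the foreign key.
    try:
        conn.execute(
            "INSERT INTO observations (site_id, obs_date, n_individuals) "
            "VALUES ('Z', '2024-04-01', 1)"
        )
    except sqlite3.IntegrityError as err:
        print("Rejected:", err)
    print(totals_by_site(conn))
```

Run as a script, this prints a rejection message for the orphan observation and then the per-site totals, with North Beach (19) ahead of South Marsh (3). Keeping site details in their own table and referencing them by key avoids repeating site names and coordinates on every observation row. That is the kind of relationship your week 1 schema diagram should capture.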

GitHub Repository Rubric

Your repository and its README should meet the following requirements:

A short, descriptive title
Markdown headers separating each section
A brief explanation of the repository’s purpose
A concise description of repository structure / file organization
No hidden files other than .gitignore (no .DS_Store!)
Details on data access — where the data lives and how to access it
References & acknowledgements in a consistent format (include course, data sources, and any other resources)
Free of typos, grammatical errors, and formatting mistakes

Required Repository Files

Your repo must contain:

Your database (if applicable)
A .sql file with your query
A .qmd / .ipynb file with your data analysis & visualization
Your data cleaning script
A README
A .txt file listing dependencies and environment requirements for your R/Python analysis
Any other files necessary to reproduce your analysis

Week 7 Presentation Rubric

You’ll have 2.5 minutes to present your database to the class. Your presentation must cover:

Dataset overview
- How did you find it? Was it messy? Well documented?
- Any challenges ingesting it into a database?
Your question & how you answered it
- Include your SQL query
The data visualization created from your query
Any other comments — challenges, surprises, lessons learned!