Spring Quarter
Lab section overview
The goal of our weekly lab sections will be to answer an analytical question of your choosing using a database. You will create this database yourself, and then use it to answer your question. In week 7, we will have a database showcase where you get to teach your classmates about the database you created!
| Week | Objective |
|---|---|
| 1 | Select a dataset and understand the relationships among the different tables to define a schema. |
| 2 | Clean the data. |
| 3 | Ingest the data into a database and explore your newly created database. |
| 4 | With your cleaned data loaded into a database, come up with an analytical question to answer with it. Develop the SQL query or queries that answer this question. |
| 5 | Write R or Python code to create a data visualization or analysis from your database. |
| 6 | Document your work. |
| 7 | Database showcase! |
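As a rough illustration of weeks 3 and 4, the sketch below loads two small related tables into a database and answers a question with one SQL query. Everything in it is an assumption made for the example: it uses Python's built-in `sqlite3` module rather than any particular database system your lab may use, and the `site` and `observation` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical cleaned rows; in practice these would come from your cleaned data files.
sites = [(1, "North Marsh"), (2, "South Dune")]
observations = [
    (1, 1, "2024-04-01", 12),
    (2, 1, "2024-04-08", 15),
    (3, 2, "2024-04-01", 7),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Schema (week 1): primary keys, a foreign key, and explicit data types.
conn.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE observation (
    obs_id     INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site(site_id),
    obs_date   TEXT NOT NULL,
    bird_count INTEGER NOT NULL
);
""")

# Ingest (week 3): parent table first so the foreign key check passes.
conn.executemany("INSERT INTO site VALUES (?, ?)", sites)
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", observations)
conn.commit()

# Analytical question (week 4): what is the mean count per site?
query = """
SELECT s.site_name, AVG(o.bird_count) AS mean_count
FROM observation AS o
JOIN site AS s ON s.site_id = o.site_id
GROUP BY s.site_name
ORDER BY s.site_name
"""
results = conn.execute(query).fetchall()
for name, mean_count in results:
    print(f"{name}: {mean_count:.1f}")
```

The `results` list of tuples is what you would hand to a plotting library in week 5. Saving the query text in its own .sql file keeps it easy to rerun and review.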
Project Rubric
Your lab project is graded Pass/Fail across five criteria:
| Criteria | Pass | Fail |
|---|---|---|
| Database Schema | Appropriate dataset; correct entities & relationships; complete schema diagram; proper primary/foreign keys & data types. (Submit schema to Canvas by 4/15) | Dataset too simple; missing critical relationships; incomplete or incorrect schema diagram. |
| Code | Appropriate data transformations; missing values handled; formats standardized; data integrity preserved. | No evidence of cleaning; major data quality issues remain; missing values mishandled; lack of standardization. |
| Documentation | Clear project purpose; well-documented process; commented code; decisions explained; results discussed; GitHub repo follows repository rubric. | Minimal/unclear docs; omitted steps; code lacks comments; results presented without context; insufficient GitHub repo. |
| Reproducibility | Project reproducible from docs; all files included (.sql + .ipynb/.qmd); steps clearly ordered; dependencies listed. | Critical files/steps missing; process cannot be reproduced; dependencies not specified; setup requirements missing. |
| Presentation | Covers dataset overview, analytical question, and SQL query; includes a data visualization; stays within 2.5 minutes; reflects on challenges or surprises. | Missing key components (dataset context, query, or visualization); significantly over/under time; little to no reflection on the process. |
GitHub Repository Rubric
Your README should include:
.gitignore (no .DS_Store!)
Required Repository Files
Your repo must contain:
.sql file with your query
.qmd / .ipynb file with your data analysis & visualization
README.txt file listing dependencies and environment requirements for your R/Python analysis
Week 7 Presentation Rubric
You’ll have 2.5 minutes to present your database to the class. Your presentation must cover:
Dataset overview
Your question & how you answered it
The data visualization created from your query
Any other comments — challenges, surprises, lessons learned!