Week 1 - Data modeling
Please use Canvas to return the assignments: https://ucsb.instructure.com/courses/32934/assignments/459589
Create a table definition for the Snow_survey table that is maximally expressive, that is, that captures as much of the semantics and characteristics of the data using SQL’s data definition language as is possible.
The Snow_survey table and some additional tables you might find useful for this exercise can be downloaded at:
To understand the Snow_survey table, you will want to look at the values that occur in the data using a tool like R, Python, or OpenRefine. To help you further, in Appendix 1 is the documentation for the table that comes with the ASDN dataset, and in Appendix 2 are SQL table definitions for the additional tables.
Your goal is to express your expectations about what the data should look like, so that the database can then enforce those expectations.
Please consider:
- the data types of columns (pick from TEXT, REAL, INTEGER, DATE for this exercise)
- if the table has a primary key and what it might be
- any foreign key(s)
- whether NULL values are allowed
- uniqueness constraints, on individual columns and across columns
- other column value constraints, again, on individual columns and across columns
Please express your table definition in SQL, but don’t worry about getting the SQL syntax exactly correct. This assignment is just a thought exercise. If you do want to try to write correct SQL, though, your may find it helpful to consult the DuckDB CREATE TABLE documentation.
Finally, please provide some explanation for why you made the choices you did, and any questions or uncertainties you have. Don’t write an essay! Bullet points are sufficient. But do please explain your thought process.
Appendix 1: Metadata
Periodic records of snow cover remaining at the site
| Column name | Definition |
|---|---|
| Site | Four-letter code of site at which data were collected |
| Year | Year in which data were collected |
| Date | Date on which data were collected |
| Plot | Name of study plot on which survey was conducted |
| Location | Name of dedicated snow-survey location, if applicable |
| Snow_cover | Percent cover of snow, including slush |
| Water_cover | Percent cover of water |
| Land_cover | Percent cover of exposed land |
| Total_cover | Total sum (to check the above percents; should always sum to 100) |
| Observer | Person who conducted the survey |
| Notes | Any relevant comments on the survey |
Appendix 2: Table definitions
Table definitions for the additional tables:
CREATE TABLE Site (
Code TEXT PRIMARY KEY,
Site_name TEXT UNIQUE NOT NULL,
Location TEXT NOT NULL,
Latitude REAL NOT NULL CHECK (Latitude BETWEEN -90 AND 90),
Longitude REAL NOT NULL CHECK (Longitude BETWEEN -180 AND 180),
"Total_Study_Plot_Area_(ha)" REAL NOT NULL
CHECK ("Total_Study_Plot_Area_(ha)" > 0),
UNIQUE (Latitude, Longitude)
);
CREATE TABLE Species (
Code TEXT PRIMARY KEY,
Common_name TEXT UNIQUE NOT NULL,
Scientific_name TEXT,
Relevance TEXT
);
CREATE TABLE Personnel (
Abbreviation TEXT PRIMARY KEY,
Name TEXT NOT NULL UNIQUE
);
Credit: 50pts
