RECAP - open SQLite on database file - .table, .schema to find out what's inside - SELECT DISTINCT Location FROM Site ORDER BY Location LIMIT 3; - write query in file, use .read to execute it - good idea to capture schema in SQL file to recreate database, for documentation FILTERING - SELECT * FROM Site WHERE Area < 200; - ...AND Latitude > 60, etc. - not equals: classic is <>, but everybody supports != nowadays - string matching -- double up single quotes to create a single quote: 'I don''t like this' -- LIKE '...', % for wildcard -- string matching might be case (in)sensitive, check database settings -- regexp usually supported via functions EXPRESSIONS - SELECT Site_name, Area*2.47 FROM Site; - expr AS name to give name to computed column - look at SQLite function list, lots of 'em - || to concatenate strings AGGREGATION & GROUPING - SELECT COUNT(*) FROM Species; - number of rows - SELECT COUNT(Scientific_name) FROM Species; - number of non-NULL values - SELECT COUNT(DISTINCT Relevance) FROM Species; - same as SELECT DISTINCT, but see how many rows are returned - SELECT AVG(Area) FROM Site; - MIN, MAX, etc. - careful: SELECT Location, AVG(Area) FROM Site; -- !! picks arbitrary value for location - enter grouping - SQL will return one row per group - SELECT Location, MAX(Area) FROM Site GROUP BY Location; - WHERE to filter rows going into grouping operation - HAVING to filter rows coming out of grouping operation - SELECT Location, MAX(Area) AS Max_area FROM Site GROUP BY Location HAVING Max_area > 200 ORDER BY Max_area DESC; RELATIONAL ALGEBRA - result of every operation is another table - even SELECT COUNT(*) returns a little baby table - CREATE TEMP TABLE AS SELECT... to save query - can nest queries -- SELECT ... FROM (SELECT ...) -- another place: WHERE column IN (SELECT ...) -- there are others NULL PROCESSING - NULL means unknown - SELECT COUNT(*) FROM Bird_nests WHERE floatAge > 5; - or <= 5 - still missing rows, because a NULL value is neither > 5 nor <= 5 - must use IS NULL or IS NOT NULL - aggregate functions omit NULLs like R does JOINS, THE HEART AND SOUL OF RDBMS - SELECT * FROM Camp_assignment; - but we want full names, not abbreviations - so follow linkage shown in ER diagram - SELECT * FROM Camp_assignment, Personnel ON Camp_assignment.Observer = Personnel.Abbreviation LIMIT 10; - or use JOIN instead of bare , - or INNER JOIN to be verbose - note qualifications by table name; unneeded if column names are unambiguous (but duplicate column names often occur) - mega-rows: get all columns from both tables - (database doesn't actually do that unless required, is more efficient) - more commonly, ask for selected columns - SELECT Year, Site, Name, Start, End FROM Camp_assignment, Personnel ON Camp_assignment.Observer = Personnel.Abbreviation LIMIT 10; - use table abbreviations for conciseness - SELECT Year, Site, Name, Start, End FROM Camp_assignment CA, Personnel P ON CA.Observer = P.Abbreviation LIMIT 10; - can also use AS for readability - 3 way join: - SELECT * FROM Camp_assignment CA, Personnel P, Site S ON CA.Observer = P.Abbreviation AND CA.Site = S.Code LIMIT 3; - can add WHERE clauses, etc.: WHERE CA.Observer = 'lmckinnon' - notice denormalization that is happening - which is the idea: normalize for data organization/compactness - temporarily denormalize to relate data back together - another example: how many bird eggs are in each nest? - can get away with USING since same column name - SELECT Nest_id, COUNT(*) FROM Bird_nests, Bird_eggs USING (NEST_Id) GROUP BY Nest_id; - let's add in some nicer species names; need to add Species to the join - switch to ON... instead of USING, also need to qualify column names - SELECT BN.Nest_id, S.Scientific_name, COUNT(*) AS Egg_count FROM Bird_nests BN, Bird_eggs BE, Species S ON BE.Nest_id = BN.Nest_id AND BN.Species = S.Code GROUP BY BN.Nest_id; - save via CREATE TEMP TABLE mytable AS SELECT...