RECAP - make sure ORDER BY is last step -- a join creates a whole new table, order may be completely different - where is SQL used in E&ES in practice? -- within projects -- ex: NCEAS BIEN, mongo DB schema -- SQL to build, manage such a database -- ASDN: should have used RDBMS on this project -- SQL exposed to outsiders? no --- presumes SQL knowledge --- security risk --- danger of long-running queries -- SQL from R/Python usually streamlined to ask specific questions for purposes of analysis OUTER JOINS - what's happening? - value in creating experiment - CREATE TABLE a (cola INTEGER, common INTEGER); - INSERT INTO a VALUES (1, 1), (2, 2), (3, 3); - CREATE TABLE b (common INTEGER, colb INTEGER); - INSERT INTO b VALUES (2, 2), (3, 3), (4, 4), (5, 5); - OUTER JOIN b includes all rows of b and fills in a's columns with NULL - what species do *not* have nest data? - SELECT Code FROM Species WHERE Code NOT IN (SELECT DISTINCT Species FROM Bird_nests); - SELECT Code Species LEFT JOIN Bird_nests ON Species.Code = Bird_nests.Species WHERE Bird_nests.Species IS NULL; SELF JOINS - 2 copies of same table (conceptually) - used with hierarchical data - or this problem: list all pairs of people who worked together - (i.e., same site, overlapping time intervals) - SELECT A.Site, A.Observer, B.Observer FROM Camp_assignment A, Camp_assignment B ON A.Site = B.Site AND A.Observer < B.Observer WHERE A.Start <= B.End AND B.Start <= A.End ORDER BY A.Site, A.Observer; - < needed to avoid listed people working with themselves, and to avoid listing pairs twice, once for each name order VIEWS - make normalized tables more palatable - virtual, not materialized like CREATE TABLE AS - virtual implies always reflects current data in source tables - CREATE VIEW v AS SELECT Year, Site, Name, Start, End FROM Camp_assignment, Personnel ON Camp_assignment.Observer = Personnel.Abbreviation; - inserts, updates, deletes possible? maybe, depends on DB intelligence - can't insert/update on view the uses aggregate function SET OPERATIONS - UNION, INTERSECT, EXCEPT - last one is set difference - BTW: 3rd way to get species that do *not* have nest data: use EXCEPT - SELECT Code FROM Species EXCEPT SELECT DISTINCT Species FROM Bird_nests; - example: book b14 measured in inches, not millimeters, adjust on fly - SELECT Book_page, Nest_ID, Egg_num, Length*25.4, Width*25.4 FROM Bird_eggs WHERE Book_page LIKE 'b14%' UNION SELECT Book_page, Nest_ID, Egg_num, Length, Width FROM Bird_eggs WHERE Book_page NOT LIKE 'b14%'; - caution: not really looking at names of columns, just #columns and data types - so is possible to union nonsensical things together - UNION throws out duplicates, UNION ALL keeps all - UNION ALL = rbind in R, except matches on column names DATA MANAGEMENT STATEMENTS - INSERT: already seen that - UPDATE table SET ... WHERE ... - DELETE FROM table WHERE ... - caution: these operate on whole tables if unqualified - strategies to avoid catastrophes -- just subconsciouly be careful, like holding a kitchen knife -- put comment in front: -- DELETE FROM ..., then remove comment -- do SELECT first, then replace SELECT with DELETE <- allows inspection of what's about to be deleted -- tweak table name, put x in front, then remove I/O - all RDBMSs have scriptable I/O tools - generally, INSERT one at a time is slow - output to CSV -- .mode csv, .headers on, .output file -- plain .output to turn output off - loading CSVs: see build script - if table doesn't exist, is created w/ all text columns (don't skip header)