Example of Jupyter “magic command”:

%pwd

To install DuckDB Python module:

# %pip install duckdb
# %pip install pandas
Requirement already satisfied: duckdb in /Users/jb160-local/GitHub/gitRDS/EDS213/bren-eds213/.venv/lib/python3.14/site-packages (1.5.2)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: pandas in /Users/jb160-local/GitHub/gitRDS/EDS213/bren-eds213/.venv/lib/python3.14/site-packages (3.0.2)
Requirement already satisfied: numpy>=2.3.3 in /Users/jb160-local/GitHub/gitRDS/EDS213/bren-eds213/.venv/lib/python3.14/site-packages (from pandas) (2.4.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/jb160-local/GitHub/gitRDS/EDS213/bren-eds213/.venv/lib/python3.14/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /Users/jb160-local/GitHub/gitRDS/EDS213/bren-eds213/.venv/lib/python3.14/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
Note: you may need to restart the kernel to use updated packages.

import duckdb
import pandas as pd

1. Create a connection to the database

conn = duckdb.connect("~/GitHub/gitRDS/EDS213/bren-eds213-data/database/database.duckdb")
conn

2. Now let’s query the databse using:

  1. .sql() method – specific to DuckDB and analysis focused
conn.sql("SELECT * FROM Site LIMIT 5")

Note: This is also a lazy evaluation like we were doing with dbplyr. At this point, the data has not been fully processed or brought into Python memory yet. It’s a symbolic representation of a query. To see the actual data, you need to materialize the relation. You can do this in several ways:

.show(): Prints a nice tabular representation (great for interactive exploration). .fetchall(): Returns all results as a list of tuples (the traditional DB-API way). .fetchone(): Returns the next single row as a tuple. .df(): Converts the result into a Pandas DataFrame.Now we want results…

# You get a list of tuples (one per row)
conn.sql("SELECT * FROM Site LIMIT 5").fetchall()
# You get a pandas dataframe
site_df = conn.sql("SELECT * FROM Site").df()
site_df.head()
site_table = conn.execute("SELECT * FROM Site")

site_table
  1. .execute() method – more ubiquitous to other database workflows where you create a cursor object that you use to iterate on the rows of a table
cur = conn.cursor()

Cursors don’t store anything, they just transfer queries to the database and get results back.

cur.execute("SELECT * FROM Site LIMIT 5")

We still need to fectch the results as before:

cur.fetchall()

Always get tuples, even if you only request one column

cur.execute("SELECT Nest_ID FROM Bird_nests LIMIT 10")
cur.fetchall()
cur.execute("SELECT Nest_ID FROM Bird_nests LIMIT 10")
[t[0] for t in cur.fetchall()]
  1. Get the one result
cur.execute("SELECT COUNT(*) FROM Bird_nests")
cur.fetchall()
cur.execute("SELECT COUNT(*) FROM Bird_nests")
cur.fetchone()
cur.execute("SELECT COUNT(*) FROM Bird_nests")
cur.fetchone()[0]
  1. Using an iterator - but DuckDB doesn’t support iterators :(
cur.execute("SELECT Nest_ID FROM Bird_nests LIMIT 10")
for row in cur:
    print(f"got {row[0]}")

A workaround is to use the .fectchone() method:

cur.execute("SELECT Nest_ID FROM Bird_nests LIMIT 10")
while True:
    row = cur.fetchone()
    if row == None:
        break
    # do something with row
    print(f"got nest ID {row[0]}")

Can do things other than SELECT!

cur.execute("CREATE TEMP TABLE temp_table AS
            SELECT * FROM Bird_nests LIMIT 10")

Watch out for line breaks!!

cur.execute("""
    CREATE TEMP TABLE temp_table AS
    SELECT * FROM Bird_nests LIMIT 10
    """)
cur.execute("SELECT * FROM temp_table")
cur.fetchone()

A note on fragility

For example:

INSERT INTO Site VALUES ("abcd", "Foo", 35.7, 42.3, "?")

A less fragile way of expressing the same thing:

INSERT INTO Site (Code, Site_name, Latitude, Longitude, Something_else)
   VALUES ("abcd", "Foo", 35.7, 42.3, "?")

In the same vein: SELECT * is fragile

cur.execute("SELECT * FROM Site LIMIT 3")
cur.fetchall()

A better, more robust way of coding the same thing:

cur.execute("SELECT Site_name, Code, Latitude, Longitude FROM Site LIMIT 3")
cur.fetchall()

An extended example: Question we’re trying to answer: How many nests do we have for each species?

Approach: first get all species. Then execute a count query for each species.

A digression: string interpolation in Python

# The % method
s = "My name is %s"
print(s % "Greg")
s = "My name is %s and the other teacher's name is %s"
print(s % ("Greg", "Julien"))
# The new f-string method
name = "Greg"
print(f"My name is {name}")
# Third way
print("My name is {}".format("Greg"))
query = """
   SELECT COUNT(*) FROM Bird_nests
   WHERE Species = '%s'
"""
cur.execute("SELECT Code FROM Species LIMIT 3")
for row in cur.fetchall():  # DuckDB workaround
    code = row[0]
    prepared_query = query % code
    #print(prepared_query)
    cur2 = conn.cursor()
    cur2.execute(prepared_query)
    print(f"Species {code} has {cur2.fetchone()[0]} nests")
    cur2.close()

The above Python interpolation is dangerous and has caused many database hacks! There’s a better way

query = """
   SELECT COUNT(*) FROM Bird_nests
   WHERE Species = ?
"""
cur.execute("SELECT Code FROM Species LIMIT 3")
for row in cur.fetchall():  # DuckDB workaround
    code = row[0]
    # NOT NEEDED! prepared_query = query % code
    #print(prepared_query)
    cur2 = conn.cursor()
    cur2.execute(query, [code])  # <-- added argument here
    print(f"Species {code} has {cur2.fetchone()[0]} nests")
    cur2.close()

Let’s illustrate the danger with a different example

abbrev = "TS"
name = "Taylor Swift"
cur.execute("""
   INSERT INTO Personnel (Abbreviation, Name)
   VALUES ('%s', '%s')
   """ % (abbrev, name)
           )
cur.execute("SELECT * FROM Personnel")
cur.fetchall()[-3:]
abbrev = "CO"
name = "Conan O'Brien"
cur.execute("""
   INSERT INTO Personnel (Abbreviation, Name)
   VALUES ('%s', '%s')
   """ % (abbrev, name)
           )
"""
   INSERT INTO Personnel (Abbreviation, Name)
   VALUES ('%s', '%s')
   """ % (abbrev, name)
abbrev = "CO"
name = "Conan O'Brien"
cur.execute("""
   INSERT INTO Personnel (Abbreviation, Name)
   VALUES (?, ?)
   """,
    [abbrev, name])
cur.execute("SELECT * FROM Personnel")
cur.fetchall()[-3:]

Don’t forget to close your connection when you are done!

conn.close()


This work is licensed under CC BY 4.0

UCSB logo