This week, you’ll run your SQL query in Python or R, pulling results directly into a data frame and turning them into a visualization.
In Python, use duckdb + a cursor to query your database, convert to a pandas DataFrame, and plot
In R, use dbplyr to write dplyr code that gets translated to SQL automatically
Both paths end the same way: a query result becomes a plot
Python: Connect to Your Database
import duckdbimport pandas as pdconn = duckdb.connect("database.duckdb")cur = conn.cursor()
Python: Run a Query
cur.execute(""" SELECT Species, COUNT(*) AS num_nests FROM Bird_nests GROUP BY Species ORDER BY num_nests DESC""")
At this point the data is not yet in Python memory. It’s a symbolic representation of the query,but you need to materialize it.
Python: Getting Results Out
Several options for materializing results:
cur.fetchall() # list of tuples — all rows at oncecur.fetchone() # one row as a tuplecur.df() # pandas DataFrame ← best for analysis
To get a DataFrame you can plot:
cur.execute(""" SELECT Species, COUNT(*) AS num_nests FROM Bird_nests GROUP BY Species ORDER BY num_nests DESC""")df = cur.df()df.head()
Python: Turn the Result into a Plot
import matplotlib.pyplot as pltfig, ax = plt.subplots(figsize=(8, 5))ax.barh(df["Species"], df["num_nests"])ax.set_xlabel("Number of Nests")ax.set_title("Nest Count by Species")plt.tight_layout()plt.show()
Because cur.df() returned a standard pandas DataFrame, any plotting library works — matplotlib, seaborn, plotly, etc.
Python: Close the Connection
conn.close()
Cursors don’t store anything, they just transfer queries and return results. Once you’ve called .df() or .fetchall(), the data lives in your variable, not the cursor.
# See what SQL dbplyr generated:nests_db |>group_by(Species) |>summarise(num_nests =n()) |>show_query()
R: Pulling Data into R with collect()
result <- nests_db |>group_by(Species) |>summarise(num_nests =n()) |>arrange(desc(num_nests)) |>collect() # ← runs the query, returns a true data frame
Nothing runs on the database until collect() is called. Everything before it is building up the SQL query.
R: Turn the Result into a Plot
result |>ggplot(aes(x = num_nests,y =reorder(Species, num_nests))) +geom_col(fill ="#52796f") +labs(x ="Number of Nests",y ="Species",title ="Nest Count by Species" ) +theme_minimal()
R: Close the Connection
DBI::dbDisconnect(conn, shutdown =TRUE)
The shutdown = TRUE argument is specific to DuckDB. It cleanly shuts down the DuckDB engine when you disconnect.
This Week’s Task
Using Python or R, connect to your database, run your analytical query, and create a visualization from the results.