# Imports used by these helpers (they may already be imported earlier in the notebook)
import json
import os
import time
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

import httpx

def _join_list(x, sep="; ", keep_first_only=False):
    """
    Join a list of values into a single string.

    If the input is not a list, return its string representation;
    if it is None, return an empty string.
    """
    if isinstance(x, list):
        if keep_first_only and len(x) > 0:
            return str(x[0])
        return sep.join(str(v) for v in x if v is not None)
    return "" if x is None else str(x)

def _top_n(d, n=10):
    """Return the top n items of a dictionary, sorted by value in descending order."""
    return dict(sorted(d.items(), key=lambda x: x[1], reverse=True)[:n])

def _redact_request_url(url):
    """Remove the api_key parameter from a URL for display purposes."""
    parsed_url = urlsplit(str(url))  # Convert httpx.URL to string
    query_params = parse_qsl(parsed_url.query)
    filtered_params = [(name, value) for name, value in query_params if name != "api_key"]
    redacted_query = urlencode(filtered_params)
    redacted_url = parsed_url._replace(query=redacted_query)
    return urlunsplit(redacted_url)
Helping Functions
One of the most useful features of programming languages is the ability to work with functions: reusable blocks of code that perform a specific task. In Python, functions are defined using the def keyword. They can take inputs (arguments) and return outputs (values).
For this exercise, we’ll write a few small helper functions (to avoid repeating ourselves) and two main functions that query the DPLA API and return responses in a clean, “Pythonic” format.
If functions feel a little abstract right now, that’s completely fine. The main goal of the workshop is learning how to work with an API (queries, parameters, and results). You can copy/paste these functions and focus on how you use them.
Helper functions
These helper functions aren’t strictly necessary, but they keep the rest of the notebook cleaner. You can copy/paste them into your notebook and come back to the details later:
Main functions
Let’s take some time to understand our first main function. The goal is simple: make a DPLA request and get back a JSON response as a Python dictionary.
If we take a look at the DPLA API documentation, we can see that a request is built from a base URL (https://api.dp.la/v2), a resource type (items or collections), query parameters (e.g. q=artificial+intelligence), and your API key.
You can also request specific facets (e.g. facets=sourceResource.type), restrict which fields are returned (e.g. fields=sourceResource.title,sourceResource.description), and specify the page number (e.g. page=2) and the number of results per page (e.g. page_size=20).
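To make those pieces concrete, here is a small sketch of how the parameters combine into a request URL using only the standard library. The key value is a placeholder, not a real DPLA key:

```python
from urllib.parse import urlencode

# Placeholder key for illustration only -- substitute your own DPLA API key.
params = {
    "q": "artificial intelligence",
    "api_key": "YOUR_API_KEY",
    "fields": "sourceResource.title",
    "page": 2,
    "page_size": 20,
}
# urlencode percent-encodes values and joins them with "&"
url = "https://api.dp.la/v2/items?" + urlencode(params)
print(url)
```

Note that urlencode replaces the space in the query with a plus sign, which is exactly the q=artificial+intelligence form shown in the documentation.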
If we tried to handle all of that ad hoc every time, our notebook would get long and repetitive fast. Instead, we’ll write one function that builds the request and sends it for us.
Our function will be called search_items. It takes a query, a resource_type (usually items), and any additional DPLA parameters via **parameters.
Our resulting function will return a dictionary from the JSON response.
def search_items(query, resource_type='items', verbose=False, timeout=30.0, **parameters):
    """
    Search DPLA items with the given query and parameters.

    Args:
        query (str): The search query string. Logical operators (AND, OR, NOT)
            and wildcards (*) for partial matches are supported.
        resource_type (str): The type of resource to search for. Default is 'items'.
        verbose (bool): If True, prints the request URL. Default is False.
        timeout (float): The timeout for the HTTP request in seconds. Default is 30.0.
        **parameters: Facets and filter parameters from the DPLA API documentation:
            https://pro.dp.la/developers/requests
            Dotted keywords and values can be passed using dictionary unpacking.
            For example, to filter by sourceResource.title, you can pass:
            **{"sourceResource.title": "example title"}

    Returns:
        dict: The JSON response from the DPLA API as a Python dictionary.
    """
    # Build the request URL and minimal parameters
    base_url = f"{API_BASE_URL}{resource_type}"
    params = {
        "q": query,
        "api_key": os.getenv(ENV_VAR_NAME),
    }
    # Add additional parameters if any
    for key, value in parameters.items():
        params[key] = value
    # Make the request
    with httpx.Client(timeout=timeout) as client:
        response = client.get(base_url, params=params)
        if verbose:
            print(f"Request URL [redacted]: {_redact_request_url(response.url)}")
        response.raise_for_status()
        return response.json()
In a proper application we would handle the exceptions that can occur during the request (e.g. network errors, an invalid API key), but for this exercise we keep it simple and let any exceptions propagate.
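For a flavor of what that error handling could look like, here is a generic retry sketch. It is not DPLA-specific: the RuntimeError below stands in for the httpx.RequestError that httpx raises on network problems (raise_for_status() raises httpx.HTTPStatusError for 4xx/5xx responses), and the flaky function simulates a request that fails once before succeeding:

```python
def safe_call(fn, retries=2):
    """Call fn, retrying up to `retries` times on a simulated network error."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError as exc:  # stand-in for httpx.RequestError
            if attempt == retries:
                raise
            print(f"Attempt {attempt + 1} failed ({exc}); retrying...")

calls = {"n": 0}

def flaky():
    """Simulated request: fails on the first call, succeeds afterwards."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("network hiccup")
    return {"count": 1}

print(safe_call(flaky))  # {'count': 1}
```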
Let’s test the function with a simple query:
response = search_items(
    "artificial AND intelligence",
    page_size=5,
    fields="sourceResource.title,sourceResource.description",
    facets="sourceResource.date.begin,sourceResource.date.end",
    verbose=True
)
print(json.dumps(response, indent=2))
Request URL [redacted]: https://api.dp.la/v2/items?q=artificial+AND+intelligence&page_size=5&fields=sourceResource.title%2CsourceResource.description&facets=sourceResource.date.begin%2CsourceResource.date.end
{
"count": 3379,
"docs": [
{
"sourceResource.title": "Artificial Intelligence"
},
{
"sourceResource.description": [
"\"January 1986.\"",
"Caption title."
],
"sourceResource.title": "Artificial intelligence"
},
{
"sourceResource.description": [
"In scope of the U.S. Government Publishing Office Cataloging and Indexing Program (C&I) and Federal Depository Library Program (FDLP).",
"Includes bibliographical references.",
"Online resource; title from PDF cover (USMC website, viewed July 18, 2024)."
],
"sourceResource.title": "Artificial intelligence strategy"
},
{
"sourceResource.title": "Intelligence and the Computer - Artificial Intelligence"
},
{
"sourceResource.description": "Recording of the session titled \"Artificial intelligence and micros\" held at the fifth West Coast Computer Faire in San Francisco. The following papers were presented at this session: \"Microcomputers and the design of contelligent systems,\" presented by Dean Gengle. \"Artificial intelligence as applied to input and output in the office - or - making computers read and speak,\" presented by Art Derfall. \"Solving the Shooting Stars Puzzle,\" presented by Joel Shprentz.Additional Descriptive Notes: Saturday, 4:30",
"sourceResource.title": "Artificial intelligence and micros"
}
],
"facets": {
"sourceResource.date.begin": {
"_type": "date_histogram",
"entries": [
{
"count": 4,
"time": "2026"
},
{
"count": 128,
"time": "2025"
},
{
"count": 173,
"time": "2024"
},
{
"count": 100,
"time": "2023"
},
{
"count": 34,
"time": "2022"
},
{
"count": 34,
"time": "2021"
},
{
"count": 33,
"time": "2020"
},
{
"count": 23,
"time": "2019"
},
{
"count": 15,
"time": "2018"
},
{
"count": 3,
"time": "2017"
},
{
"count": 4,
"time": "2016"
},
{
"count": 6,
"time": "2015"
},
{
"count": 4,
"time": "2014"
},
{
"count": 26,
"time": "2013"
},
{
"count": 2,
"time": "2012"
},
{
"count": 1,
"time": "2011"
},
{
"count": 4,
"time": "2010"
},
{
"count": 6,
"time": "2008"
},
{
"count": 1,
"time": "2007"
},
{
"count": 31,
"time": "2006"
},
{
"count": 27,
"time": "2005"
},
{
"count": 71,
"time": "2004"
},
{
"count": 89,
"time": "2003"
},
{
"count": 67,
"time": "2002"
},
{
"count": 122,
"time": "2001"
},
{
"count": 327,
"time": "2000"
},
{
"count": 95,
"time": "1999"
},
{
"count": 171,
"time": "1998"
},
{
"count": 52,
"time": "1997"
},
{
"count": 97,
"time": "1996"
},
{
"count": 18,
"time": "1995"
},
{
"count": 22,
"time": "1994"
},
{
"count": 87,
"time": "1993"
},
{
"count": 17,
"time": "1992"
},
{
"count": 40,
"time": "1991"
},
{
"count": 8,
"time": "1990"
},
{
"count": 14,
"time": "1989"
},
{
"count": 9,
"time": "1988"
},
{
"count": 8,
"time": "1987"
},
{
"count": 30,
"time": "1986"
},
{
"count": 4,
"time": "1985"
},
{
"count": 12,
"time": "1984"
},
{
"count": 4,
"time": "1983"
},
{
"count": 13,
"time": "1982"
},
{
"count": 27,
"time": "1981"
},
{
"count": 33,
"time": "1980"
},
{
"count": 3,
"time": "1979"
},
{
"count": 5,
"time": "1978"
},
{
"count": 9,
"time": "1977"
},
{
"count": 7,
"time": "1976"
},
{
"count": 5,
"time": "1975"
},
{
"count": 10,
"time": "1974"
},
{
"count": 9,
"time": "1973"
},
{
"count": 4,
"time": "1972"
},
{
"count": 2,
"time": "1971"
},
{
"count": 19,
"time": "1970"
},
{
"count": 13,
"time": "1969"
},
{
"count": 16,
"time": "1968"
},
{
"count": 7,
"time": "1967"
},
{
"count": 30,
"time": "1966"
},
{
"count": 3,
"time": "1965"
},
{
"count": 25,
"time": "1964"
},
{
"count": 84,
"time": "1963"
},
{
"count": 62,
"time": "1962"
},
{
"count": 53,
"time": "1961"
},
{
"count": 10,
"time": "1960"
},
{
"count": 24,
"time": "1959"
},
{
"count": 13,
"time": "1958"
},
{
"count": 16,
"time": "1957"
},
{
"count": 17,
"time": "1956"
},
{
"count": 4,
"time": "1955"
},
{
"count": 16,
"time": "1954"
},
{
"count": 16,
"time": "1952"
},
{
"count": 1,
"time": "1951"
},
{
"count": 13,
"time": "1950"
},
{
"count": 2,
"time": "1949"
},
{
"count": 2,
"time": "1947"
},
{
"count": 3,
"time": "1946"
},
{
"count": 6,
"time": "1944"
},
{
"count": 3,
"time": "1942"
},
{
"count": 1,
"time": "1938"
},
{
"count": 2,
"time": "1930"
},
{
"count": 1,
"time": "1927"
},
{
"count": 1,
"time": "1926"
},
{
"count": 4,
"time": "1925"
},
{
"count": 12,
"time": "1924"
},
{
"count": 4,
"time": "1923"
},
{
"count": 1,
"time": "1914"
},
{
"count": 1,
"time": "1885"
},
{
"count": 1,
"time": "1871"
},
{
"count": 1,
"time": "1844"
}
]
},
"sourceResource.date.end": {
"_type": "date_histogram",
"entries": [
{
"count": 4,
"time": "2026"
},
{
"count": 129,
"time": "2025"
},
{
"count": 173,
"time": "2024"
},
{
"count": 99,
"time": "2023"
},
{
"count": 34,
"time": "2022"
},
{
"count": 34,
"time": "2021"
},
{
"count": 33,
"time": "2020"
},
{
"count": 23,
"time": "2019"
},
{
"count": 15,
"time": "2018"
},
{
"count": 3,
"time": "2017"
},
{
"count": 4,
"time": "2016"
},
{
"count": 6,
"time": "2015"
},
{
"count": 4,
"time": "2014"
},
{
"count": 26,
"time": "2013"
},
{
"count": 2,
"time": "2012"
},
{
"count": 1,
"time": "2011"
},
{
"count": 4,
"time": "2010"
},
{
"count": 6,
"time": "2008"
},
{
"count": 1,
"time": "2007"
},
{
"count": 31,
"time": "2006"
},
{
"count": 27,
"time": "2005"
},
{
"count": 71,
"time": "2004"
},
{
"count": 89,
"time": "2003"
},
{
"count": 67,
"time": "2002"
},
{
"count": 122,
"time": "2001"
},
{
"count": 327,
"time": "2000"
},
{
"count": 95,
"time": "1999"
},
{
"count": 171,
"time": "1998"
},
{
"count": 52,
"time": "1997"
},
{
"count": 97,
"time": "1996"
},
{
"count": 18,
"time": "1995"
},
{
"count": 22,
"time": "1994"
},
{
"count": 87,
"time": "1993"
},
{
"count": 17,
"time": "1992"
},
{
"count": 40,
"time": "1991"
},
{
"count": 8,
"time": "1990"
},
{
"count": 14,
"time": "1989"
},
{
"count": 9,
"time": "1988"
},
{
"count": 8,
"time": "1987"
},
{
"count": 30,
"time": "1986"
},
{
"count": 4,
"time": "1985"
},
{
"count": 12,
"time": "1984"
},
{
"count": 4,
"time": "1983"
},
{
"count": 13,
"time": "1982"
},
{
"count": 28,
"time": "1981"
},
{
"count": 33,
"time": "1980"
},
{
"count": 3,
"time": "1979"
},
{
"count": 5,
"time": "1978"
},
{
"count": 9,
"time": "1977"
},
{
"count": 7,
"time": "1976"
},
{
"count": 5,
"time": "1975"
},
{
"count": 10,
"time": "1974"
},
{
"count": 9,
"time": "1973"
},
{
"count": 4,
"time": "1972"
},
{
"count": 2,
"time": "1971"
},
{
"count": 19,
"time": "1970"
},
{
"count": 13,
"time": "1969"
},
{
"count": 16,
"time": "1968"
},
{
"count": 7,
"time": "1967"
},
{
"count": 30,
"time": "1966"
},
{
"count": 3,
"time": "1965"
},
{
"count": 25,
"time": "1964"
},
{
"count": 84,
"time": "1963"
},
{
"count": 62,
"time": "1962"
},
{
"count": 53,
"time": "1961"
},
{
"count": 10,
"time": "1960"
},
{
"count": 24,
"time": "1959"
},
{
"count": 13,
"time": "1958"
},
{
"count": 16,
"time": "1957"
},
{
"count": 17,
"time": "1956"
},
{
"count": 4,
"time": "1955"
},
{
"count": 16,
"time": "1954"
},
{
"count": 16,
"time": "1952"
},
{
"count": 1,
"time": "1951"
},
{
"count": 13,
"time": "1950"
},
{
"count": 1,
"time": "1949"
},
{
"count": 2,
"time": "1947"
},
{
"count": 3,
"time": "1946"
},
{
"count": 6,
"time": "1944"
},
{
"count": 3,
"time": "1942"
},
{
"count": 1,
"time": "1938"
},
{
"count": 2,
"time": "1930"
},
{
"count": 1,
"time": "1927"
},
{
"count": 1,
"time": "1926"
},
{
"count": 4,
"time": "1925"
},
{
"count": 12,
"time": "1924"
},
{
"count": 4,
"time": "1923"
},
{
"count": 1,
"time": "1914"
},
{
"count": 1,
"time": "1885"
},
{
"count": 1,
"time": "1871"
},
{
"count": 1,
"time": "1844"
}
]
}
},
"limit": 5,
"start": 1
}
Hooray 🎉! We’ve successfully queried the DPLA API and received a well-formatted response. However, this function returns only a single page of results. The maximum number of results per page is 100, so if we want more than one page, we need to handle pagination.
Our second function, search_all_items, takes almost the same parameters as search_items, but it fetches multiple pages and returns a single list of items (up to a maximum you set).
A few parameters specific to this function let us control the maximum number of items to fetch (max_items), the pause between requests to avoid hitting rate limits (sleep), and the timeout for each HTTP request (timeout).
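Before looking at the real function, the paging arithmetic can be sketched offline with a fake in-memory "API" in place of real HTTP calls. Everything here (the fake data, fake_search, collect) is illustrative, not part of DPLA:

```python
# 237 fake documents standing in for a DPLA result set.
FAKE_RESULTS = [{"id": i} for i in range(237)]

def fake_search(page, page_size):
    """Return one 'page' of fake results; empty past the end, like a real API."""
    start = (page - 1) * page_size
    return FAKE_RESULTS[start:start + page_size]

def collect(max_items, page_size=100):
    """Fetch page after page until max_items are gathered or results run out."""
    all_docs, page = [], 1
    while len(all_docs) < max_items:
        docs = fake_search(page, page_size)
        if not docs:
            break  # no more results
        all_docs.extend(docs)
        page += 1
    return all_docs[:max_items]

print(len(collect(150, page_size=50)))   # 150 (three pages of 50)
print(len(collect(500, page_size=100)))  # 237 (the fake API runs out first)
```

This is the same loop shape the real function uses; the only differences are the real HTTP call, the sleep between requests, and the page_size cap.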
def search_all_items(query, resource_type='items', max_items=100, sleep=0.5, verbose=False, timeout=30.0, **parameters):
    """
    Collect up to max_items across pages.

    Args:
        query (str): The search query string. Logical operators (AND, OR, NOT)
            and wildcards (*) for partial matches are supported.
        resource_type (str): The type of resource to search for. Default is 'items'.
        max_items (int): Maximum number of items to retrieve. To set the number of
            items per page, use the page_size parameter in **parameters.
        sleep (float): Time to wait between requests to avoid hitting rate limits.
        verbose (bool): If True, prints each request URL. Default is False.
        timeout (float): The timeout for each HTTP request in seconds. Default is 30.0.
        **parameters: Facets and filter parameters from the DPLA API documentation:
            https://pro.dp.la/developers/requests
    """
    all_docs = []
    page = 1
    page_size = int(parameters.get("page_size", 100))
    if page_size > 100:
        page_size = 100
        parameters["page_size"] = page_size  # write the clamped value back so it is actually sent
        print("page_size cannot exceed 100. Setting to 100.")
    while len(all_docs) < max_items:
        parameters['page'] = page
        data = search_items(
            query,
            resource_type=resource_type,
            verbose=verbose,
            timeout=timeout,
            **parameters
        )
        docs = data.get('docs', [])
        if not docs:
            break  # No more results
        all_docs.extend(docs)
        # Stop if we've reached max_items
        if len(all_docs) >= max_items:
            break
        page += 1
        time.sleep(sleep)
    return all_docs[:max_items]
Again, this function could be improved with proper error handling and smarter pagination logic, but we want to keep it simple for now.
Same as with the previous function, we can test it with a simple query:
results = search_all_items(
    "artificial AND intelligence",
    page_size=50,
    max_items=150,
    fields="sourceResource.title,sourceResource.description",
    facets="sourceResource.date.begin,sourceResource.date.end",
)
print(f"Total items retrieved: {len(results)}")
print(json.dumps(results[-5:], indent=2))  # Print the last 5 results
Total items retrieved: 150
[
{
"sourceResource.description": [
"\"May 2023.\"",
"Includes bibliographical references (pages 40-47).",
"\"A report by the Select Committee on Artificial Intelligence of the National Science and Technology Council.\"",
"Description based on online resource; title from PDF cover (White House, viewed Nov. 7, 2023)."
],
"sourceResource.title": "The national artificial intelligence research and development strategic plan: 2023 update"
},
{
"sourceResource.description": [
"Updated irregularly",
"The CRS web page provides access to all versions published since 2018 in accordance with P.L. 115-141.",
"In scope of the U.S. Government Publishing Office Cataloging and Indexing Program (C&I) and Federal Depository Library Program (FDLP).",
"Report includes bibliographical references.",
"Contents viewed on December 7, 2023; title from CRS web page."
],
"sourceResource.title": "Artificial intelligence : overview, recent advances, and considerations for the 118th Congress"
},
{
"sourceResource.description": "Bibliography: leaves [107]-[108]",
"sourceResource.title": "The meaning and mechanics of intelligence"
},
{
"sourceResource.description": [
"Updated irregularly",
"Batch processed record.",
"This record provides access to the versions of this bill or resolution published in the United States Government Publishing Office's (GPO) GovInfo system. To see the versions, use the GPO PURL. To see the actions related to this bill or resolution, use the Congress.gov URL.",
"At head of title: 118th Congress, 2d session.",
"Sponsor(s): Representative Lisa Blunt Rochester.",
"Cosponsor(s): Representative Marcus J. Molinaro.",
"Referred committee(s): House Committee on Energy and Commerce.",
"Date of introduction: \"September 19, 2024.\"",
"United States Congress chamber(s): United States House of Representatives.",
"United States federal government branch: Legislative branch.",
"Access ID (GovInfo): BILLS-118hr9673ih.",
"Pagination at time of introduction: 16 pages.",
"In scope of the U.S. Government Publishing Office Cataloging and Indexing Program (C&I) and Federal Depository Library Program (FDLP).",
"GovInfo.gov metadata; title from caption of the introduced version (GovInfo, September 19, 2024)."
],
"sourceResource.title": "To Direct the Secretary of Commerce to Develop a National Strategy Regarding Artificial Intelligence Consumer Literacy and Conduct a National Artificial Intelligence Consumer Literacy Campaign"
},
{
"sourceResource.description": "Deep learning has rapidly emerged as a transformative technology that permeates all modern software, from autonomous driving systems to medical-diagnosis and malware-detection tools. Considering the critical role of this software in our technologies, it must behave as intended. The complexity introduced by deep-learning components complicates formal reasoning about the behavior of such software, frequently resulting in solutions that offer only empirical or no guarantees.This thesis contributes techniques and algorithms that increase the robustness of deep-learning-powered software by providing strong provable guarantees across the components existing in the entire deep learning pipeline. By leveraging the power of abstract interpretation, a well-established theory for program analysis and verification, this thesis enables the verification of robustness properties across the deep learning pipeline. The thesis focuses on four critical aspects of robustness: (1) preventing numerical bugs",
"sourceResource.title": "Towards Robust Artificial-Intelligence-Powered Software: Provable Guarantees via Abstract Interpretation"
}
]
Great! We have successfully retrieved multiple pages of results from the DPLA API, and the max_items parameter lets us control how many items to fetch. Now we are ready to run some queries and explore the data!
