Text Analysis with R
Summary and Setup
Authors: Renata Curty and Jairo Melo
This three-part, hands-on workshop series introduces participants to the fundamentals of extracting insights from textual data using R. In Part 1: Text Preprocessing, we focus on cleaning and preparing text for analysis through normalization, noise reduction, stopword removal, tokenization, and lemmatization. Part 2 will delve into core text analysis techniques, including word frequencies, collocations, n-grams, and visualizations such as word clouds. Finally, Part 3 will explore sentiment analysis, applying polarity scoring and emotion detection methods. Throughout the series, we’ll also highlight important caveats and best practices unique to working with textual data.
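As a rough preview of the Part 1 steps named above, the following base R sketch applies normalization, noise reduction, tokenization, and stopword removal to a single made-up post. The example sentence and the tiny stopword list are invented for illustration, and the workshop itself may use dedicated packages instead. Lemmatization is not shown because it needs an additional package.

```r
# Toy post invented for illustration; it is not taken from the workshop dataset.
post <- "The Severance finale was AMAZING!!! Watch it: https://example.com #Severance"

# Normalization: convert everything to lowercase.
text <- tolower(post)

# Noise reduction: drop URLs, then turn every run of non-letters into a space.
text <- gsub("https?://\\S+", " ", text)
text <- gsub("[^a-z]+", " ", text)

# Tokenization: split on whitespace after trimming leading/trailing spaces.
tokens <- strsplit(trimws(text), "\\s+")[[1]]

# Stopword removal with a tiny hand-made list (real lists are much longer).
stopwords <- c("the", "was", "it", "a", "an", "and", "of", "to")
tokens <- tokens[!tokens %in% stopwords]

print(tokens)
```

Each step only reassigns `text` or `tokens`, so the intermediate results can be inspected by printing them between steps.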
Prerequisites
These lessons are hands-on and are designed to be followed with R and RStudio open. Before starting, please ensure you have the following software installed:
- R: We recommend R version 4.3 or newer. Download from CRAN.
- RStudio: We recommend RStudio version 2023.12 or newer. Download from Posit’s website.
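The recommended versions listed above can also be checked programmatically. This is a minimal sketch: the RStudio check assumes the `rstudioapi` package is installed and that the code is run from inside RStudio, neither of which is a stated workshop requirement.

```r
# Minimum R version recommended above.
min_r <- "4.3.0"

# getRversion() returns a version object that compares correctly with a string.
r_ok <- getRversion() >= min_r
message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " is older than",
        " the recommended version ", min_r)

# The RStudio version is only visible from inside RStudio, and only if the
# rstudioapi package is available (an assumption, not a workshop requirement).
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rs_ok <- rstudioapi::getVersion() >= "2023.12.0"
  message("RStudio ", as.character(rstudioapi::getVersion()),
          if (rs_ok) " meets" else " is older than",
          " the recommended version 2023.12")
}
```

If the RStudio branch prints nothing, the code is running outside RStudio or `rstudioapi` is not installed, and the menu route remains the simplest check.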
Check your versions
- RStudio:
  - On Mac: Go to RStudio -> About RStudio.
  - On Windows: Go to Help -> About RStudio.
- R: In the R console, run:
    R.version.string
Update R
- Go to CRAN and download the latest version for your operating system.
- Run the installer. (You don’t need to uninstall older versions—R will install alongside them.)
Update RStudio
- Go to Posit’s download page.
- Download and install the newest version for your operating system.
That’s it! After updating, restart your computer to make sure RStudio finds the latest R.
Access to Data
For this lesson we will analyze a dataset of social media posts related to the Apple TV series Severance. The dataset was collected using Brandwatch (via UCSB Library subscription), and it includes posts from the two days following the finales of Season 1 (April 2022) and Season 2 (March 2025). The dataset contains over 5,800 posts stored in a CSV file.
The R project containing the dataset and other files is available for download from this link: Severance Dataset. You will need an active UCSB NetID and password to access the file (the same you use for your UCSB email).
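Once the project is downloaded, a quick sanity check is to load the CSV and confirm that its size roughly matches the description above. This is only a sketch: the file name `severance_posts.csv` is hypothetical, so substitute the actual name and location of the CSV inside the downloaded project.

```r
# Hypothetical file name; replace it with the CSV shipped in the R project.
csv_path <- "severance_posts.csv"

if (file.exists(csv_path)) {
  # Base R reader; keep text columns as character rather than factors.
  posts <- read.csv(csv_path, stringsAsFactors = FALSE)

  # The description above mentions a little over 5,800 posts.
  print(nrow(posts))

  # Compact overview of the column names and types.
  str(posts)
} else {
  message("File not found: ", csv_path,
          " (check the path and your working directory)")
}
```

Opening the downloaded R project in RStudio sets the working directory to the project folder, which usually makes relative paths like this one resolve correctly.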
R Skill Level
This workshop assumes basic familiarity with R and RStudio. If you are new to R, we recommend the Introduction to Data Analysis with R workshop, particularly its Introduction to R and RStudio section.