Week 9 - Sensitive data
Learning goals
- Understand the importance of protecting sensitive data and ensuring privacy and confidentiality.
- Identify and evaluate established approaches and techniques for de-identifying and anonymizing data to mitigate the risk of re-identification.
- Apply these techniques using an R package, and quantify the resulting information loss and data utility.
Student notes
Before class, install the sdcMicro package if you choose not to use the servers.
If you are testing with sensitive data, make sure to launch the sdcMicro app from RStudio, not from the website.
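For anyone working on their own computer, the setup described in these notes might look like the sketch below. It assumes a local R installation with access to CRAN, and it assumes that the app to launch from RStudio is the sdcMicro Shiny interface listed under Resources, which the package starts with sdcApp(). Treat it as a starting point rather than the official course setup.

```r
# Install sdcMicro only if it is not already available.
# Assumption: you are working locally rather than on the servers.
if (!requireNamespace("sdcMicro", quietly = TRUE)) {
  install.packages("sdcMicro")
}

# Load the package for the current session.
library(sdcMicro)

# Start the sdcMicro Shiny app from the RStudio console.
# Run this way, the app is served from your own R session.
sdcApp()
```

Run the lines in the RStudio console; the app should open in a browser window or the RStudio viewer.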
Slides and other materials
Demo: South Park Case files - update your fork!
Resources
sdcMicro Documentation: https://sdcpractice.readthedocs.io/en/latest/intro.html
sdcMicro Shiny app: https://sdcappdocs.readthedocs.io/en/latest/introsdcApp.html
Other useful links can be found on the slides.
Suggested readings
Bledsoe, E. K., Burant, J. B., Higino, G. T., Roche, D. G., Binning, S. A., Finlay, K., … & Srivastava, D. S. (2022). Data rescue: saving environmental data from extinction. Proceedings of the Royal Society B, 289(1979), 20220938. https://doi.org/10.1098/rspb.2022.0938
Bourgault, B., Tremblay, H., Schloss, I. R., Plante, S., & Archambault, P. (2017). “Commercially Sensitive” Environmental Data: A Case Study of Oil Seep Claims for the Old Harry Prospect in the Gulf of St. Lawrence, Canada. Case Studies in the Environment. https://doi.org/10.1525/cse.2017.sc.454841
Gehrke, J., Kifer, D., & Machanavajjhala, A. (2011). ℓ-Diversity. In: van Tilborg, H. C. A., & Jajodia, S. (eds) Encyclopedia of Cryptography and Security. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5906-5_899
Samarati, P., & Sweeney, L. (1998). Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. https://dataprivacylab.org/dataprivacy/projects/kanonymity/paper3.pdf
In-class exercise (Day 2)
Instructions for the Whale Entanglement Exercise