Accession Number:



A Responsible De-Identification Of The Real Data Corpus: Building A Framework For PII Management

Personal Author(s):

Corporate Author:

Naval Postgraduate School Monterey United States

Report Date:



De-identification methods have helped government organizations provide the public with useful information, promoting transparency and accountability while also protecting the individual privacy of the data subjects. However, due to the recent massive increase in data collection and improved methods of analysis, de-identification has become a more difficult task. This work outlines challenges and discusses procedures for making a potentially sensitive data set available to extramural researchers and institutions without significant risk to human subject privacy. We provide a detailed explanation of personally identifiable information to help us understand what forms of personally identifiable information can cause the most harm. Furthermore, we discuss the legality and ethics behind working with personally identifiable information to illustrate the importance of protecting privacy. We then offer a taxonomy of threats, vulnerabilities, and impacts and describe how these determine risk. Based on this taxonomy, we develop a framework to assess risk on the Real Data Corpus, a collection of forensic disk images containing personally identifiable information. In addition, we analyze de-identification methods such as pseudonymization and anonymization, and consider re-identification risks. Finally, we apply our framework and methodology to a real-world scenario to determine the risk of data disclosure to an extramural researcher.

Descriptive Note:

Technical Report



Communities Of Interest:

Modernization Areas:

Distribution Statement:

Approved For Public Release;

File Size: