Analyzing the Quality of Generalization Based Data Anonymization

Joosep Tavits
When data holders release personal data, they are required to preserve the privacy of the data subjects. A widespread method for handling the extensive responsibilities this entails is data anonymization, which largely removes the connection between data records and individuals. Alongside the many anonymization algorithms created over the past two decades, a variety of metrics for estimating the quality of the performed anonymization have been published. As a result, choosing the best metrics for a specific type of anonymization without extensive knowledge of the field is not a trivial problem. This bachelor's thesis studies data anonymization and ways to measure its quality, taking as its basis a typical organization that often releases personal data. The focus of the thesis is to identify an optimal subset of anonymization quality metrics based on the description of the end user and the anonymization methods used. Finally, the subset of metrics is implemented as a software component, which is integrated into existing anonymization software as an output validation component that describes the remaining risks and the changes in the dataset after anonymization.
Graduation Thesis language
Graduation Thesis type: Bachelor - Computer Science
Sulev Reisberg
Defence year