Privacy-preserving Data Synthesis Using Trusted Execution Environments

Karl Hannes Veskus
Data synthesis is the process of generating new synthetic data from existing data. Companies often lack the in-house competence to synthesize data themselves and are willing to outsource the process. However, synthesis requires access to the original data. Sharing data with a third party can be complex, especially if it contains sensitive information or is considered personal data under regulations such as the GDPR. The goal of this thesis is to develop a proof-of-concept privacy-preserving data synthesis service, showing that trusted execution environments can be used to perform data synthesis in a privacy-preserving manner. Such a service would enable outsourcing data synthesis to an untrusted remote server by ensuring that both the original and the synthesized data remain fully hidden from the untrusted server host throughout the lifecycle of the service.
A prototype of the service was developed within the scope of an ongoing proof-of-concept project. To achieve the required security goals, the prototype uses trusted execution environment technologies, specifically the Sharemind HI development platform, which is in turn based on the Intel SGX platform. The developed service shows that synthesizing data in a privacy-preserving manner is indeed feasible when trusted execution environments are used. However, future work is needed to optimize the service so that it can handle larger input and output files and support additional data synthesis methods.
Graduation Thesis language
Graduation Thesis type
Master - Computer Science
Liina Kamm, Sven Laur
Defence year