Automatic Type Detection of Columns in Table of Health Records

Name
Kristina Mumm
Abstract
This Bachelor’s thesis proposes an automatic method for detecting the types of columns in a data table. It focuses on health data, where columns often hold categorical values. Type detection proceeds in three steps. First, regular expressions determine which data types structurally match the values in a column. Next, the values are checked against the list of allowed values for each of those candidate types. Finally, the best-fitting type is selected from the suitable candidates. The thesis also analyses Estonian health data to assess whether errors in the data could cause problems for type detection.
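The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the type catalogue, the type names (`sex_code`, `icd10_code`, `integer`, `text`), the patterns, the allowed-value list, and the "first suitable type in catalogue order" rule for choosing the best fit are all assumptions made for illustration.

```python
import re

# Hypothetical catalogue, ordered from most specific to most general.
# Each type has a structural pattern and, optionally, a closed list of
# allowed values. None of these definitions come from the thesis.
CANDIDATE_TYPES = {
    "sex_code": {"pattern": re.compile(r"[A-Z]"), "allowed": {"M", "F", "U"}},
    "icd10_code": {"pattern": re.compile(r"[A-Z]\d{2}(\.\d{1,2})?"), "allowed": None},
    "integer": {"pattern": re.compile(r"-?\d+"), "allowed": None},
    "text": {"pattern": re.compile(r".*", re.DOTALL), "allowed": None},
}


def detect_column_type(values):
    """Return the name of the best-fitting type for a column, or None."""
    # Ignore empty cells; an all-empty column has no detectable type.
    cells = [v.strip() for v in values if v and v.strip()]
    if not cells:
        return None

    # Step 1: keep the types whose pattern matches every value in full.
    structural = [
        name
        for name, spec in CANDIDATE_TYPES.items()
        if all(spec["pattern"].fullmatch(cell) for cell in cells)
    ]

    # Step 2: for types with a closed value list, every value must be in it.
    suitable = [
        name
        for name in structural
        if CANDIDATE_TYPES[name]["allowed"] is None
        or all(cell in CANDIDATE_TYPES[name]["allowed"] for cell in cells)
    ]

    # Step 3: choose the best fit. Assumption: the catalogue is ordered by
    # specificity, so the first suitable type is taken as the best one.
    return suitable[0] if suitable else None


if __name__ == "__main__":
    print(detect_column_type(["M", "F", "M"]))
    print(detect_column_type(["I10", "E11.9"]))
    print(detect_column_type(["A", "B"]))
```

In this sketch a single malformed value is enough to push a column to a more general type, which is one way the data errors mentioned in the abstract could affect detection.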
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Sulev Reisberg
Defence year
2022
 