Towards automating data quality specification by extracting data quality requirements from data features

Organization
Software Engineering and Information Systems
Abstract
To preserve the value of data in its subsequent use, the prerequisite of data quality must be met. The quality of data, especially third-party data (data produced or collected by a source other than the data user), should therefore be verified, which is a time- and effort-consuming task. Moreover, even relatively simple data quality checks require skills and knowledge that the data user may not have. This could be improved by enabling the user to at least partly check the quality of the data, with the appropriate data quality requirements (rules) determined automatically from the nature of the data (primarily the data values, although parameter names could also be used if they follow best practices). The thesis would therefore review the literature on data quality and the most common data quality issues typically found in data. This list, supplemented by self-defined quality requirements that depend on the nature of the data, will serve as input to a tool (preferably web-based, but not necessarily). The tool would allow a user with no or limited data quality knowledge to verify the quality of a dataset with little or no involvement in defining the data quality requirements for that dataset. It would be beneficial if the author could apply machine learning (ML) knowledge to continuously enrich the database of requirements and improve how they are assigned to the data.
Graduation thesis defence year
2024-2025
Supervisor
Anastasija Nikiforova
Spoken language(s)
English
Requirements for candidates
Level
Masters
Keywords
#SEIS, #dataquality, #dataqualitymanagement, #metadata, #machinelearning, #ai, #artificialintelligence

Application of contact

Name
Anastasija Nikiforova
Phone
E-mail
anastasija.nikiforova@ut.ee