Towards automating data quality specification by extracting data quality requirements from data features

Organisation name
Software Engineering and Information Systems
Abstract
For data to retain their value in subsequent use, they must meet data quality requirements. Verifying the quality of data, especially third-party data (data produced or collected by a source other than the data user), is a time- and effort-consuming task. Moreover, even relatively simple data quality checks require skills and knowledge that the data user may not have. This could be improved by allowing the user to check the quality of the data at least partly, by automatically determining the appropriate data quality requirements (rules) from the nature of the data (the data values, although parameter names could also be used if they are consistent with best practices). The thesis would therefore review the literature on data quality and the most common data quality issues found in data. The resulting list, supplemented by self-defined quality requirements that depend on the nature of the data, will serve as input to a tool (preferably web-based, though this is not mandatory) that allows a user with little or no data quality knowledge to verify the quality of a dataset with no (or limited) involvement in defining its data quality requirements. It would be beneficial if the author could apply ML knowledge to continuously enrich the database of requirements and improve their assignment to the data.
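As a rough illustration of what "determining data quality requirements from data values" could look like, the sketch below proposes rules for a single column and then lists the rows that break them. It is not part of the topic description: the pattern catalogue, the rule names, the 0.8 threshold and the function names infer_rules and find_violations are all assumptions made for this example, and a real tool would take its rules from the literature review.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern catalogue (assumption): a real tool would build this
# from the reviewed literature and from user-defined requirements.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "integer": re.compile(r"-?\d+"),
}


def infer_rules(values: List[Optional[str]], threshold: float = 0.8) -> List[str]:
    """Propose quality rules for one column based on its values alone."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    rules: List[str] = []
    if not present:
        return rules
    # Completeness: if most cells are filled, assume the column is mandatory.
    if len(present) / len(values) >= threshold:
        rules.append("not_empty")
    # Format: adopt the first pattern that most of the filled cells follow.
    for name, pattern in PATTERNS.items():
        share = sum(1 for v in present if pattern.fullmatch(v)) / len(present)
        if share >= threshold:
            rules.append("format:" + name)
            break
    return rules


def find_violations(values: List[Optional[str]], rules: List[str]) -> List[int]:
    """Return the indices of the values that break at least one rule."""
    bad: List[int] = []
    for i, v in enumerate(values):
        text = (v or "").strip()
        for rule in rules:
            if rule == "not_empty" and not text:
                bad.append(i)
                break
            if rule.startswith("format:") and text:
                if not PATTERNS[rule[len("format:"):]].fullmatch(text):
                    bad.append(i)
                    break
    return bad


# Made-up sample column: mostly ISO dates, one other format, one empty cell.
column = ["2024-01-05", "2024-02-11", "11/03/2024", "2024-04-02", "2024-05-19",
          "2024-06-30", "", "2024-08-01", "2024-09-12", "2024-10-23"]
rules = infer_rules(column)
print(rules)                           # ['not_empty', 'format:iso_date']
print(find_violations(column, rules))  # [2, 6]
```

The threshold is the main design choice here: rules are inferred from what most values look like, so the minority that deviate are reported as suspected quality issues rather than preventing the rule from being proposed.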
Thesis defence year
2024-2025
Supervisor
Anastasija Nikiforova
Communication language(s)
English
Requirements for candidates
Level
Master's
Keywords
#SEIS, #dataquality, #dataqualitymanagement, #metadata, #machinelearning, #ai, #artificialintelligence

Application contact

 
Name
Anastasija Nikiforova
Phone
E-mail
anastasija.nikiforova@ut.ee