Toward an Automated Data Quality Rule Detection in Data Warehouses

Name
Heidi Carolina Martinsaari
Abstract
Data is a valuable asset from which information and knowledge are derived. However, business success depends not only on the amount of data but also on its quality. At the same time, data quality management requires a well-designed system and the cooperation of several parties, which is time-consuming and costly. This raises the question of whether using artificial intelligence to ensure data quality would help to avoid human errors, complement human actions, and reduce personnel costs and the workload of data quality specialists.
The objective of this thesis is to explore the current landscape of data quality solutions and determine whether they can automatically detect data quality rules using machine learning methods, with a focus on data warehouses. To this end, a systematic review was conducted of data quality software available on the market and described in academic publications.
The review found that most data quality tools are used for data cleansing and fixing and are meant for domain-specific databases rather than data warehouses. Only a few tools were capable of detecting data quality rules, let alone doing so in data warehouses.
Because automated data quality rule detection is insufficiently covered in the academic literature and poorly represented on the market, this thesis calls for action in this area.
Graduation Thesis language
English
Graduation Thesis type
Master - Data Science
Supervisor(s)
Anastasija Nikiforova
Defence year
2023
 