Multivocal Literature Review on Data Quality Challenges in Data Pipelines

Rain Hallikas
This thesis presents a multivocal literature review of data quality challenges in data pipelines. Data quality is strongly affected by data pipeline processes, and the thesis aims to provide a nuanced understanding of the aspects most frequently discussed as influencing data quality within data pipelines. Because data pipelines have surged in popularity only in recent years, the review includes grey literature alongside academic sources to support more precise conclusions on the topic. The thesis also offers an overview of current solutions and open issues in data quality to advance data pipeline engineering. Together, the challenges and solutions serve as a guide to understanding data quality challenges in pipelines more deeply and offer insight for future work.
Graduation Thesis language
Graduation Thesis type
Master - Software Engineering
Dietmar Alfred Paul Kurt Pfahl, Mario Ezequiel Scott
Defence year