A Comparative Study of Database Schema on Query Performance in Data Warehousing

Name
Roland Pajuleht
Abstract
Decision-making in organizations is often hampered by dashboards that are difficult to maintain and by the outdated data architecture beneath them. Over the years, the effort required to maintain this architecture has been reduced by advances in the hardware and software on which it is built. The modern data stack is forgiving in terms of schema choice and data velocity. This does not mean, however, that the fundamental architectural concepts on which databases are built should be forgotten.
This paper compares query performance and execution plans under two approaches to data modelling. Dimensional modelling, a standard method for building data warehouses, is compared with a less standardized model that tends to emerge when concrete data-architecture procedures are not in place. Several analytical queries are run against a standard, relatively normalized star schema and against a single wide table with a more relaxed form, often called One Big Table (OBT). The results show that although queries against the wide table were easier to read and write, performance issues emerged quickly. When operating traditional data warehouses, data engineers must adhere to established architectural practices in order to maintain an efficient database.
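To make the two models concrete, the sketch below builds a toy star schema (one fact table plus two dimension tables) and a One Big Table derived from it, then answers the same analytical question against both. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in engine rather than the data warehouse used in the thesis, and all table and column names (`fact_sales`, `dim_product`, `dim_store`, `obt_sales`) and the data are hypothetical.

```python
import sqlite3

# Hypothetical toy data and table names; NOT the dataset or engine used in the thesis.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: a fact table that references dimension tables by surrogate key.
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales  (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    amount      REAL
);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Books"), (2, "Games")])
cur.executemany("INSERT INTO dim_store VALUES (?, ?)", [(10, "North"), (20, "South")])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 10, 12.0), (2, 10, 30.0), (1, 20, 8.0), (2, 20, 50.0)],
)

# One Big Table: dimension attributes are copied onto every fact row.
cur.executescript("""
CREATE TABLE obt_sales AS
SELECT p.category, s.region, f.amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key;
""")

# The same analytical question in both models: total sales per category and region.
STAR_QUERY = """
SELECT p.category, s.region, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""
OBT_QUERY = """
SELECT category, region, SUM(amount) AS total
FROM obt_sales
GROUP BY category, region
ORDER BY category, region
"""

star_rows = cur.execute(STAR_QUERY).fetchall()
obt_rows = cur.execute(OBT_QUERY).fetchall()

# Print SQLite's execution plan for each query; the exact wording varies by SQLite version.
for name, query in (("star", STAR_QUERY), ("obt", OBT_QUERY)):
    print(name)
    for plan_row in cur.execute("EXPLAIN QUERY PLAN " + query):
        print("  ", plan_row)

# Both models should return identical aggregates.
print(star_rows == obt_rows)
```

With four rows the plans say nothing about performance; the point is structural. The star-schema query needs two joins but reads each dimension value once, while the OBT query reads a single table in which every dimension attribute is repeated on every fact row, which is the readability-versus-storage trade-off the study examines at realistic scale.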
Graduation Thesis language
English
Graduation Thesis type
Master - Data Science
Supervisor(s)
Eduard Ševtšenko
Defence year
2024
 