Institute of Computer Science - Graduation Theses Registry

Hard and Soft Tuning of Spark Ecosystem Toward Query Energy Efficiency

Name

Tofig Bakhshiyev

Abstract

This thesis examines the energy efficiency of executing TPC-H queries in Apache Spark, focusing on four file formats (Parquet, CSV, Avro, and TBL) and varying partition sizes in a standalone configuration. Energy consumption is measured during both the data reading and query processing phases. The thesis first compares the characteristics of the Parquet, CSV, and Avro formats and analyses their impact on Spark’s query performance. It then investigates Spark’s standalone configuration, examining the cluster settings, resource allocation, and hardware optimisations that influence energy usage during query execution. A central part of the study is understanding how partition size affects energy consumption: the evaluation systematically assesses its impact on I/O operations, data shuffling, and overall energy consumption during query processing. Using TPC-H queries as benchmarks, experiments are conducted across various file formats, partition sizes, and configurations. The results offer practical insights for improving energy efficiency in Spark-based big data processing. This research contributes to the broader discussion on sustainable data processing and helps practitioners make energy-conscious decisions in Apache Spark environments.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Simon Pierre Dembele

Defence year

2024
