Towards Auto-Scaling of Serverless Data Pipelines

Name
Rajan Raj Das
Abstract
The ever-increasing number of Internet of Things (IoT) devices generates massive amounts of data, and collecting data from heterogeneous sources and processing it without bottlenecks is challenging. Data pipelines are widely used to automate data processing without manual effort. Traditional data pipelines, such as Extract-Load-Transform (ELT), have their own challenges: they are difficult to scale and can reduce the timeliness of data processing. These challenges can be addressed with serverless computing, a recent paradigm in cloud computing that offers finer-grained, function-level scaling than Virtual Machines (VMs). With the growth of smart and IoT devices, the use of data pipelines has increased exponentially. However, stochastic IoT workloads and the need to assure Quality of Service (QoS) metrics (latency, throughput, etc.) impose several challenges, including scaling the underlying infrastructure. Serverless Data Pipelines (SDPs) can be designed to process high data volumes with efficient resource usage. SDPs comprise several components, such as serverless functions, message queues, and queue connectors.
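To make the component list concrete, here is a minimal in-process sketch of how queues, queue connectors, and functions fit together, assuming a two-stage image-processing pipeline. The `Queue` class, the `connector` function, and the stage functions `to_grayscale` and `threshold` are hypothetical stand-ins: a real SDP would use a managed message queue and a serverless platform rather than Python objects.

```python
from collections import deque
from typing import Callable, Optional


class Queue:
    """Hypothetical stand-in for a message queue between pipeline stages."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def publish(self, msg: dict) -> None:
        self._items.append(msg)

    def poll(self) -> Optional[dict]:
        # Return the oldest message, or None when the queue is empty.
        return self._items.popleft() if self._items else None


def connector(src: Queue, fn: Callable[[dict], dict], dst: Optional[Queue]) -> int:
    """Hypothetical queue connector: drain src, invoke fn once per message,
    forward each result to dst (if any), and return the invocation count."""
    invocations = 0
    while (msg := src.poll()) is not None:
        result = fn(msg)
        invocations += 1
        if dst is not None:
            dst.publish(result)
    return invocations


def to_grayscale(msg: dict) -> dict:
    """Stage 1 (hypothetical): average RGB channels into one intensity value."""
    return {"id": msg["id"], "pixels": [(r + g + b) // 3 for r, g, b in msg["pixels"]]}


def threshold(msg: dict) -> dict:
    """Stage 2 (hypothetical): binarise intensities at 128."""
    return {"id": msg["id"], "pixels": [255 if p >= 128 else 0 for p in msg["pixels"]]}


# Wire the pipeline: ingest -> to_grayscale -> mid -> threshold -> sink.
ingest, mid, sink = Queue(), Queue(), Queue()
ingest.publish({"id": 1, "pixels": [(255, 255, 255), (0, 0, 0), (200, 100, 90)]})
connector(ingest, to_grayscale, mid)
connector(mid, threshold, sink)
print(sink.poll())
```

Because each stage only reads from and writes to a queue, each function could in principle be scaled independently of the others, which is what makes function-level scaling of an SDP possible.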
Scaling the entire pipeline without leaving any bottlenecks is challenging. In our study, we created a serverless data pipeline for an image-processing IoT application that uses serverless functions to execute the data operation tasks. We also applied different reactive scaling mechanisms, such as resource-based scaling and workload-based scaling, to measure how well the serverless data pipeline scales. These reactive mechanisms rely on a single metric, e.g. CPU usage or request rate, to enforce the auto-scaling configuration. Therefore, we evaluated the use of multiple performance metrics of the serverless data pipeline to proactively predict the number of serverless functions in the pipeline. To this end, we collected data while running the reactive auto-scalers, cleaned it to remove outliers, and used it to train and test the proactive auto-scaler. We used multi-output regression models, and the results show that the ExtraTreeRegressor algorithm predicts the number of pods more effectively than the other models evaluated.
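The two reactive policies can be sketched as simple replica-count rules. This is a hedged illustration, not the implementation used in the thesis. The resource-based rule follows the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)` (the mention of pods suggests a Kubernetes deployment, but that is an assumption). The workload-based rule, the per-pod request target, and the replica bounds are hypothetical.

```python
import math


def _clamp(n: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a replica count within the configured bounds."""
    return max(min_replicas, min(max_replicas, n))


def resource_based_replicas(current_replicas: int, current_cpu_pct: float,
                            target_cpu_pct: float, min_replicas: int = 1,
                            max_replicas: int = 10) -> int:
    """HPA-style rule: scale the replica count in proportion to
    observed / target CPU utilisation, rounded up, then clamped."""
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return _clamp(desired, min_replicas, max_replicas)


def workload_based_replicas(request_rate: float, target_rate_per_pod: float,
                            min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical request-rate rule: enough pods that each serves at most
    target_rate_per_pod requests per second, rounded up, then clamped."""
    if target_rate_per_pod <= 0:
        raise ValueError("target_rate_per_pod must be positive")
    desired = math.ceil(request_rate / target_rate_per_pod)
    return _clamp(desired, min_replicas, max_replicas)


# Same pipeline snapshot seen through two single-metric scalers:
# 4 pods at 30% CPU (target 50%), receiving 60 requests/s (target 10 per pod).
print(resource_based_replicas(4, 30, 50))   # CPU view
print(workload_based_replicas(60, 10))      # request-rate view
```

Given that snapshot, the resource-based rule asks for 3 pods while the workload-based rule asks for 6. Each rule sees only one metric, so the two can disagree about the same pipeline state. This is the gap a multi-metric, proactive predictor is meant to close.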
Graduation Thesis language
English
Graduation Thesis type
Master - Software Engineering
Supervisor(s)
Shivananda Rangappa Poojara
Defence year
2023
 