Institute of Computer Science - Graduation Theses Registry


Google Dataflow Orchestration Using TOSCA in the Hybrid Cloud

Name

Manish Gupta

Abstract

In today’s world, data is as precious as oil. Many organizations depend on data to make critical business decisions, target specific customers, and accelerate their business growth. This reliance on data drives ever-larger volumes of data creation and consumption. Processing and moving such large volumes of data requires a practical, automated approach to data handling. A data pipeline is a series of interconnected, modular tasks that collect and process data and make it available to a wide array of systems with minimal manual intervention. Numerous vendors and open-source platforms support building data pipelines for an organization. However, developers need platform-specific knowledge to manage and orchestrate each data pipeline platform. The lack of standardization in orchestrating data pipelines increases development time and reduces reusability. TOSCA is an open standard for defining the topology and orchestration of cloud services. In this thesis, reusable TOSCA components were created in the RADON ecosystem to deploy, terminate, and manage Google Dataflow jobs. RADON is a research project that aims to develop a model-driven DevOps framework for serverless computing. The TOSCA components for Google Dataflow were designed to integrate with existing TOSCA components for Apache NiFi-based data pipelines. This integration gives developers a one-stop solution for building extensive data pipelines that combine Google Dataflow and Apache NiFi.
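TOSCA describes an application as a topology of node templates and the dependencies between them, and an orchestrator uses that topology to decide what to deploy first and what to tear down last. The Python sketch below is only a rough illustration of that idea under stated assumptions: the node names, the `example.nodes.*` types, and the `requires` field are hypothetical and are not RADON's actual TOSCA types. A real TOSCA orchestrator also handles relationship types, interfaces, and lifecycle operations, which this sketch ignores.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified topology for a hybrid data pipeline.
# Each entry lists the node templates it depends on ("requires").
# Names and types are illustrative assumptions, not RADON's TOSCA definitions.
TOPOLOGY = {
    "gcs_bucket": {"type": "example.nodes.GCSBucket", "requires": []},
    "bigquery_sink": {"type": "example.nodes.BigQueryDataset", "requires": []},
    # A NiFi flow that writes ingested data into the bucket.
    "nifi_ingest": {"type": "example.nodes.NifiPipeline", "requires": ["gcs_bucket"]},
    # A Dataflow job that reads from the bucket and writes to BigQuery.
    "dataflow_job": {
        "type": "example.nodes.DataflowJob",
        "requires": ["gcs_bucket", "bigquery_sink"],
    },
}


def deploy_order(topology):
    """Return node names so that every node appears after the nodes it requires."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["requires"]) for name, spec in topology.items()}
    return list(TopologicalSorter(graph).static_order())


def terminate_order(topology):
    """Tear down in reverse: dependents are removed before their dependencies."""
    return list(reversed(deploy_order(topology)))


if __name__ == "__main__":
    print("deploy:   ", deploy_order(TOPOLOGY))
    print("terminate:", terminate_order(TOPOLOGY))
```

Deriving termination as the reverse of deployment keeps the two lifecycle operations consistent: a Dataflow job is always stopped before the bucket it reads from is removed.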

Graduation Thesis language

English

Graduation Thesis type

Master - Software Engineering

Supervisor(s)

Chinmaya Dehury, Pelle Jakovits

Defence year

2022


UT Institute of Computer Science Graduation Theses Registry