Meta-Learning Based Approach for Automated Pre-processing for Clustering

Name
Hasan Mohammed Tanvir
Abstract
Data pre-processing is an integral part of any data analysis project. There is a wide range of data pre-processing methods, such as replacing missing values, scaling, and data reduction. The aim of this project is to automate data pre-processing by leveraging Automated Machine Learning (AutoML). While supervised learning has been the core focus of AutoML research, unsupervised learning has remained comparatively less explored. Therefore, the thesis focuses on suggesting a data pre-processing pipeline for an unsupervised clustering task by exploiting the meta-learning space and meta-learners in a domain-agnostic manner, for users who do not have in-depth knowledge of machine learning algorithms. The thesis explores the potential of integrating a data pre-processing approach into cSmartML, a meta-learning-based framework for automated algorithm selection and hyperparameter tuning for clustering, built on scikit-learn with 8 clustering algorithms. The proposed methodology applies meta-learning and creates a knowledge space on each of the 112 benchmark datasets. We show that the performance of cSmartML with the automated pre-processing component integrated is often much better than the original clustering result. The comparison with cSmartML showed that the proposed data pre-processing improved the clustering result by 0.3% to 27% in 7 out of 10 real datasets and by 4% to 44% in 3 out of 6 artificial datasets. In addition, experimentation reveals that the proposed approach takes advantage of the objective functions defined in the multi-objective framework. This shows that data pre-processing is as important for unsupervised clustering tasks as it is for supervised learning. Additionally, based on the meta-learning space, the project proposes to the user a pipeline of data pre-processing and algorithm selection, including hyperparameter tuning, for further clustering.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Dr. Radwa Elshawi
Defence year
2022
 