Predicting Hackathon Outcomes Using Machine Learning (Data Analytics)

Sofiya Demchuk
Over the past two decades, hackathons have continued to grow in importance and frequency. Winning a hackathon can increase a team's visibility and benefit participants through future job opportunities, personal development and access to potential investors for a project.
This study applies Data Analysis and Machine Learning techniques to an existing dataset gathered from the Devpost hackathon platform, covering around 2000 hackathons and more than 60000 projects over a period of 5 years, to identify aspects of hackathon teams that improve their chances of winning. The thesis attempts to address the gap in hackathon outcome prediction and to demonstrate the importance of different project features by presenting findings from a large-scale dataset. The applied techniques outline a framework for approaching the Machine Learning process on a brand-new classification problem, addressing the particular difficulties and needs of the desired outcome. Naive Bayes, Logistic Regression and Random Forest were selected because they are widely used in similar classification tasks, while XGBoost was chosen because in recent years it has delivered state-of-the-art performance on a range of Data Science problems. In addition, the main focus was on project feature extraction and feature selection to improve prediction. The developed classifiers are shown to outperform a common-sense rule-based baseline.
Graduation Thesis language
Graduation Thesis type
Master - Software Engineering
Irene-Angelica Chounta, Alexander Nolte
Defence year