University Dropout Prediction Using Machine Learning Models

Kertu-Carina Kallaste
High dropout rates are a significant problem in higher education institutions all over the globe. As this issue negatively affects students, universities, and society as a whole, predicting dropout risk has become a popular research field. The aim of this bachelor's thesis is to create a machine learning model for predicting the dropout risk of students in bachelor's, applied higher education, and integrated study programs at the University of Tartu, using the educational analytics data collected by the university's study information system from 2011 to 2022.
We implemented and evaluated several machine learning models, each based on a different algorithmic approach, to enable a comprehensive comparative analysis. The best prediction results were achieved with a model based on the random forest algorithm, which identified 88% of dropouts in the test set. The model's ROC AUC score was 0.94, indicating a very high ability to distinguish between the classes. Although these results are promising, it is important to verify the model's performance once newer data becomes available to confirm that it generalises.
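To make the two reported metrics concrete, the following is a minimal sketch of how the share of identified dropouts (recall) and the ROC AUC score can be computed from labels and predicted risk scores. It is not the thesis code: the function names, the 0.5 decision threshold, and the toy data are assumptions made purely for illustration, and the numbers it prints are unrelated to the results reported above.

```python
def recall(y_true, y_pred):
    """Share of actual dropouts (label 1) that were flagged as dropouts."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos


def roc_auc(y_true, scores):
    """Probability that a randomly chosen dropout receives a higher risk
    score than a randomly chosen non-dropout (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data (assumption): 1 = dropped out, 0 = continued.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted dropout risk for each student.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 for turning scores into predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(recall(y_true, y_pred))   # 3 of the 4 toy dropouts are flagged
print(roc_auc(y_true, scores))  # 21 of 24 dropout/non-dropout pairs ranked correctly
```

Recall depends on the chosen threshold, whereas ROC AUC summarises ranking quality across all thresholds, which is why the two figures are reported together.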
As a result of this practical work, a set of risk models with improved predictive power over the currently deployed model was created. The outcome of this research has the potential to be integrated into the university's educational analytics dashboard in the future, which would allow program managers to intervene proactively in risk situations.
Graduation Thesis language
Graduation Thesis type
Bachelor - Computer Science
Elena Sügis, Leo Siiman
Defence year