Issue Report Resolution Time Prediction
Name
Myroslava Stavnycha
Abstract
Predicting the resolution time of an issue report has always been an important but difficult task. The primary purpose of this study is to build a model that predicts the resolution time of incoming issue reports based on past issue report data. Additional goals are to determine which existing approaches to resolution time prediction yield the highest accuracy, and which features of issue reports are essential for prediction. The approach chosen was to improve existing models by applying additional pre-processing to the reports. The project was designed to analyse, combine, compare and improve different resolution time prediction techniques, including k-means clustering, k-nearest neighbour classification, Naïve Bayes classification, decision trees, random forest and others, in order to achieve the best prediction accuracy. For this research, data was collected from a repository of the Estonian company Fortumo OÜ. The data provided by Fortumo contained the actual resolution times of 2125 issues from 25 Apr 2011 to 1 Jan 2015, along with initial time estimates made by Fortumo employees.
The data from the repository indicates that around 50% of the time estimates made by Fortumo employees fall within ±10% of the actual resolution time. In addition, 67% of the experts’ estimates have an absolute error of ≤ 0.5 hours. The existing, previously proposed approaches do not improve on this predictive quality; on the contrary, they produce worse results. Random Forest and Ordered Logistic Regression, the best of the proposed models, still produced a prediction quality 12-20% worse than the experts’ estimates. After the best-performing approaches were improved, meta-information-based models yielded up to 5% better accuracy than the proposed models. Text-based models, however, produced a higher prediction quality, up to approximately 20% better than the estimates made by experts.
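The two accuracy measures quoted above (the share of estimates within ±10% of the actual resolution time, and the share with an absolute error of at most 0.5 hours) could be computed along the following lines. This is only a minimal sketch: the function names and the sample values are hypothetical, and it assumes non-empty lists of times in hours with strictly positive actual values. It is not taken from the thesis itself.

```python
def within_relative_band(estimates, actuals, band=0.10):
    """Share of estimates that lie within +/- band of the actual time."""
    pairs = list(zip(estimates, actuals))
    hits = sum(1 for est, act in pairs if abs(est - act) <= band * act)
    return hits / len(pairs)


def within_absolute_error(estimates, actuals, max_error_hours=0.5):
    """Share of estimates whose absolute error is at most max_error_hours."""
    pairs = list(zip(estimates, actuals))
    hits = sum(1 for est, act in pairs if abs(est - act) <= max_error_hours)
    return hits / len(pairs)


# Made-up illustrative values in hours (NOT Fortumo data).
estimates = [1.0, 2.0, 4.0, 8.0]
actuals = [1.0, 2.4, 4.25, 16.0]

relative_share = within_relative_band(estimates, actuals)
absolute_share = within_absolute_error(estimates, actuals)
print(f"within ±10%: {relative_share:.0%}, |error| <= 0.5 h: {absolute_share:.0%}")
```

The same two functions can be applied to model predictions in place of expert estimates, which allows a like-for-like comparison between the two.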
Keywords:
Machine learning, data mining, prediction, k-means, k-nearest neighbours, random forest, ordered logistic regression, Naïve Bayes classifier, latent semantic analysis, issue report, resolution time
Graduation Thesis language
English
Graduation Thesis type
Master - Software Engineering
Supervisor(s)
Dietmar Pfahl
Defence year
2015