## Measuring Monotonicity in Multiclass Classification

Name

Joonas Kriisk

Abstract

Machine learning is a field of computer science whose main purpose is to create methods that can make predictions about data. Multiclass classification is a machine learning task in which an object is classified into one of at least three classes based on real-life observations. After a model is trained, its accuracy usually needs to be evaluated. This is typically done by splitting the data into a training set and a test set and testing the trained model against the test set. Models output scores that indicate the confidence of a prediction. In decision making, it is very helpful if these scores can be interpreted as class probabilities, and calibration is used to convert scores into probabilities. Prediction errors can have negative consequences in certain applications, so models need to be well calibrated. The most popular and widely used binary calibration method is isotonic regression, which fits a free-form line to the scores under one constraint: the line must be non-decreasing everywhere. In multiclass cases, the problem is usually reduced to binary tasks so that isotonic regression can be applied, which again assumes that the scores are monotonic. It is therefore logical to check whether this assumption also holds in multiclass problems, as the answer can help in developing multiclass calibration methods.
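To make the non-decreasing constraint concrete, the following is a minimal sketch of binary isotonic regression using the standard pool-adjacent-violators procedure. It is an illustration only, not the implementation used in the thesis; the function name `isotonic_fit` and the toy data are assumptions introduced here.

```python
def isotonic_fit(scores, labels):
    """Non-decreasing least-squares fit of 0/1 labels ordered by score.

    Illustrative pool-adjacent-violators sketch (hypothetical helper, not
    the thesis code). Tied scores are not pooled, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum of labels, number of instances]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Assign every instance the mean of the block it ended up in.
    fitted = [0.0] * len(scores)
    pos = 0
    for total, count in blocks:
        for i in order[pos:pos + count]:
            fitted[i] = total / count
        pos += count
    return fitted


# Toy example: model scores and true binary labels (made-up values).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 1, 0, 0, 1, 1]
calibrated = isotonic_fit(toy_scores, toy_labels)
```

In the toy example the label 1 at score 0.2 is followed by two 0 labels, so those three instances are pooled into a single calibrated value. The fit can only pool such violations, never reverse the order, which is why the method relies on the scores being monotonic in the first place.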

This Bachelor's thesis focuses on measuring monotonicity by training multiple machine learning models on different datasets. To measure monotonicity, we first had to devise ways of measuring it, and this thesis proposes two methods. The first method ranks all predictions based on the probabilities of a single class, resulting in a one-vs-rest comparison. The second method takes the top 50% of probabilities of two classes (to reduce the noise introduced by the third class) and ranks them, resulting in a one-vs-one comparison. The main result of the experiment is that 71.4% of dataset-model pairs seem to be monotonic, while for the non-monotonic pairs, monotonicity seems to be strongly affected by the low accuracy of all models on those datasets.
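One possible reading of the one-vs-rest ranking is sketched below. The abstract does not specify how the ranked predictions are turned into a monotonicity measure, so the equal-sized rank bins, the violation count, and the names `one_vs_rest_curve` and `count_violations` are all assumptions made for illustration, not the thesis's actual procedure.

```python
def one_vs_rest_curve(probs, labels, cls, n_bins=10):
    """Observed frequency of `cls` in bins of instances ranked by P(cls).

    Hypothetical helper: the binning scheme is an assumption, not taken
    from the thesis.
    """
    # Rank instances by the predicted probability of the chosen class.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i][cls])
    n = len(ranked)
    curve = []
    for k in range(n_bins):
        lo, hi = k * n // n_bins, (k + 1) * n // n_bins
        if hi > lo:  # skip empty bins when there are few instances
            members = ranked[lo:hi]
            curve.append(sum(labels[i] == cls for i in members) / len(members))
    return curve


def count_violations(curve):
    """Number of adjacent bins where the observed frequency decreases."""
    return sum(1 for a, b in zip(curve, curve[1:]) if b < a)


# Toy three-class predictions (made-up values); each row sums to 1.
toy_probs = [
    [0.1, 0.5, 0.4], [0.2, 0.3, 0.5], [0.3, 0.3, 0.4],
    [0.6, 0.2, 0.2], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
]
toy_labels = [1, 2, 0, 1, 0, 0]
curve = one_vs_rest_curve(toy_probs, toy_labels, cls=0, n_bins=3)
violations = count_violations(curve)
```

Under this reading, a dataset-model pair would look monotonic for a class when the curve has few or no decreasing steps. A one-vs-one variant would first restrict the instances to those selected for the two classes being compared and then rank them in the same way.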

An empirical study was conducted on 21 datasets, with 7 models trained on each. Two different measurement methods were used to calculate monotonicity; the results indicate that the two methods produce similar outcomes and that both seem to be affected by the accuracy of the model. Monotonicity in multiclass datasets has not been researched before, and this research provides insight into whether multiclass datasets are monotonic.

Models are now widely used in marketing, banking and medicine, and having calibrated models is a necessity. Knowing that multiclass datasets are monotonic creates the opportunity to devise a more efficient calibration method for multiclass problems.

Graduation Thesis language

English

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Mari-Liis Allikivi, Meelis Kull

Defence year

2018