Classifier Evaluation With Proper Scoring Rules

Diana Grygorian
Classification is a fundamental task in machine learning that involves predicting the class of a data instance based on a set of features. The performance of a classifier can be measured using a loss function, which assigns a loss value to each classification error.
A classification error occurs when the predicted and the actual class differ. In the simplest case, all combinations resulting in a classification error are considered equal in terms of cost. However, some problems require different types of misclassification to carry different costs, which forms a cost context. Depending on the properties of the cost context, different loss functions can be applied. For example, if the arithmetic mean of the costs of one false positive and one false negative is fixed and these costs are uniformly distributed, then the Brier score is the suitable loss function. If their harmonic mean is fixed instead, then log loss should be used. These two functions belong to a larger family of loss functions known as proper scoring rules. Scoring rules are loss functions that deal specifically with probabilistic classification, where the classifier is required to predict a probability for each class, indicating its confidence in the prediction.
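As an illustration of the two loss functions named above, the following minimal sketch computes the Brier score and log loss for a single binary prediction and checks numerically what "proper" means: the expected loss is smallest when the reported probability equals the true probability of the positive class. The function names, the grid search, and the per-instance form of the Brier score (squared error on the positive class only) are assumptions made for this example, not definitions taken from the thesis.

```python
import math


def brier_score(p, y):
    """Squared error between predicted positive-class probability p and label y (0 or 1)."""
    return (y - p) ** 2


def log_loss(p, y):
    """Negative log-likelihood of label y under predicted positive-class probability p."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def expected_loss(loss, p, q):
    """Expected loss of reporting p when the positive class truly occurs with probability q."""
    return q * loss(p, 1) + (1.0 - q) * loss(p, 0)


def best_report(loss, q, grid_size=999):
    """Grid search over p in (0, 1) for the report that minimises the expected loss.

    The grid is an illustrative choice: 0.001, 0.002, ..., 0.999 by default.
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda p: expected_loss(loss, p, q))


if __name__ == "__main__":
    q = 0.3  # hypothetical true probability of the positive class
    for loss in (brier_score, log_loss):
        print(loss.__name__, "is minimised at p =", best_report(loss, q))
```

For both functions the minimising report coincides with the true probability, so a classifier gains nothing in expectation by overstating or understating its confidence.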
In this thesis, a new cost context for binary classification is presented, in which each of the two costs has its own uniform distribution. A corresponding loss function for this cost context, named Inverse Score, is proposed and subsequently proven to be a proper scoring rule. The experiments confirm that the total cost under this cost context and the expected loss under the new loss function are the same.
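The Inverse Score is not defined in this abstract, so the sketch below does not attempt to reproduce it. Instead, it illustrates the same kind of check on the known Brier case: the cost of a false positive is taken as 2c and the cost of a false negative as 2(1 - c), so their arithmetic mean is fixed at 1, with c uniformly distributed on [0, 1]. The decision rule (predict positive when the predicted probability p exceeds c), the midpoint-rule integration, and all names are assumptions made for this example.

```python
def total_cost(p, y, steps=100_000):
    """Average misclassification cost over a uniform cost proportion c in [0, 1].

    Assumed cost context: false positive costs 2c, false negative costs 2(1 - c).
    Assumed decision rule: predict positive when p > c.
    The integral over c is approximated with the midpoint rule.
    """
    total = 0.0
    for i in range(steps):
        c = (i + 0.5) / steps
        predicted_positive = p > c
        if y == 1 and not predicted_positive:
            total += 2.0 * (1.0 - c)  # false negative
        elif y == 0 and predicted_positive:
            total += 2.0 * c  # false positive
    return total / steps


def brier_score(p, y):
    """Squared error between predicted positive-class probability p and label y (0 or 1)."""
    return (y - p) ** 2


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.8):
        for y in (0, 1):
            print(p, y, round(total_cost(p, y), 4), round(brier_score(p, y), 4))
```

Under these assumptions the two printed columns agree up to integration error, which follows from integrating 2(1 - c) over c from p to 1 for an actual positive, giving (1 - p)^2, and 2c over c from 0 to p for an actual negative, giving p^2.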
Graduation Thesis language
Graduation Thesis type
Master - Computer Science
Meelis Kull, PhD
Defence year