Overconfidence and Temperature Scaling among Probabilistic Classifiers

Name
Joonas Järve
Abstract
In this master's thesis, we work with probabilistic classifiers such as linear discriminant analysis, logistic regression, and artificial neural networks. Thanks to their high capacity, neural networks in particular have become very popular in recent years. Despite their popularity and strong performance, it has become clear that large neural networks are not well calibrated and tend to be overconfident (Guo et al., 2017). Overconfidence means that even when the model is not sure of the correct output, it still assigns a high probability to its decision. The thesis therefore concentrates on the overconfidence of probabilistic classifiers and on its remedy, the temperature scaling calibration technique. The latter has been shown to be very useful in practice, but why it works so well has not yet been clearly explained.
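For context, temperature scaling is a one-parameter transformation of the classifier's logits (Guo et al., 2017). The sketch below is only a minimal illustration of that transformation, not code from the thesis; the logit values are made up for the example.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    # Divide the logits by a scalar temperature T > 0 before the softmax.
    # T > 1 softens the probabilities (less confident), T < 1 sharpens them,
    # and the predicted class (argmax) never changes.
    return softmax(logits / T)

# Hypothetical logits of an overconfident prediction, softened by T = 2.
logits = np.array([4.0, 1.0, 0.5])
print(temperature_scale(logits, T=1.0))  # approx. [0.93, 0.05, 0.03]
print(temperature_scale(logits, T=2.0))  # approx. [0.72, 0.16, 0.12]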

In this master's thesis, we first present an overview of the temperature scaling calibration method and of model overconfidence and underconfidence. Second, we show that overconfidence is not only an issue for deep neural networks but also appears in simple models such as linear discriminant analysis and logistic regression. Furthermore, we demonstrate that overconfidence can be recognised from the difference between the model's train and validation score distributions. Finally, we prove under which assumptions the temperature scaling method calibrates the model and explain why it has proven so useful in practice.
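As an illustration of the standard calibration procedure (not necessarily the exact setup used in the thesis), the temperature is usually fitted as the single scalar that minimises the negative log-likelihood on a held-out validation set. The sketch below assumes NumPy/SciPy and hypothetical val_logits (n x k array) and val_labels (length-n integer array) from a validation set.

import numpy as np
from scipy.optimize import minimize_scalar

def val_nll(T, logits, labels):
    # Negative log-likelihood of the validation labels when the logits
    # are scaled by temperature T before the softmax.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    # Fit a single scalar T > 0 on held-out validation data; only the
    # confidences change, the predicted classes stay the same.
    res = minimize_scalar(val_nll, bounds=(0.05, 20.0), method="bounded",
                          args=(val_logits, val_labels))
    return res.x

# Usage (hypothetical arrays): T = fit_temperature(val_logits, val_labels)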
Graduation Thesis language
Estonian
Graduation Thesis type
Master - Data Science
Supervisor(s)
Meelis Kull
Defence year
2022
 