Instance-based Label Smoothing for Better Classifier Calibration
Name
Mohamed Maher Abdelrahman
Abstract
Binary classification is one of the fundamental tasks in machine learning, which involves assigning one of two classes to an instance defined by a set of features. Although accurate predictions are essential in most of the tasks, knowing the model confidence is indispensable in many of them. Many probabilistic classifiers' predictions are not well-calibrated and tend to be overconfident, requiring further calibration as a post-processing step to the model training.
Logistic calibration is one of the most popular calibration methods, that fits a logistic regression model to map the outputs of a classification model into calibrated class probabilities. Various regularization methods could be applied to logistic regression fitting to reduce its overfitting on the training set. Platt scaling is one of these methods, which applies label smoothing to the class labels and transforms them into target probabilities before fitting the model to reduce its overconfidence. Also, label smoothing is widely used in classification neural networks. In previous works, it was shown that label smoothing has a positive calibration and generalization effect on the network predictions. However, it erases information about the similarity structure of the classes by treating all incorrect classes as equally probable, which impairs the distillation performance of the network model.
In this thesis, we aim to find better ways of reducing overconfidence in logistic regression. Here we derive the formula of a Bayesian approach for the optimal predicted probabilities in case of knowing the generative model distribution of the dataset. Later, this formula is approximated by a sampling approach to be applied practically. Additionally, we propose a new instance-based label smoothing method for logistic regression fitting. This method motivated us to present a novel label smoothing approach that enhanced the distillation and calibration performance of neural networks compared with standard label smoothing.
The evaluation experiments confirmed that the approximated formula for the derived optimal predictions is significantly outperforming all other regularization methods on synthetic datasets of known generative model distribution. However, in more realistic scenarios when this distribution is unknown, our proposed instance-based label smoothing had a better performance than Platt scaling in most of the synthetic and real-world datasets in terms of log loss and calibration error. Besides, neural networks trained with instance-based label smoothing, outperformed the standard label smoothing regarding log loss, calibration error, and network distillation.
Logistic calibration is one of the most popular calibration methods, that fits a logistic regression model to map the outputs of a classification model into calibrated class probabilities. Various regularization methods could be applied to logistic regression fitting to reduce its overfitting on the training set. Platt scaling is one of these methods, which applies label smoothing to the class labels and transforms them into target probabilities before fitting the model to reduce its overconfidence. Also, label smoothing is widely used in classification neural networks. In previous works, it was shown that label smoothing has a positive calibration and generalization effect on the network predictions. However, it erases information about the similarity structure of the classes by treating all incorrect classes as equally probable, which impairs the distillation performance of the network model.
In this thesis, we aim to find better ways of reducing overconfidence in logistic regression. Here we derive the formula of a Bayesian approach for the optimal predicted probabilities in case of knowing the generative model distribution of the dataset. Later, this formula is approximated by a sampling approach to be applied practically. Additionally, we propose a new instance-based label smoothing method for logistic regression fitting. This method motivated us to present a novel label smoothing approach that enhanced the distillation and calibration performance of neural networks compared with standard label smoothing.
The evaluation experiments confirmed that the approximated formula for the derived optimal predictions is significantly outperforming all other regularization methods on synthetic datasets of known generative model distribution. However, in more realistic scenarios when this distribution is unknown, our proposed instance-based label smoothing had a better performance than Platt scaling in most of the synthetic and real-world datasets in terms of log loss and calibration error. Besides, neural networks trained with instance-based label smoothing, outperformed the standard label smoothing regarding log loss, calibration error, and network distillation.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Meelis Kull
Defence year
2020