Dataset Shift and the Adjustment of Probabilistic Classifiers

Theodore Heiser
Classification is the machine learning problem of assigning a class to a given instance of data defined by a set of features. Probabilistic classification is the stricter problem of assigning a probability to each possible class given an instance, indicating the classifier's confidence that the class is correct for that instance.
The underlying assumption of classical machine learning is that every instance used to train or test the classifier is drawn independently from the same joint probability distribution of features and labels. This assumption rarely holds in real-world applications, because the distribution of data frequently changes over time. The change in the distribution of data between the time of training the classifier and a later point in the classifier's life cycle (testing, deployment, etc.) is known as dataset shift.
This thesis presents a novel procedure that improves the performance of a probabilistic classifier under any pattern of shift that causes the class distribution to change, a property that most patterns of shift share.
This new technique is based on adjustment, the process of matching the probabilistic classifier's expected output to the class distribution of the data. Previous work has shown that adjustment can be used to reduce expected loss for mean squared error and KL divergence. These two loss functions belong to a wider family of loss functions called proper scoring rules.
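As an illustration of what adjustment means for mean squared error, the following is a minimal sketch and not the procedure developed in the thesis. It assumes the simplest additive form, in which every prediction is shifted by the same per-class offset so that the mean predicted probability of each class equals the given class distribution. The function names `additive_adjust` and `brier`, the toy predictions, and the assumption that the new class distribution is known exactly are all hypothetical choices made for this example.

```python
def additive_adjust(probs, new_class_dist):
    """Shift all predictions by one per-class offset so that the mean
    prediction for each class equals new_class_dist (assumed known)."""
    n = len(probs)
    k = len(new_class_dist)
    means = [sum(row[j] for row in probs) / n for j in range(k)]
    offsets = [new_class_dist[j] - means[j] for j in range(k)]
    return [[row[j] + offsets[j] for j in range(k)] for row in probs]


def brier(probs, labels):
    """Mean squared error between predictions and one-hot labels."""
    total = 0.0
    for row, y in zip(probs, labels):
        total += sum((p - (1.0 if j == y else 0.0)) ** 2
                     for j, p in enumerate(row))
    return total / len(probs)


# Toy data (hypothetical): the classifier over-predicts class 0,
# while the actual class distribution is 25% class 0, 75% class 1.
probs = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.4, 0.6]]
labels = [0, 1, 1, 1]
new_dist = [0.25, 0.75]

adjusted = additive_adjust(probs, new_dist)
print("loss before:", brier(probs, labels))
print("loss after: ", brier(adjusted, labels))
```

On this toy data the mean squared error decreases after the shift. Note that a plain additive shift places no constraint on individual predictions, so in general it can produce values outside the interval [0, 1].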
The proposed procedure is termed general adjustment, since it reduces expected loss for all proper scoring rules. It comes in two varieties, unbounded and bounded. Unbounded general adjustment gives results equivalent to the previously described adjustment procedures for mean squared error and KL divergence. Bounded general adjustment is a further refinement that reduces expected loss at least as much as the unbounded form. Both are convex minimization tasks and can therefore be solved efficiently.
The results of a series of experiments show that bounded general adjustment
reduces loss in a practical setting, where the exact value of the new class distribution may not be known. Even with moderate error in the estimation of the new class distribution, bounded general adjustment still reduces loss in most cases.
Graduation Thesis type
Master - Computer Science
Meelis Kull