## Dataset Shift and the Adjustment of Probabilistic Classifiers

Name

Theodore Heiser

Abstract

Classification is the machine learning problem of assigning a class to a given instance of data defined by a set of features. Probabilistic classification is the stricter problem of assigning a probability to each possible class given an instance, indicating the classifier's confidence that the class is correct for that instance.

The underlying assumption of classical machine learning is that every instance used to train or test the classifier is drawn independently from the same joint probability distribution of features and labels. This assumption rarely holds in real-world applications, because the distribution of data frequently changes over time. The change in the distribution of data between the time the classifier is trained and a later point in the classifier's life cycle (testing, deployment, etc.) is known as dataset shift.

In this thesis, a novel procedure is presented which improves the performance of a probabilistic classifier experiencing any pattern of shift that causes the class distribution to change, a property most patterns of shift share.

This new technique is based on adjustment, the process of matching the probabilistic classifier's expected output to the class distribution of the data. Previous work has shown that adjustment can be used to reduce expected loss for mean squared error and KL divergence. These two loss functions belong to a wider family of loss functions called proper scoring rules.
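To make the idea of adjustment concrete, the following is a minimal sketch for the mean squared error case. It assumes the simplest additive form, in which every prediction is shifted by the same vector so that the average prediction equals the class distribution. The function names `additive_adjustment` and `brier_score` and the toy data are illustrative assumptions, not taken from the thesis, and the thesis's own procedure may differ in detail.

```python
import numpy as np


def additive_adjustment(probs, target_dist):
    """Shift all predictions by one vector so their mean equals target_dist.

    Assumption: this is the plain additive form of adjustment, used here
    only for illustration.
    """
    probs = np.asarray(probs, dtype=float)
    target_dist = np.asarray(target_dist, dtype=float)
    shift = target_dist - probs.mean(axis=0)
    return probs + shift


def brier_score(probs, labels):
    """Mean squared error between predictions and one-hot encoded labels."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))


# Toy data (hypothetical): the classifier favours class 0,
# but after the shift most instances belong to class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]])
labels = np.array([0, 1, 1, 1])

# Class distribution of the shifted data, assumed to be known exactly here.
target = np.bincount(labels, minlength=2) / len(labels)

adjusted = additive_adjustment(probs, target)
loss_before = brier_score(probs, labels)
loss_after = brier_score(adjusted, labels)
```

On this toy data the adjusted predictions average to the class distribution and the mean squared error is lower than before adjustment. Note that an additive shift is not guaranteed to keep every value inside [0, 1]; it happens to do so here.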

The proposed novel procedure is termed general adjustment, since it reduces expected loss for all proper scoring rules. It comes in two varieties, unbounded and bounded. Unbounded general adjustment gives results equivalent to the previously described adjustment procedures for mean squared error and KL divergence. Bounded general adjustment is a further refinement, reducing expected loss at least as much as the unbounded form. Both are convex minimization tasks and are therefore computationally efficient.

The results of a series of experiments show that bounded general adjustment reduces loss in a practical setting, where the exact value of the new class distribution may not be known. Even with moderate error in the estimation of the new class distribution, bounded general adjustment still reduces loss in most cases.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Meelis Kull

Defence year

2018