Institute of Computer Science - Graduation Theses Registry

Evaluating Machine Learning Models on Data with Few Labels

Name

Mart-Mihkel Aun

Abstract

Machine learning models for classification tasks are evaluated using quality measures such as accuracy, precision, and recall. These measures, or their estimates, are computed from the true class labels of the data points and the classes the method assigns to those data points. Finding the true class labels requires manual review. Quality measures are often estimated on a finite sample, so the resulting estimates contain errors.
This thesis derives the sample size needed to keep the estimation error within a given limit at a given confidence level. Computing the accuracy, precision, or recall of a sample from the definition requires the labels of all data points in the sample. If another method exists in addition to the method being evaluated, it can be used in evaluating the new one.
In that case, the manual labeling effort can be reduced by examining how much better the new method is than the old one, instead of calculating the quality measures of the new method.
This thesis explored techniques that help to reduce the number of data points that require labeling for the evaluation of the quality measures of the two classification methods.
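
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the abstract does not say which bound or which label-saving technique was used, so the sketch assumes Hoeffding's inequality for the sample size and, for the comparison, uses the fact that two methods' accuracies can differ only on data points where their predictions differ. The function names and the `label_of` callback (standing in for manual review) are hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Sample size n such that an accuracy estimate is within epsilon of the
    true accuracy with probability at least 1 - delta.

    Assumption: Hoeffding's inequality, 2 * exp(-2 * n * epsilon**2) <= delta.
    The thesis may derive a different (tighter) bound.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def accuracy_difference(old_pred, new_pred, label_of):
    """Return (accuracy(new) - accuracy(old), number of labels requested).

    Where both methods predict the same class they are both right or both
    wrong, so the point contributes 0 to the difference and needs no label.
    label_of(i) is a hypothetical callback that returns the true label of
    data point i, e.g. by asking a human reviewer.
    """
    if len(old_pred) != len(new_pred):
        raise ValueError("prediction lists must have the same length")
    n = len(old_pred)
    diff = 0
    labeled = 0
    for i, (old, new) in enumerate(zip(old_pred, new_pred)):
        if old == new:
            continue  # agreement: no label needed
        truth = label_of(i)
        labeled += 1
        diff += int(new == truth) - int(old == truth)
    return diff / n, labeled


if __name__ == "__main__":
    # Error at most 5 percentage points with 95% confidence.
    print(hoeffding_sample_size(0.05, 0.05))

    truth = [0, 1, 0, 1, 0]
    old = [0, 1, 1, 0, 1]
    new = [0, 1, 0, 0, 0]
    print(accuracy_difference(old, new, lambda i: truth[i]))
```

In the toy comparison only the two data points on which the methods disagree are labeled, although the sample has five points. The difference is exact for the sample at hand; the sample size bound still applies when the sample itself is drawn from a larger data set.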

Graduation Thesis language

Estonian

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Sven Laur

Defence year

2023

