Institute of Computer Science - Graduation Theses Registry

Developing a Scikit-Learn Module for a Novel Data Partition for Machine Learning

Name

Rain Vagel

Abstract

Machine learning is the field of using data and statistical models to make predictions. With the help of data partitioning schemes, researchers can efficiently test their models and report accuracies or error values even with limited data. Depending on the partitioning scheme, other useful results, such as the hyper-parameters of the model, can also be returned. A new data partitioning scheme, cross-validation & cross-testing, has been proposed. However, it is not yet widely used because no open-source machine learning library currently provides a function for it. In this thesis we publish a scikit-learn compatible function on GitHub and apply it to different tasks. The function can be used by anybody under an open-source license. Our tests showed that the new partitioning scheme might perform slightly worse on regression tasks than was previously thought. Cross-validation & cross-testing must therefore be studied further, both to understand it better and to facilitate its use.

Graduation Thesis language

English

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Raul Vicente Zafra, Kristjan Korjus

Defence year

2017
