Classification of E-Commerce Products Based on Textual Product Descriptions

Name
Karmen Kink
Abstract
Assigning Harmonized System (HS) codes to products is necessary to comply with customs regulations, gather statistics, and prevent fraud. Since the HS is a complex system with many classes, automatic HS code classification is required to speed up the process and ensure correctness. In this thesis, we explore two types of machine learning methods for HS classification: shallow neural network classifiers and deep neural network classifiers based on the Transformer architecture. We find that, with a large dataset, shallow classifiers can be improved relatively easily to outperform Transformer-based classifiers, while tuning the latter is a more complex and time-consuming task. We also discover that the training data includes erroneously labeled entries and that this has a negative impact on the models.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Kairit Sirts, Karl-Oskar Masing
Defence year
2021
 