Building a Classification Model for Harmonized System Code Prediction from Product Images

Name
Simo Jaanus
Abstract
The world's main systems for automatically classifying Harmonized System (HS) codes for e-commerce products are currently based on textual product descriptions. However, product descriptions are frequently erroneous and often do not contain all of the information needed to accurately predict the HS code. This has prompted us to look into another type of data: images. The main goal of this Master's thesis is to find the most effective method for predicting HS codes from images. Classifiers based on various state-of-the-art neural network architectures, such as transformers and the next generation of convolutional neural networks, are examined. In this work, we conduct multiple experiments which show that convolutional neural networks are superior to transformers in terms of both accuracy and inference time. We also establish a human baseline to aid the interpretation of the findings. During data and error analysis, we discovered that the training data contained a substantial number of incorrect labels, which might have influenced the results.
Graduation Thesis language
English
Graduation Thesis type
Master - Software Engineering
Supervisor(s)
Dmytro Fishman, Dmitri Smirnov
Defence year
2022
 