Towards a Semantically Rich Meme Dataset

Name
Tarmo Pungas
Abstract
This thesis presents the idea of a system capable of recommending internet memes based on textual context. The goal of the thesis is to help realize this system by preparing a semantically rich dataset. Methods for exploring, cleaning, and analyzing data are presented. Different metrics for assessing inter-annotator agreement are discussed. A pipeline is created to enable the preprocessing and enrichment of an existing meme dataset. As part of the preprocessing, the data are cleaned, filtered, and integrated with other datasets. For semantic enrichment, crowdsourcing is used to collect information for 50 memes. The resulting semantic annotations are analyzed and evaluated, revealing that the collected data are of mixed quality and that further data collection is necessary. Finally, future improvements to the pipeline are suggested.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Riccardo Tommasini, Radwa El Shawi
Defence year
2022
 