Cross-Lingual Misinformation Detection: Aligning English and Estonian Fake Health News

Name
Li Merila
Abstract
Health misinformation poses a significant threat as it undermines trust in scientific expertise and reduces compliance with public health measures, ultimately decreasing community resilience against preventable diseases. This thesis focuses on identifying Estonian fake health news by leveraging a pre-labelled dataset in English. The primary objective is to develop a reliable system for generating ground truth labels for fake health news in Estonian, contributing to the field of fake news detection in low-resource settings. The proposed approach, namely Cross-Lingual Alignment and Confident Prediction Sampling (CAPS), employs a hybrid two-phase methodology involving semantic similarity measurements, manual annotation, classification, and confidence sampling to create a novel fake health news dataset in Estonian.
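The abstract names the ingredients of CAPS but not their details, so the following is only a minimal sketch of how similarity-based alignment and confident prediction sampling could fit together. Everything here is an assumption for illustration: the function names, the toy vectors, the thresholds (0.8 and 0.9), and the choice of cosine similarity are hypothetical and are not taken from the thesis. Manual annotation and classifier training are left out; the sketch assumes embeddings and class probabilities are already available.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_candidates(et_vectors, en_vectors, en_labels, min_similarity=0.8):
    """Pair each Estonian item with its most similar labelled English item.

    Returns (estonian_index, borrowed_label, similarity) for pairs at or above
    the hypothetical threshold; these would be candidates for manual checking.
    """
    candidates = []
    for i, et in enumerate(et_vectors):
        scores = [cosine(et, en) for en in en_vectors]
        best_j = max(range(len(scores)), key=lambda j: scores[j])
        if scores[best_j] >= min_similarity:
            candidates.append((i, en_labels[best_j], scores[best_j]))
    return candidates


def confident_samples(probabilities, min_confidence=0.9):
    """Keep only predictions whose top class probability reaches the threshold.

    `probabilities` holds one list of class probabilities per unlabelled item;
    returns (item_index, predicted_class_index) pairs.
    """
    kept = []
    for i, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=lambda k: probs[k])
        if probs[label] >= min_confidence:
            kept.append((i, label))
    return kept
```

In this sketch, an Estonian item that is not close enough to any English item is simply dropped from the candidate list, and a prediction below the confidence threshold is left unlabelled rather than added to the dataset. Both thresholds trade dataset size against label reliability.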
Graduation Thesis language
English
Graduation Thesis type
Master - Data Science
Supervisor(s)
Uku Kangur, Roshni Chakraborty
Defence year
2024
 