Predicting Information Diffusion on Social Media

Kateryna Lytvyniuk
Social media has become part of everyday life in modern society. A large amount of information is created and shared with the world continuously. Predicting information diffusion has been studied by many researchers because it has applications in various domains, such as viral marketing and news propagation.
Some information spreads faster than other information, depending on what interests people. In this thesis, we used supervised machine learning algorithms to study information diffusion in a social network and to predict content popularity. Three datasets were collected from Twitter and analysed in order to build and test various models based on different machine learning algorithms.
We defined tweet popularity as the number of retweets an original message received and stated our research problems as binary and multiclass prediction tasks. We investigated how the initial retweeting behaviour of a message affects the predictive power of a model. We also analysed whether the retweeting behaviour of the most recent hour can help to predict a tweet's popularity in the following hour. In addition, the main focus was on finding the features that are important for the prediction.
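The problem formulation above can be illustrated with a minimal Python sketch. The abstract does not state the class boundaries or the exact features, so the thresholds (`BINARY_THRESHOLD`, `CLASS_EDGES`), the function names, and the one-hour window default are all hypothetical and serve only to show how a retweet count could be turned into a binary or multiclass target, and how an early-retweet feature could be counted.

```python
from bisect import bisect_right

# Hypothetical cut-offs: the abstract does not give the actual values.
BINARY_THRESHOLD = 100          # popular if retweets >= 100 (assumption)
CLASS_EDGES = [10, 100, 1000]   # 4 classes: <10, 10-99, 100-999, >=1000 (assumption)


def binary_label(retweet_count, threshold=BINARY_THRESHOLD):
    """Return 1 if the tweet counts as popular, else 0."""
    return 1 if retweet_count >= threshold else 0


def multiclass_label(retweet_count, edges=CLASS_EDGES):
    """Return the index of the popularity class the count falls into."""
    return bisect_right(edges, retweet_count)


def early_retweet_count(post_time, retweet_times, window_seconds=3600):
    """Count retweets that arrive within the window after posting.

    Times are plain numbers of seconds (for example Unix timestamps).
    """
    return sum(1 for t in retweet_times if 0 <= t - post_time <= window_seconds)
```

For example, under these assumed thresholds a tweet with 250 retweets would get binary label 1 and multiclass label 2.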
For binary prediction, the models achieved an AUC of up to 95% and an F1 score of up to 87%. For multiclass prediction, the models achieved an overall accuracy of up to 60% and an F1 score of up to 67%, with more accurate predictions for the classes of messages with very low and high retweet counts than for the other classes. We built our models using one dataset and tested our approach on the other two datasets, which showed that the models are robust enough to deal with multiple topics.
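As a reminder of what the reported metrics measure, the following is a minimal pure-Python sketch of F1 and AUC. It is not the evaluation code of the thesis: the function names are illustrative, and the macro averaging used for the multiclass F1 is an assumption, since the abstract does not say which averaging was applied.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (averaging scheme is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. Labels must be 0 or 1.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for large datasets.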
Graduation Thesis type: Master - Software Engineering
Rajesh Sharma, Anna Jurek