Few-Shot Prompt-Tuning of Language Models for App Review Classification: An Evaluation Study

Name
Hashika Dhananjanie Agalakepu Watthegedara Marasinghe
Abstract
App reviews are a valuable source of feedback for application developers, offering insights into users' needs and preferences. However, the large volume of reviews received each day makes manual analysis infeasible, calling for automated solutions that detect developer-relevant information in user reviews to improve software quality. Recent approaches to this task fine-tune pretrained language models (PLMs) for review classification using labelled data. Given the high cost of labelling data and the continuous emergence of new apps and categories in app marketplaces, it is important to evaluate newer techniques such as the pre-train and prompt-tune paradigm, which has shown success in scenarios with limited data. This strategy lets models adapt to different tasks by leveraging domain knowledge introduced through prompts. The main objective of this study is to assess the effectiveness of few-shot prompt-tuning of language models (LMs) for detecting developer-relevant information in app reviews. To this end, the first research question compares the performance of prompt-tuning with traditional fine-tuning of the language model RoBERTa under data-constrained conditions on three labelled review datasets. The second research question examines how the choice of LM (T5 and GPT-2) and its architecture affect prompt-tuning performance for classifying review information. The third and final research question assesses the impact of prompt template design and verbalizer design on prompt-tuning performance for the same task. The findings reveal that prompt-tuning has the potential to outperform traditional fine-tuning in scenarios with limited labelled data. In addition, model performance varied across the review datasets, underscoring the importance of model selection, verbalizer design, and prompt template design. These insights provide practical guidance for applying prompt-tuning techniques in the app review domain, particularly where labelled data is scarce.
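
To make the pre-train and prompt-tune idea concrete, the sketch below shows how a manual prompt template and verbalizer recast review classification as a cloze-filling task for a masked LM, using the Hugging Face transformers library. It only illustrates the inference side of the mechanism; in few-shot prompt-tuning the same template and verbalizer would additionally be tuned on a handful of labelled reviews. The checkpoint, template wording, label words, and class names below are illustrative assumptions, not the setup evaluated in the thesis.

# A minimal, illustrative sketch (not the thesis implementation) of a manual
# prompt template and verbalizer for app review classification with a masked LM.
# The checkpoint, template wording, label words, and class names are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "roberta-base"  # RoBERTa is one of the PLMs compared in the study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# Hypothetical prompt template: wrap the review text around a mask token.
TEMPLATE = "Review: {review} This review is mainly about the {mask}."

# Hypothetical verbalizer: map single label words in the vocabulary to review classes.
VERBALIZER = {"bug": "bug report", "feature": "feature request", "experience": "user experience"}

def classify(review: str) -> str:
    text = TEMPLATE.format(review=review, mask=tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    # Locate the mask position and read the LM's logits over the vocabulary there.
    mask_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_index[0]]
    # Score each label word at the mask position; the best-scoring word selects the class.
    scores = {}
    for word, label in VERBALIZER.items():
        word_id = tokenizer.encode(" " + word, add_special_tokens=False)[0]
        scores[label] = logits[word_id].item()
    return max(scores, key=scores.get)

print(classify("The app crashes every time I try to upload a photo."))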
Graduation Thesis language
English
Graduation Thesis type
Master - Software Engineering
Supervisor(s)
Faiz Ali Shah
Defence year
2024