Detecting Semantically Equivalent Issue Reports Using Transformer Models

Name
Behrad Moeini
Abstract
Developers support their software development by creating issue reports that describe bugs, feature requests, or change requests. As a project grows, the number of issue reports grows with it, and some issues are reported multiple times by different users. To address this problem, several automated approaches have been proposed for retrieving duplicate issue reports. These approaches have mainly been based on information-retrieval techniques.
This thesis aims to explore recent advances in detecting semantically equivalent text and to apply them to identifying duplicate issue reports. Since several articles have been published on this topic, the main challenge of this thesis is to replicate the existing approaches and compare their performance with that of the proposed solution. Part of the work is to extract and curate data from sources such as issue trackers. The thesis treats the task as a natural language processing problem and applies advanced techniques to classify whether question pairs are duplicates or not. We use an open-source dataset from GitHub that many earlier projects have used, which makes it easy to compare our results with theirs.
We applied different models to detect whether two questions are semantically the same, beginning with simple models and moving to more complex ones step by step. After applying each model to the dataset, we collected its results and compared the models' performance.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Dr. Ezequiel Scott
Defence year
2021
 