Exploring the Human-like Ability of LLMs in Recognizing Self-generated Text

Name
Katariina Ingerma
Abstract
A Large Language Model (LLM) is a type of generative artificial intelligence model that can produce human-like text. The popularity of LLMs is increasing rapidly due to their ability to understand and generate text that closely resembles human language. This generative ability continues to expand their acceptance in professional tasks such as advertising slogan creation, news composition, and story generation. At the same time, the proliferation of LLMs across diverse areas facilitates their malicious use, which poses a serious threat to information ecosystems and public trust. Therefore, there is an imperative need to develop effective methods for distinguishing between LLM-generated and human-written textual content. In this thesis, we study the linguistic differences between human-written and LLM-generated texts, the machine-generated text detection performance of the LLM that generated the content, and the effect of text length on detection performance. The results reveal that LLMs with fewer parameters generate texts with a higher Type-Token Ratio than human-authored texts, while more advanced LLMs exhibit closer similarity to human writing. The results also indicate that the larger and more advanced the LLM, the less likely it is to detect its own generated texts, owing to their close resemblance to human-authored content. This research contributes to addressing problems arising from LLM-generated texts, such as misinformation, and to the development of new approaches for identifying LLM-generated content.
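The Type-Token Ratio referred to in the abstract is the number of distinct words (types) divided by the total number of words (tokens) in a text. A minimal sketch of how it can be computed is shown below; the regex-based, lowercased tokenization is a simplifying assumption, and the thesis may use a different tokenizer.

```python
import re

def type_token_ratio(text: str) -> float:
    """Compute the Type-Token Ratio: unique words / total words.

    Tokenization by lowercased word characters is a simplifying
    assumption for illustration only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# A more repetitive text yields a lower TTR:
# "the cat sat on the mat" has 6 tokens but only 5 types.
print(type_token_ratio("the cat sat on the mat"))
```

A higher TTR indicates greater lexical diversity, which is one of the linguistic cues the thesis examines when comparing human-written and LLM-generated texts.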
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Somnath Banerjee
Defence year
2024