Institute of Computer Science - Graduation Theses Registry


Estonian Synthetic Error Generation by Prompting for Grammatical Error Correction

Name

Martin Vainikko

Abstract

For Estonian grammatical error correction (GEC), sufficient data to train end-to-end models is lacking. However, the recent advancements in large language models (LLMs) offer new opportunities. We utilize OpenAI’s GPT models (GPT-3.5-Turbo, GPT-4-Turbo, and GPT-4) to generate synthetic errors and analyze these errors across different model versions, prompting strategies, and data domains. By fine-tuning models on these synthetic datasets and conducting human evaluations, we assess the effectiveness of various prompting strategies for synthetic error generation. Our findings indicate that within the GEC domain, the errors generated by GPT models are comparable to those made by humans. Human evaluations also revealed that GPT models produce problematic edits. This highlights significant potential for further research in this area.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Agnes Luhtaru, Mark Fišel

Defence year

2024

PDF
