Generating Lexical Relations with Large Pre-trained Language Models

Name
Eduard Rudi
Abstract
With the rise of OpenAI’s ChatGPT, large language models have become extremely popular. ChatGPT is powered by a generative pre-trained transformer (GPT), the latest version of which is GPT-4. This thesis tests the capabilities of GPT-4, currently considered state-of-the-art. The approach is to generate a wordnet: a lexical database of words and the relationships between them. This is an effective way to test GPT-4, as it allows evaluation in multiple languages, including resource-rich languages like English and resource-poor languages like Estonian. Previous attempts at generating wordnets relied heavily on machine translation, which is typically ineffective for resource-poor languages. Unfortunately, GPT-4 did not perform as well as expected, struggling to generate all word senses and over-generating relationships in both languages. Ultimately, generative LLMs work best when context already exists, such as in summarization or unit-test generation.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Mark Fišel, Heili Orav
Defence year
2024