Generating Lexical Relations with Large Pre-trained Language Models

Name
Eduard Rudi
Abstract
With the rise of OpenAI’s ChatGPT, large language models have become extremely popular. ChatGPT is powered by a generative pre-trained transformer (GPT), the latest version of which is GPT-4. This thesis tests the capabilities of GPT-4, currently considered state-of-the-art. The approach is to generate a wordnet: a lexical database of words and the relationships between them. This is an effective way to test GPT-4, as it allows evaluation in multiple languages, including resource-rich languages like English and resource-poor languages like Estonian. Previous attempts at generating wordnets relied heavily on machine translation, which is typically ineffective for resource-poor languages. Unfortunately, GPT-4 did not perform as well as expected, struggling to generate all word senses and over-generating relationships in both languages. Ultimately, generative LLMs work best when context already exists, such as in summarization or unit-test generation.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Mark Fišel, Heili Orav
Defence year
2024