Institute of Computer Science - Graduation Theses Registry

Fine-tuning GPT-3.5 for Medical Named Entity Recognition

Name

Anna Maria Tammin

Abstract

The aim of this thesis was to explore how well GPT-3.5 Turbo can label named entities. Patient health data contains a lot of useful information in free-text form. To use this data in statistical analyses, structured information has to be extracted from it, for example by annotating named entities. Machine-learning-based approaches require a lot of annotated data for this; however, a large language model such as GPT-3.5 Turbo has been shown to adapt to different tasks from only a few examples. This general understanding can be leveraged to label named entities. In this thesis, models were fine-tuned with different amounts of data to see how this would benefit labelling. The results showed that fine-tuning does enhance the model's proficiency in recognising entities in health data. Additionally, it was found that the models fine-tuned on English electronic health records outperform their base counterpart at annotating synthetic Estonian electronic health records.

Graduation Thesis language

Estonian

Graduation Thesis type

Bachelor - Computer Science

Supervisor(s)

Hendrik Šuvalov

Defence year

2024