Institute of Computer Science - Graduation Theses Registry

Fact Extraction from Medical Text using Neural Networks

Name

Nesma Talaat Abbas Mahmoud

Abstract

Fact extraction from free text is a challenging task that requires a great deal of human effort to program regular expressions and build rule-based solutions. It is essential in the medical field, where many care details are stored only as free text and automated fact extraction is the only way to interpret large-scale medical databases. Such medical texts represent communication between doctors: the text is often not syntactically valid, concepts are not represented consistently, and the text is rife with misspellings. These problems make it challenging to develop rule-based solutions that handle all the potential ways a fact might be written down. In this thesis, we explored the effectiveness of neural networks for fact extraction from discharge reports in the Estonian Health Information System. We used the whole dataset of medical texts to train word embedding models. On the subsets of the data annotated with particular facts, we tested different classification models to detect those facts. We found that employing pre-trained word embeddings allowed us to efficiently learn new models for fact extraction using relatively small amounts of annotated data. We achieved an F1 score of 0.86 for a new tag using 732 samples for training, 82 samples for validation, and 3258 samples for testing.
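
For readers unfamiliar with the reported metric: the F1 score is the harmonic mean of precision and recall for the class of interest. The following is a minimal sketch of that calculation, not the evaluation code used in the thesis. The function name `f1_score` and the toy labels are hypothetical and chosen purely for illustration; they are not data from the thesis.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 1 = fact present, 0 = fact absent.
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1]
toy_f1 = f1_score(toy_true, toy_pred)
print(toy_f1)
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75 as well.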

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Raivo Kolde

Defence year

2020
