Fact Extraction from Medical Text using Neural Networks

Nesma Talaat Abbas Mahmoud
Fact extraction from free text is a challenging task that requires a great deal of human effort to write regular expressions and build rule-based solutions. It is essential in the medical field, where many care details are stored only as free text and automated fact extraction is the only way to interpret large-scale medical databases. Such medical texts represent communication between doctors: the text is often not syntactically valid, concepts are not represented consistently, and the text is rife with misspellings. These problems make it challenging to develop rule-based solutions that handle all the potential ways a fact might be written down. In this thesis, the effectiveness of neural networks for fact extraction was explored on texts from discharge reports in the Estonian Health Information System. We used the whole dataset of medical texts to train word embedding models. On the subsets of the data annotated with particular facts, different classification models were tested to detect those facts. We found that employing pre-trained word embeddings allowed us to efficiently learn new models for fact extraction using relatively small amounts of annotated data. We achieved an F1 score of 0.86 for a new tag using 732 samples for training, 82 samples for validation, and 3258 samples for testing.
Graduation Thesis language
Graduation Thesis type
Master - Computer Science
Raivo Kolde
Defence year