Framework for Neural Network Based Fact Extraction Workflows

Name
Hendrik Šuvalov
Abstract
Medical texts, such as diagnoses and epicrises, are by their nature often unstructured, sometimes in the form of free text. The common practice for extracting useful information from them, such as named entities (e.g. drug or disease names) and their semantic relations, is rule- or pattern-based extraction, typically regular expressions. In most cases, this is the fastest and most effective approach; however, in certain circumstances it can be difficult, for example when the text contains misspelled words, or when we do not know in advance which patterns to look for but could recognise them once seen. This is a task for which neural network language models could prove useful, as they are capable of inferring the meaning of words from the context in which they appear. The main result of this thesis is a pipeline for implementing fact extraction tasks on medical texts. It uses EstMedBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pre-trained specifically on Estonian medical texts, which can be fine-tuned to classify tokens using labelled data provided by the user implementing the task. Having initially learned the task, the model continues labelling new data under the supervision of the user, who corrects any mistakes and, using active learning, retrains the model. This is a human-in-the-loop approach to training neural networks. This approach could be a more effective solution to some fact extraction tasks in the medical field, and implementing new tasks with this pipeline is technically easier for the user, making it more accessible to people in medical domains. Moreover, in addition to providing the pipeline, an example task has been implemented using this approach as part of this thesis, and both the process and the results have been analysed.
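The human-in-the-loop active learning step described above can be sketched in simplified form: after the model labels new data, the sequences it is least confident about are routed to the human annotator for correction before retraining. The following is a minimal illustration of uncertainty sampling only; the function names and the least-confidence heuristic are assumptions for this sketch, not details taken from the thesis.

```python
# Hypothetical sketch of the uncertainty-sampling step in a
# human-in-the-loop active learning loop. Function names are
# illustrative, not from the thesis pipeline.

def least_confidence(token_probs):
    """Uncertainty of one sequence: mean of (1 - max class probability)
    over its tokens. Higher means the model is less sure."""
    scores = [1.0 - max(dist) for dist in token_probs]
    return sum(scores) / len(scores)

def select_for_annotation(batch, k):
    """Pick the k sequences the model is least confident about,
    to be corrected by the human annotator before retraining.

    `batch` maps a sequence id to its per-token class probability
    distributions (one list of probabilities per token)."""
    ranked = sorted(batch, key=lambda sid: least_confidence(batch[sid]),
                    reverse=True)
    return ranked[:k]

batch = {
    "doc-1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],   # confident
    "doc-2": [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]],   # uncertain
    "doc-3": [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],     # in between
}
print(select_for_annotation(batch, 2))  # → ['doc-2', 'doc-3']
```

In a full pipeline, the selected sequences would be shown to the user, corrected, added to the training set, and the token classifier fine-tuned again; this sketch covers only the selection criterion.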
Graduation Thesis language
Estonian
Graduation Thesis type
Master - Data Science
Supervisor(s)
Dage Särg, Raivo Kolde, Sven Laur
Defence year
2022