Institute of Computer Science - Graduation Theses Registry

Completed theses (Submit your thesis) Graduation theses topics (Submit a thesis topic)

Clustering Methods for Interpreting Medical Data

Name

Egert Georg Teesaar

Abstract

The medical bills can be analyzed to identify disease trajectories. By applying
machine learning methods it is possible to find answers to questions, like which diagnoses occur together and from what these conditions arise. This study uses various clustering methods, like Bernoulli mixture models and autoencoders compression with K-means, to divide patient into groups based on the diagnoses they have received. The results of the models are visualized on the heatmaps showing how likely it is to encounter specific diagnoses in those groups. Also a guided hidden Markov model was used to form a lifelong disease path from the short segments of the different patients’ treatment. This provides a way to observe how certain conditions arise in different ages and allows to track the disease development over time. It found similar results, what had been previously reported in medical studies, like development of J35 from H65.
The models interpretability was also improved by using support vector machines as a feature selection method for I11. This way it was possible to get rid of all the diagnoses, which had no connection to I11 and only keep those contributing to the development of the disease. Result on the processed data also agreed with the medical findings, like I50 development from I11.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Sven Laur

Defence year

2020

PDF

UT Institute of Computer Science Graduation Theses Registry

Clustering Methods for Interpreting Medical Data