Clustering Methods for Interpreting Medical Data

Name
Egert Georg Teesaar
Abstract
Medical bills can be analyzed to identify disease trajectories. By applying machine learning methods, it is possible to answer questions such as which diagnoses occur together and what these conditions arise from. This study uses various clustering methods, such as Bernoulli mixture models and autoencoder compression with K-means, to divide patients into groups based on the diagnoses they have received. The results of the models are visualized as heatmaps showing how likely it is to encounter specific diagnoses in those groups. In addition, a guided hidden Markov model was used to form a lifelong disease path from short segments of different patients’ treatment. This provides a way to observe how certain conditions arise at different ages and makes it possible to track disease development over time. The model produced results similar to those previously reported in medical studies, such as the development of J35 from H65.
The models' interpretability was also improved by using support vector machines as a feature selection method for I11. This made it possible to discard all diagnoses that had no connection to I11 and keep only those contributing to the development of the disease. Results on the processed data also agreed with medical findings, such as the development of I50 from I11.
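As a rough illustration of the Bernoulli mixture clustering mentioned in the abstract, the sketch below fits a two-component mixture to binary patient-by-diagnosis rows with a plain EM loop. It is not the thesis implementation: the function name fit_bernoulli_mixture, the toy data, and all parameter choices are assumptions made for this example, and the diagnosis codes are reused only as column labels. The per-cluster probabilities it returns correspond to the kind of values a heatmap of diagnosis likelihood per group would display.

```python
import math
import random


def fit_bernoulli_mixture(X, k, n_iter=50, seed=0):
    """Fit a k-component Bernoulli mixture to binary rows X using EM.

    Returns (weights, probs, resp): mixing weights, per-cluster diagnosis
    probabilities, and per-patient cluster responsibilities.
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    eps = 1e-9
    weights = [1.0 / k] * k
    # Random initial probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[1.0 / k] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each patient.
        for i, x in enumerate(X):
            log_p = []
            for c in range(k):
                lp = math.log(weights[c] + eps)
                for j in range(d):
                    p = min(max(probs[c][j], eps), 1.0 - eps)
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                log_p.append(lp)
            m = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - m) for lp in log_p]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

        # M-step: re-estimate mixing weights and diagnosis probabilities.
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n))
            weights[c] = nc / n
            for j in range(d):
                hits = sum(resp[i][c] * X[i][j] for i in range(n))
                probs[c][j] = hits / (nc + eps)
    return weights, probs, resp


# Synthetic toy data (invented for this sketch, not from the thesis):
# each row is a patient, each column marks whether a diagnosis was received.
codes = ["H65", "J35", "I11", "I50"]
X = (
    [[1, 1, 0, 0]] * 5 + [[1, 0, 0, 0]]
    + [[0, 0, 1, 1]] * 5 + [[0, 0, 1, 0]]
)

weights, probs, resp = fit_bernoulli_mixture(X, k=2)

# Print each cluster's diagnosis profile (one heatmap row per cluster).
for c, row in enumerate(probs):
    profile = ", ".join(f"{code}={p:.2f}" for code, p in zip(codes, row))
    print(f"cluster {c} (weight {weights[c]:.2f}): {profile}")
```

On real billing data the number of clusters would have to be chosen separately, and EM can settle in different local optima depending on the initialization, so several restarts are usually worth comparing.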
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Sven Laur
Defence year
2020
 