Discretization and Analysis of Laboratory Tests

Name
Annika Talvet
Abstract
When interpreting the results of patients' clinical analyses, reference ranges are important as they define the range within which a measurement result could fall for a healthy individual. These ranges can depend on age and gender, but may also vary depending on the methodology used in a particular laboratory. Using analysis results that are discretized based on reference ranges simplifies data analysis and model training. However, analysis results may be associated with incorrect LOINC codes or units of measurement.
The aim of this Master's thesis is to identify analyses and reference ranges grouped incorrectly or with incorrect units. Additionally, it aims to investigate whether discretized analysis results are beneficial for predicting medical events and if there is a difference in prediction accuracy using different discretization methods.
In order to identify incorrectly grouped analysis results, the data was clustered using a Gaussian mixture model. To assess the predictive capability of discretized results, dependencies between the occurrence of medical events and differently discretized measurements, as well as measurement facts, were examined and models were trained to predict the occurrence of medical events.
The results revealed that there is no significant difference in the prediction accuracy between models using different inputs. This suggests that in predicting medical events, the occurrence of measurement is as important as the discretized analysis result.
Graduation Thesis language
Estonian
Graduation Thesis type
Master - Data Science
Supervisor(s)
Sven Laur
Defence year
2024
 
PDF