Creating Prediction Models Based on Estonian COVID-19 Data

Sille Habakukk
The COVID-19 pandemic has strained healthcare systems all around the world. To keep providing quality healthcare services, it is often necessary to determine which patients to prioritise. Predictive models can inform such medical decisions, but none existed for the novel COVID-19 at the beginning of the pandemic. Predictive models created during the first wave of the pandemic reported good performance but, having been trained on small datasets, did not produce satisfactory results when externally validated on different datasets. These models therefore could not be put to use in Estonia either.
This thesis aimed to use electronic health records to create prediction models that would predict COVID-19 outcomes well in the Estonian population. The models were trained to predict whether a patient would need hospitalisation, be admitted to intensive care or die within 30 days of contracting the virus. The resulting models used the Random Forest algorithm, which is not standard for prediction models but showed stable performance when predicting adverse outcomes of COVID-19 and avoided overfitting to the data. However, the absolute values of the models' predictions need to be calibrated to account for the disease course changing over time.
The resulting prediction models used a large number of predictors, which makes them more suitable for use in public health policymaking. The models would need to be independently externally validated before use. The prediction models show that good predictive performance is achievable using only registry data, without factoring in any symptoms.
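The abstract notes that the absolute values of the predictions need to be calibrated as the disease course changes over time, but it does not say how. The following is only a minimal sketch of one common option, an intercept shift on the log-odds scale (sometimes called recalibration-in-the-large), and is not necessarily the approach taken in the thesis. The function names and the example numbers are hypothetical and chosen purely for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-z))


def fit_intercept_shift(probs, outcomes, lo=-10.0, hi=10.0, iters=60):
    """Find the log-odds shift that makes the mean predicted risk
    equal the event rate observed in a recent period (bisection)."""
    target = sum(outcomes) / len(outcomes)
    logits = [logit(p) for p in probs]
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_pred = sum(sigmoid(z + mid) for z in logits) / len(logits)
        if mean_pred < target:
            lo = mid  # predictions still too low: shift upwards
        else:
            hi = mid  # predictions too high: shift downwards
    return (lo + hi) / 2


def recalibrate(probs, shift):
    """Apply the shift; the ranking of patients is unchanged."""
    return [sigmoid(logit(p) + shift) for p in probs]


# Hypothetical risks from a model trained on an earlier period ...
old_probs = [0.40, 0.30, 0.20, 0.10, 0.60, 0.05, 0.15, 0.25]
# ... and hypothetical outcomes from a later period with fewer events.
recent_outcomes = [0, 0, 0, 0, 1, 0, 0, 0]

shift = fit_intercept_shift(old_probs, recent_outcomes)
new_probs = recalibrate(old_probs, shift)
```

Because the shift is applied uniformly on the log-odds scale, a model's ability to rank patients by risk is untouched; only the absolute risk level is moved to match the more recent event rate.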
Graduation Thesis language
Graduation Thesis type
Master - Data Science
Raivo Kolde
Defence year