Prediction of Cell Counts from DNA Methylation

Name
Simo Pähk
Abstract
DNA methylation is an epigenetic factor that modulates gene expression. The close relationship between gene expression and cell differentiation serves as a basis for methylation-based cell mixture deconvolution, a method for determining the proportions of constituent cell types in a biological sample. Previous work has demonstrated its usefulness in predicting lymphocyte subtypes in blood samples, but has neglected TEMRA, a type of senescent lymphocyte associated with aging and autoimmune diseases. This thesis explores the feasibility of estimating the proportions of T cells in various stages of differentiation, including TEMRA, from methylation sequencing data using machine learning. The results show that while prediction accuracy is lower for TEMRA subtypes than for general subtypes such as T cells, methylation-based prediction is nonetheless a viable approach for this task, especially since DNA sequencing is cheaper and more scalable than traditional laboratory methods for blood sample analysis.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Ahto Salumets
Defence year
2022
 