Prediction of geographic origin based on gene expression and genetic variation data analysis

Madis Kaasik
The aim of this thesis is to study how much gene expression levels and single nucleotide polymorphisms (SNPs) differ between ethnic groups. The sample data are publicly accessible gene expression and SNP data collected from Americans of different ethnic origins. The statistical software R is used to analyse the data. The thesis gives an overview of several statistical methods and machine learning algorithms and applies them to the sample data. The end goal is to find out how precisely origin can be predicted from gene expression alone, from genetic variation alone, and from the two combined, and which classification method is best suited to determining origin.
Graduation Thesis language
Graduation Thesis type
Bachelor - Computer Science
Tauno Metsalu, Tatjana Iljašenko
Defence year