Predicting the Molecular Mechanisms of Genetic Variants

Name
Dzvenymyra-Marta Yarish
Abstract
Understanding the molecular pathways through which GWAS variants influence complex traits is essential for uncovering disease mechanisms and aiding target prioritization. Traditional molQTL mapping, commonly used to assign a variant's mode of action (MoA), often yields numerous false positives and struggles with low-frequency variants. To address this issue, we investigated the use of machine learning models to predict the MoA of variants. We compiled a dataset consisting of two classes of molQTLs: splicing QTLs (sQTLs) and gene expression QTLs influenced by chromatin accessibility (caQTLs). We evaluated two deep learning models, Enformer and ChromBPNet, which represent different approaches to predicting regulatory activity, on a set of fine-mapped caQTLs; ChromBPNet proved to be more precise. We then developed the MoA model, which integrates classic genomic features with predictions from single-task deep learning models. This model achieved nearly 90% accuracy in distinguishing between the two QTL classes, surpassing the 80% accuracy of a classifier based on scores from a single large-scale foundational model.
Additionally, we applied the MoA model to score QTLs from the eQTL Catalogue that were identified by either gene expression or Leafcutter methods (the latter is commonly used to identify sQTLs). Our analysis indicated that MoA model predictions aligned well with gene expression QTLs, whereas most Leafcutter QTLs were not classified as sQTLs.
In summary, this work introduces an original dataset for MoA model training and evaluation, and presents a proof-of-concept MoA model that effectively classifies GWAS variants into splicing QTLs and gene expression QTLs influenced by chromatin accessibility.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Kaur Alasoo
Defence year
2024
 