Machine learning / Statistical modeling: Modeling the CRISPR/Cas9 genome editing system

Organisation name
ATI
Summary
The CRISPR/Cas9 system has revolutionized research in cell biology. DNA can now be edited easily and accurately, enabling experiments that range from understanding gene function and establishing which mutations cause disease to correcting inherited genetic defects. The system uses a guide RNA (gRNA) to target the Cas9 enzyme, which generates breaks in DNA, to chosen locations in the genome. However, because not all locations can be targeted equally well, many gRNAs are used to edit one gene, which limits the scale of experiments that interrogate large numbers of genes at once. This motivates the need for a better understanding of the factors that aid targeting.

The features that determine the efficacy of Cas9 function have been tested only to a limited extent. For example, DNA sequence composition and accessibility are known to play a role [1,2]. Whether editing results in a change for a cell further depends on the expression of the targeted gene and exon, the conservation of the region across evolution, the protein domain edited, and the genetic background of the cell line. However, the extent of these factors' influence remains poorly characterized, and it is difficult to predict whether a newly designed gRNA will perform well in a genome editing experiment.
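To make "sequence composition" concrete, the following is a minimal sketch of how a gRNA target sequence could be turned into numeric features. The function names, the choice of GC content plus a position-specific one-hot encoding, and the example sequence are all illustrative assumptions, not features taken from [1] or [2].

```python
from typing import Dict, List

BASES = "ACGT"  # fixed base order used for the one-hot encoding


def gc_content(seq: str) -> float:
    """Return the fraction of G or C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def one_hot(seq: str) -> List[int]:
    """Position-specific one-hot encoding: four indicators per base (A, C, G, T)."""
    features: List[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        features.extend(int(base == b) for b in BASES)
    return features


def featurize(seq: str) -> Dict[str, object]:
    """Bundle the two hypothetical sequence features for one gRNA."""
    return {"gc": gc_content(seq), "one_hot": one_hot(seq)}


# Hypothetical 8-nt example; real gRNA targets are typically about 20 nt long.
example = featurize("ACGTGGCC")
```

Other properties named above, such as accessibility, expression, or conservation, would need external genomic annotations and are not covered by this sketch.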

The aim of this project is to build a predictive model of genome editing outcome, focusing on the properties of the targeted region. As a first step, the editing readouts can be modeled as in [2], and the model can then be expanded to include the abundant additional genomic information that is available. Alternatively, other machine learning approaches can be tested. The project will be carried out in collaboration with the Genetic Screens of Cellular Traits group at the Wellcome Trust Sanger Institute, where new data for validating the findings can be generated.
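As a rough illustration of what such a model could look like at its very simplest, the sketch below computes a per-gRNA readout as a log2 fold change of read counts and regresses it on a single feature by ordinary least squares. This is a deliberately simple stand-in and not the model of [2]; the function names, the pseudocount, the total-count normalisation, and the toy numbers are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def log2_fold_change(before: List[int], after: List[int],
                     pseudocount: float = 1.0) -> List[float]:
    """Per-gRNA log2 ratio of library-size-normalised counts (after vs. before)."""
    if len(before) != len(after):
        raise ValueError("count vectors must have the same length")
    total_before, total_after = sum(before), sum(after)
    if total_before <= 0 or total_after <= 0:
        raise ValueError("each sample needs a positive total count")
    return [
        math.log2(((a + pseudocount) / total_after)
                  / ((b + pseudocount) / total_before))
        for b, a in zip(before, after)
    ]


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy, made-up numbers for four hypothetical gRNAs (not real data).
counts_before = [100, 100, 100, 100]
counts_after = [50, 100, 150, 100]
gc_fraction = [0.3, 0.5, 0.7, 0.5]

readout = log2_fold_change(counts_before, counts_after)
slope, intercept = fit_line(gc_fraction, readout)
```

A realistic model would use many features at once, account for count noise, and be evaluated on held-out gRNAs; the single-feature fit here only shows the shape of the problem.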

References:

[1] Smith, J. D. et al. "Quantitative CRISPR interference screens in yeast identify chemical-genetic interactions and new rules for guide RNA design." Genome Biology 17.1 (2016). http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0900-9

[2] Li, W. et al. "MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens." Genome Biology 15:554 (2014). https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4
Thesis defence year
2016-2017
Supervisor
Leopold Parts
Communication language(s)
Estonian, English
Requirements for candidates
This cutting-edge project is well suited for someone with experience in (or a desire to acquire) machine learning or statistical modeling methods, and basic data science skills of obtaining, cleaning, and visualising data. Knowledge of genomics is beneficial.
Level
Bachelor's, Master's
Keywords
#machine_learning #statistics #analysis #biology

Application contact

Name
Leopold Parts
Phone
E-mail
leopold.parts@ut.ee