Finding motifs from short peptides

Name
Mari-Liis Kruup
Abstract
The goal of this thesis is to develop a workflow that finds groups of similar peptides in a set of short peptides and represents these groups as motifs. The workflow could later be used to discover motifs in the peptides of different individuals and thereby find similarities between individuals with the same disease. It combines commonly known methods, bioinformatics tools and additional scripts. The workflow is based on hierarchical clustering, which divides the input peptides into groups according to their similarity. These groups are then modified so that each contains only one unique motif. The motifs of the final groups are extracted and represented as sequence logos and regular expressions. Each motif is scored to show how specifically it describes its own peptide group. The workflow was assembled and tested on data from one individual. The test was successful: 71.19% of the 277 166 input peptides were divided into 46 motif groups, 43 of which had very good scores. In the future, the workflow can be used to analyze different individuals in order to find motifs shared by individuals with the same disease.
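The pipeline summarized above (cluster, extract a motif per group, score it) can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not name the distance measure, linkage rule, or scoring formula, so Hamming distance on equal-length peptides, complete linkage, the `max_dist` threshold, and the `specificity` score are all assumptions chosen for illustration, as are the function names and the toy peptides.

```python
import re
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions (assumes equal-length peptides)."""
    return sum(x != y for x, y in zip(a, b))

def cluster(peptides, max_dist):
    """Naive complete-linkage agglomerative clustering (illustrative only).

    Repeatedly merges the two closest groups until the closest pair
    is farther apart than max_dist (a hypothetical cut-off).
    """
    groups = [[p] for p in peptides]

    def linkage(g, h):
        return max(hamming(a, b) for a in g for b in h)

    while len(groups) > 1:
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        if linkage(groups[i], groups[j]) > max_dist:
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]  # safe: combinations guarantees i < j
    return groups

def motif_regex(group):
    """Regular expression with one literal or character class per position."""
    parts = []
    for column in zip(*group):
        residues = sorted(set(column))
        parts.append(residues[0] if len(residues) == 1
                     else "[" + "".join(residues) + "]")
    return "".join(parts)

def specificity(pattern, group, all_peptides):
    """Assumed score: share of peptides matching the motif that are in the group."""
    matched = [p for p in all_peptides if re.fullmatch(pattern, p)]
    members = set(group)
    return sum(p in members for p in matched) / len(matched)

# Toy input, invented for this example.
peptides = ["ACDEFGH", "ACDKFGH", "ACDEFGY", "WWPLMNQ", "WWPLMNR"]
groups = cluster(peptides, max_dist=2)
motifs = [motif_regex(g) for g in groups]
scores = [specificity(m, g, peptides) for m, g in zip(motifs, groups)]
```

On this toy input the sketch yields two groups with the motifs `ACD[EK]FG[HY]` and `WWPLMN[QR]`. The pairwise search is far too slow for hundreds of thousands of peptides; a real run would rely on an optimized clustering tool.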
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Meelis Kull, Balaji Rajashekar, Sven Laur
Defence year
2013
 