Using Robust Rank Aggregation for Prioritising Autoimmune Targets on Protein Microarrays

Vitalii Peretiatko
Autoimmune diseases are very common in the modern world, and a growing number of diseases are being associated with an autoimmune process. An autoimmune reaction is a process in which the immune system produces antibodies (autoantibodies) that attack the organism’s own cells. The causes and mechanisms of autoimmune diseases are yet to be understood. One way to study autoimmunity is to explore why certain cells, and particularly proteins, are attacked by autoantibodies. Many technologies have been developed for this purpose, one of which is the protein microarray. This technology allows estimating the amount of autoantibodies in patient serum against 9000 unique human proteins. By applying data analysis methods to these measurements, bioinformaticians might be able to identify the proteins that attract a prevalent amount of autoantibodies. Knowing these proteins, biologists could conduct experiments and formulate new hypotheses about how autoimmune diseases arise and operate. Common data analysis methods focus on selecting only the proteins that differ most reliably between healthy and diseased groups. They thereby ignore the fact that, in an autoimmune disease, the repertoire of affected proteins can differ greatly between patients, so even single cases of high protein reactivity may carry important information for understanding the mechanisms of the disease. In this thesis, we propose to apply the Robust Rank Aggregation (RRA) algorithm as an adaptive method for identifying a wide repertoire of reactive proteins. We compared the expediency and effectiveness of classical analysis methods, a method recently applied by biologists, and RRA on synthetic and real data. Experiments on synthetic data sets with known reactive proteins show that RRA outperforms these methods while also being more robust to added noise.
Applying RRA to real data and conducting an enrichment analysis on the lists of reactive proteins produced by each method, we obtained comparable numbers of proteins overrepresented in the classes associated with biological and immune responses.
Graduation Thesis language
Graduation Thesis type
Master - Software Engineering
Dmytro Fishman, Elena Sügis
Defence year