Combining Support Vector Machines to Predict Novel Angiogenesis Genes

Kaur Alasoo
Angiogenesis is the process of growing new blood vessels. It is part of normal bodily functions such as wound healing, but it also plays an important role in cancer development. Without angiogenesis, tumors would not be able to grow larger than 1-2 millimeters in diameter because of the lack of oxygen and nutrients. However, only some of the genes involved in angiogenesis are known. In this work, we propose Comb-SVM, a new machine learning method that predicts new members of the positive class without requiring clearly defined negative examples. The idea is to train multiple Support Vector Machines (SVMs) using the known genes as positive examples and various randomly selected sets of genes as negative examples. Each SVM is then used separately to classify all remaining human genes, and the results are combined using a rank aggregation algorithm. The outcome is a list of genes ranked according to their similarity to the known input genes. We applied this method to 341 known angiogenesis genes. Experiments were conducted on a large Affymetrix microarray gene expression matrix, obtained from ArrayExpress, consisting of 5732 experiments and 22283 probe sets. We compared Comb-SVM to many other state-of-the-art approaches. According to cross-validation experiments, our method outperformed most of the existing methods in terms of the areas under the Receiver Operating Characteristic and Precision-Recall curves. We also found that our method gave significantly more stable results than the second-best approach. Finally, we verified the biological relevance of the predicted genes by searching the literature and Gene Ontology.
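The procedure described in the abstract (train several SVMs against different random negative sets, score the remaining genes, aggregate the rankings) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function name comb_svm_rank, the use of scikit-learn's SVC with a linear kernel, the balanced size of each negative set, and the mean of normalized ranks as the aggregation step are all assumptions made for illustration; the abstract does not specify the kernel or the rank aggregation algorithm.

```python
import numpy as np
from sklearn.svm import SVC


def comb_svm_rank(X, positive_idx, n_models=10, seed=0):
    """Rank the unlabeled rows of X by similarity to the positive rows.

    X            : (n_genes, n_features) expression matrix
    positive_idx : row indices of the known positive genes
    Returns the unlabeled row indices, best candidate first.
    """
    rng = np.random.default_rng(seed)
    positive_idx = np.asarray(positive_idx)
    unlabeled = np.setdiff1d(np.arange(X.shape[0]), positive_idx)
    # Assumption: each negative set is as large as the positive set.
    n_neg = min(len(positive_idx), len(unlabeled) // 2)

    rank_sum = np.zeros(len(unlabeled))
    rank_count = np.zeros(len(unlabeled))
    for _ in range(n_models):
        # Draw a fresh random "negative" set from the unlabeled genes.
        neg = rng.choice(len(unlabeled), size=n_neg, replace=False)
        train_idx = np.concatenate([positive_idx, unlabeled[neg]])
        y = np.concatenate([np.ones(len(positive_idx)), np.zeros(n_neg)])
        model = SVC(kernel="linear").fit(X[train_idx], y)

        # Score only the genes this model did not see during training.
        held_out = np.setdiff1d(np.arange(len(unlabeled)), neg)
        scores = model.decision_function(X[unlabeled[held_out]])
        # Rank 0 goes to the highest score; normalize ranks to [0, 1].
        ranks = np.argsort(np.argsort(-scores))
        rank_sum[held_out] += ranks / max(len(held_out) - 1, 1)
        rank_count[held_out] += 1

    # Assumed aggregation: mean normalized rank over the models that
    # scored the gene; genes never scored are placed last.
    mean_rank = np.where(rank_count > 0,
                         rank_sum / np.maximum(rank_count, 1),
                         np.inf)
    return unlabeled[np.argsort(mean_rank)]
```

Calling comb_svm_rank(X, positive_idx) on an expression matrix with one row per gene returns the indices of all remaining genes, ordered from most to least similar to the known positives. Averaging ranks rather than raw decision values is a deliberate choice in this sketch, because decision values from separately trained SVMs are not on a common scale.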
Graduation Thesis language
Graduation Thesis type
Bachelor - Computer Science
Hedi Peterson, Phaedra Agius
Defence year