Going Beyond U-Net: Assessing Vision Transformers for Semantic Segmentation in Microscopy Image Analysis

Name
Illia Tsiporenko
Abstract
Segmentation is one of the crucial steps in biomedical image analysis. Many approaches have been developed over the past decade to segment biomedical images, ranging from classical segmentation algorithms to advanced deep learning models, with U-Net being one of the most prominent. Recently, a new class of models has appeared: transformers, which promise to enhance the segmentation of biomedical images. We explore the efficacy of the well-established U-Net model and newer transformer-based models, including UNETR, the Segment Anything Model, and the Swin Transformer, across various image modalities such as electron microscopy, brightfield, histopathology, and phase-contrast. Additionally, we identified several limitations of the original Swin Transformer architecture and addressed them with custom modifications to optimise its performance. Our results indicate that these modifications improve segmentation performance compared to both the classical U-Net model and the unmodified Swin Transformer. While the results show that transformer models hold promise, especially in handling complex image structures, our practical experience indicates that deploying these models can be difficult. This work compares popular transformer-based models against U-Net and shows that, with thoughtful modifications, the efficiency and applicability of transformer models can be enhanced, paving the way for their future integration into microscopy image analysis tools.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Dmytro Fishman, Pavel Chizhov
Defence year
2024