Automatic Labelling of Point Clouds Using Image Semantic Segmentation

Markus Loide
Autonomous driving is often seen as the next big breakthrough in artificial intelligence. Autonomous vehicles use a variety of sensors to obtain knowledge from the world, for example cameras and LiDARs. LiDAR provides 3D data about the surrounding world in the form of a point cloud. New deep learning models have emerged that allow for learning directly on point clouds, but obtaining labelled data for training these models is difficult and expensive. We propose to use semantically segmented camera images to project labels from 2D to 3D, therefore enabling the use of cheaper ground truth data to train the aforementioned models. Furthermore, we evaluate the use of mature 2D semantic segmentation models to automatically label vast amounts of point cloud data. This approach is tested on the KITTI dataset, as it provides corresponding camera and LiDAR data for each scene. The DeepLabv3+ semantic segmentation model is used to label the camera images with pixel-level labels, which are then projected onto the 3D point cloud and finally a PointNet++ model is trained to do segmentation from point clouds only. Experiments show that projected 2D labels can be learned reasonably well by PointNet++. Evaluating the results with 3D ground truth provided with KITTI dataset produced promising results, with accuracy being high for detecting pedestrians, but mediocre for cars.
Graduation Thesis language
Graduation Thesis type
Master - Computer Science
Tambet Matiisen
Defence year