Clustering Faces from the National Archives of Estonia

Organization name
Jan Aare van Gent
The goal of this thesis was to create a proof-of-concept pipeline for face clustering and face label prediction, in order to ease the labeling of facial images for the National Archives of Estonia. The pipeline was written entirely in Python, and the algorithms for facial image processing and clustering were taken from popular open source libraries such as Dlib and Scikit-learn, and their dependencies.
The clustering quality of the resulting pipeline was evaluated by comparing its results to labeled data, also known as the ground truth. In addition, the performance and memory consumption of each pipeline stage were profiled with two different data sets to find the main bottlenecks. All evaluations were done on the HPC platform of the University of Tartu.
The resulting pipeline consisted of the following stages: face detection, shape prediction, face alignment, feature extraction and feature clustering. The face detection stage uses a modified MMOD object detection algorithm with a pre-trained model implemented in Dlib. For shape prediction, Dlib's 5-point pre-trained shape predictor was used. The face alignment stage uses affine transformations to transform the facial image such that the 5 detected face landmarks are centered. The aligned face is then fed into a pre-trained CNN that generates a 128-dimensional feature vector, also called a deep feature or embedding. The DBSCAN clustering algorithm implemented in Scikit-learn was then used to cluster the embeddings.
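
The final clustering stage can be illustrated in isolation. The following is a minimal sketch, not the thesis code: it replaces real face embeddings with synthetic 128-dimensional vectors, so the helper names (`make_synthetic_embeddings`, `cluster_embeddings`), the noise level and `min_samples=2` are all assumptions made for illustration. Only the use of Scikit-learn's `DBSCAN` and the radius 0.37 come from the abstract, and that radius was tuned for the thesis data rather than for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def make_synthetic_embeddings(n_people=3, faces_per_person=5, dim=128,
                              noise=0.01, seed=0):
    """Create toy stand-ins for face embeddings (hypothetical helper).

    Each "person" is a random unit vector; each "face" is that vector
    plus a small amount of Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_people, dim))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    points = np.repeat(centers, faces_per_person, axis=0)
    points = points + rng.normal(scale=noise, size=points.shape)
    truth = np.repeat(np.arange(n_people), faces_per_person)
    return points, truth


def cluster_embeddings(embeddings, eps=0.37, min_samples=2):
    """Cluster embeddings with DBSCAN using Euclidean distance.

    eps is the neighborhood radius; min_samples=2 is an assumed value,
    not one stated in the abstract. Points labeled -1 are noise.
    """
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    return model.fit_predict(embeddings)


embeddings, truth = make_synthetic_embeddings()
labels = cluster_embeddings(embeddings)

# Count clusters, ignoring the noise label -1.
n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
```

In a real run the `embeddings` array would hold the 128-dimensional vectors produced by the feature extraction stage, one row per detected face.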
The aim of the cluster quality evaluation was to find the clustering parameters that give clustering results with:
• 100% recall with the maximal precision
• 100% precision with the maximal recall
• the best balance between recall and precision, using different metrics such as the V-score, the adjusted Rand index and the adjusted mutual information index.
A 100% recall means that the facial images of a particular individual are all in the same cluster, and a 100% precision means that facial images of different individuals do not appear in the same cluster. Since the best performing clustering algorithm was DBSCAN, only the neighborhood radius ε was varied. The evaluation suggested that the best values of ε for this particular pipeline setup and image data are 0.30, 0.37 and 0.44 (for 100% precision, best balance and 100% recall, respectively).
The results of profiling showed that the most memory-consuming part of the pipeline was face detection with the MMOD face detector, which used around 600 MiB of RAM on average. However, when clustering hundreds of thousands of images, clustering will most likely be the main bottleneck in terms of required CPU time.
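
The recall and precision definitions above can be made concrete by counting pairs of images. The following is a minimal sketch, not the evaluation code used in the thesis: it assumes a pairwise formulation, in which precision is the share of same-cluster pairs that show the same person and recall is the share of same-person pairs that end up in the same cluster. The function name `pairwise_precision_recall` and the `noise_label` parameter are hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(truth, predicted, noise_label=-1):
    """Return (precision, recall) over all pairs of images.

    truth[i] is the ground-truth identity of image i and predicted[i]
    is its cluster label. Images carrying noise_label are treated as
    belonging to no cluster. When no pair qualifies for a denominator,
    the score is reported as 1.0 (an assumed convention).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    same_cluster = 0  # pairs placed in the same cluster
    same_person = 0   # pairs showing the same person
    same_both = 0     # pairs that satisfy both conditions

    for i, j in combinations(range(len(truth)), 2):
        in_same_cluster = (predicted[i] == predicted[j]
                           and predicted[i] != noise_label)
        is_same_person = truth[i] == truth[j]
        same_cluster += in_same_cluster
        same_person += is_same_person
        same_both += in_same_cluster and is_same_person

    precision = same_both / same_cluster if same_cluster else 1.0
    recall = same_both / same_person if same_person else 1.0
    return precision, recall


truth = ["a", "a", "a", "b", "b"]

# Person "a" is split across two clusters: pure clusters, incomplete recall.
split_scores = pairwise_precision_recall(truth, [0, 0, 1, 2, 2])
print(split_scores)   # (1.0, 0.5)

# Everyone merged into one cluster: full recall, poor precision.
merged_scores = pairwise_precision_recall(truth, [0, 0, 0, 0, 0])
print(merged_scores)  # (0.4, 1.0)
```

The two toy results mirror the trade-off reported above: a small ε tends to split individuals across clusters, which favors precision, while a large ε tends to merge clusters, which favors recall. For the balance metrics, Scikit-learn provides `v_measure_score`, `adjusted_rand_score` and `adjusted_mutual_info_score` in `sklearn.metrics`.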
Although the face clustering pipeline developed for this thesis is not production-ready, since it suffers from performance issues when processing hundreds of thousands of images, it provides a starting point and an overview of what could be accomplished using well-tested open source implementations for the various parts of the pipeline. The description of each pipeline stage in this thesis also provides a basic theoretical understanding of the problems related to clustering feature vectors of human faces. Any future work on this topic should focus on how to make better face label predictions for the user of the program, using labeled image data.
Year of thesis defense
Tambet Matiisen
Estonian, English
Requirements for the applicant

Application contact

Jan Aare van Gent