Clustering Faces from the National Archives of Estonia

Jan Aare van Gent
The National Archives of Estonia holds hundreds of thousands of images, and many of the people depicted in them are unlabeled. To assist manual labeling, a label prediction system is proposed that detects faces in images and uses an artificial neural network to generate a representation of each face in the form of a feature vector. These feature vectors can be clustered using Euclidean distances, since similar faces produce feature vectors that lie close to each other in feature space.
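The idea that faces can be grouped by Euclidean distance can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm used, so this example swaps in a simple threshold-based single-linkage grouping: two faces are linked when their feature vectors are closer than a threshold, and clusters are the connected components of those links. The function name `cluster_by_distance`, the toy 3-dimensional vectors (standing in for real neural-network feature vectors), and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence


def cluster_by_distance(vectors: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Assign a cluster id to every feature vector (hypothetical helper).

    Two vectors are linked when their Euclidean distance is below `threshold`;
    clusters are the connected components of that graph (single linkage).
    """
    n = len(vectors)
    parent = list(range(n))  # union-find forest: each vector starts as its own root

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the components of close vectors.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive ids 0, 1, 2, ... in order of first appearance.
    ids: dict = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    # Toy vectors: faces 0/1 and 2/3 are near-duplicates, face 4 is on its own.
    faces = [
        [0.10, 0.20, 0.30],
        [0.12, 0.21, 0.29],
        [0.90, 0.80, 0.70],
        [0.88, 0.82, 0.69],
        [0.50, 0.10, 0.90],
    ]
    print(cluster_by_distance(faces, threshold=0.1))  # [0, 0, 1, 1, 2]
```

The threshold plays the role of a clustering parameter: raising it merges more faces into the same cluster, while lowering it splits clusters apart. The pairwise loop is quadratic in the number of faces, so a real archive-scale pipeline would need a more efficient approach.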
In this thesis, a proof-of-concept pipeline for face clustering and label prediction was created. The clustering results were evaluated with different clustering parameters, and each stage of the pipeline was profiled in terms of execution time and average memory consumption.
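One way the label prediction step could work once clusters exist is sketched below. This assumes a simple majority-vote rule that the abstract does not specify: each unlabeled face is suggested the most common manual label among the labeled faces in its cluster, and faces in clusters without any labeled member are left for manual labeling. The function name `predict_labels` and the placeholder labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def predict_labels(cluster_ids: Sequence[int], known: Dict[int, str]) -> List[Optional[str]]:
    """Suggest a label for every face from the labeled faces in its cluster (hypothetical helper).

    `cluster_ids[i]` is the cluster of face i; `known` maps face index -> manual label.
    """
    # Count the manual labels present in each cluster.
    votes: Dict[int, Counter] = {}
    for face, label in known.items():
        votes.setdefault(cluster_ids[face], Counter())[label] += 1

    predictions: List[Optional[str]] = []
    for face, cluster in enumerate(cluster_ids):
        if face in known:
            predictions.append(known[face])  # keep manual labels unchanged
        elif cluster in votes:
            predictions.append(votes[cluster].most_common(1)[0][0])  # majority label
        else:
            predictions.append(None)  # no labeled face in this cluster
    return predictions


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 2]
    labels = {0: "Person A", 2: "Person B"}
    print(predict_labels(clusters, labels))
    # ['Person A', 'Person A', 'Person B', 'Person B', None]
```

Predictions produced this way are suggestions for a human labeler rather than final labels, which matches the stated goal of assisting manual labeling.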
A survey of methods for face detection, shape prediction, face alignment, feature extraction, clustering, and labeling was also presented.
Graduation Thesis type: Bachelor - Computer Science
Tambet Matiisen