Memory-Efficient Fast Shortest Path Distance Estimation in Large Graphs

Name
Volodymyr Floreskul
Abstract
Shortest path computation is one of the most critical primitives in graph algorithms. In large graphs there is often a need to use approximate methods as simple exact algorithms require unacceptably large amounts of time. One family of techniques that is simple and scalable at the same time is based on upper bound distance approximation using a fixed set of selected nodes called landmarks. If we know the distances from all nodes in the graph to a landmark u, the distance between a pair of nodes s and t can be approximated using the triangle inequality: d (s, t) <= d (s, u) + d (u, t). In a similar way it is also possible to calculate approximated shortest paths. The obtained accuracy can be increased by using a set of k landmarks, but this leads to linear increase of memory usage and preprocessing time with sublinear approximation error reduction. In this work we introduce a simple but powerful modification to this approach that we call pruned landmark trees. The idea of this improvement is based on the fact that in majority of cases it is sufficient to keep the distances only to the r closest landmarks rather than the whole set of landmarks, where r is much smaller than k. We describe three shortest path approximation algorithms that use the proposed modification. The performance of the presented methods was tested on big real-world social network graphs including DBLP, Orkut, Twitter and Skype. The obtained results have been compared against the results of regular landmark-based techniques. The comparison showed that pruned landmark tree-based algorithms can be used to significantly reduce the used memory consumption while achieving both comparable accuracy and query execution time.
Graduation Thesis language
English
Graduation Thesis type
Master - Software Engineering
Supervisor(s)
Konstantin Tretyakov
Defence year
2013
 
PDF