Detection of Near-Duplicates Using Error-Correcting Codes

Name
Gerli Viikmaa
Abstract
The detection of near-duplicate items in a large set is a problem encountered in many fields. This thesis constructs and analyses two algorithms for finding similar pairs in an input dataset. It shows that these algorithms are applicable and efficient in the domain of DNA sequences.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Sven Laur
Defence year
2014