Supporting Internet Search by Search-Log Publishing

Name
Peeter Jürviste
Abstract
The main research problem of my thesis was to engineer a new search task logging and publishing framework that offers a better alternative to existing browser plug-in based methods. From the start, the proxy-based search task reporting system was a complex engineering challenge: it involves code written in multiple programming languages, interactions across many software modules (some of which are large existing projects in their own right), and a Linux operating system configured to ease set-up for the user. These design decisions were made to ensure that the solution is reliable, extensible and maintainable in the future. The research goal was met. In my thesis, I proposed a proxy-based method for logging user search behaviour across different browsers and operating systems, and compared it with an existing plug-in based Search Logger for Mozilla Firefox and with other similar solutions. The idea of developing a proxy-based search task logging and publishing solution arose out of necessity, because the existing logging solution had significant maintainability problems.
The logs created by my solution are subsequently annotated by the user and made publicly available on a dedicated Internet blog called the Search Task Repository. Users can search the already annotated and published Internet search logs. Ideally, this reduces the complexity of search tasks for users, which in turn saves time. User studies to confirm this are still pending, but researchers in Tartu and at one foreign university have confirmed their interest in using my solution in their search experiments. The proposed solution consists of two large units: the search task repository and the search task logging and publishing unit.
The search task repository is a remote component, essentially a fairly simple WordPress blog, which allows search stories to be published automatically over the XML-RPC protocol, search queries to be served, and search task logs to be displayed to the searcher. My logging system is configured as a VirtualBox virtual machine. It is considerably more complex and consists of three sub-components: the main Web interface, the search task logger, and a Privoxy Web proxy specially configured for this purpose. Users can start and stop logging at will in the main Web interface. This sub-component also gives them full control over what is published online by providing editing and annotation functionality for all search task data, whether implicitly or explicitly logged.
My thesis gives a comprehensive theoretical overview of the state of the art, explaining basic related concepts in Information Retrieval and recent developments in Exploratory Search and search task logging systems. In contrast with existing browser plug-in based search task logging methods, my proposed proxy-based approach ensures platform and browser independence while also being very stable. By giving searchers the opportunity to freely define and annotate their own search tasks, my search support solution sets a new standard.
In the final chapter, I conducted a thorough analysis of future work and presented my own vision of the future opportunities for this search support methodology. A modified architecture for more convenient laboratory experiments was outlined as an important future task. In conclusion, my proxy-based search task logging, editing and publishing framework can be extended further to log more JavaScript events. The search task repository is a large open area with many opportunities for future extensions.
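The automatic publishing step described in the abstract, in which an annotated search story is posted to the WordPress-based repository over XML-RPC, might look roughly like the sketch below. It is an illustration only and is not taken from the thesis: it assumes WordPress's standard `wp.newPost` XML-RPC method is used, and the helper names (`build_search_story`, `encode_new_post_call`, `publish`) and the layout of the post body are hypothetical.

```python
import xmlrpc.client


def build_search_story(title, queries, annotation):
    """Assemble a WordPress post struct from one logged search task.

    The body layout (annotation followed by numbered queries) is a
    hypothetical choice made for this example.
    """
    lines = [annotation, ""]
    lines += [f"{i}. {q}" for i, q in enumerate(queries, start=1)]
    return {
        "post_type": "post",
        "post_status": "publish",
        "post_title": title,
        "post_content": "\n".join(lines),
    }


def encode_new_post_call(blog_id, username, password, story):
    """Serialize a wp.newPost XML-RPC request body without sending it."""
    return xmlrpc.client.dumps(
        (blog_id, username, password, story), methodname="wp.newPost"
    )


def publish(endpoint, blog_id, username, password, story):
    """Send the story to a live blog's XML-RPC endpoint.

    Returns whatever the server reports for the newly created post.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.wp.newPost(blog_id, username, password, story)
```

Keeping the request encoding separate from the network call lets the user inspect exactly what would be published before anything leaves the logging machine, which is in keeping with the control over published data that the abstract emphasises.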
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Eero Vainikko, Ulrich Norbisrath
Defence year
2012
 