Use of Wikipedia content, structural connections and usage statistics to generate context aware query augmentation in a topical search engine
This thesis presents TCSearch2, a Master's project. It studies different approaches to bridging the gap between user expectations and the results returned by existing search engines, and the impact of those approaches on result quality. Four search engines were developed to evaluate the proposed methods, using publicly available data from the online encyclopedia Wikipedia. Content, structure (such as links), and usage statistics were extracted from Wikipedia and used to build a general knowledge base for topic identification. This knowledge base drives the query augmentation process. To bridge the gap, the search engines needed an intelligent capability: contextual topic identification of user input. Users can work directly with the augmented query terms and the weights of those terms. An online public prototype of the TCSearch2 project will be deployed by 2013. Two types of studies were conducted to evaluate the developed search engines: a qualitative laboratory study with seven test subjects, with a total duration of 21 hours, and a quantitative search simulation with 30 different queries. In the qualitative study, the subjects' usage data and feedback were analyzed. In the quantitative evaluation, the developed search engines were compared to existing search engines, including Google and Wikipedia's own search engine. The studies show that the methods proposed in this thesis reduce the gap between users' expectations and search engine results.
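As a rough illustration of the query augmentation idea described above, the following Python sketch expands a query with weighted terms drawn from a small knowledge base. It is not the TCSearch2 implementation: the toy `KNOWLEDGE_BASE`, the functions `identify_topics` and `augment_query`, the title-word matching, and the usage-share weighting are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy knowledge base (an assumption for illustration):
# article title -> (titles of linked articles, page-view count).
KNOWLEDGE_BASE = {
    "Jaguar": (["Felidae", "Panthera", "South America"], 9000),
    "Jaguar Cars": (["Automobile", "Coventry", "Luxury vehicle"], 6000),
    "Panthera": (["Felidae", "Jaguar", "Lion"], 3000),
}


def identify_topics(query, kb):
    """Score articles whose title shares a word with the query.

    Each candidate topic is scored by its share of the page views
    among all candidates (a stand-in for usage statistics).
    """
    words = set(query.lower().split())
    hits = {
        title: views
        for title, (_, views) in kb.items()
        if words & set(title.lower().split())
    }
    total = sum(hits.values())
    return {title: views / total for title, views in hits.items()} if total else {}


def augment_query(query, kb, max_terms=5):
    """Return (term, weight) pairs for the augmented query.

    Original terms keep weight 1.0; terms taken from the links of each
    candidate topic inherit that topic's score.
    """
    weights = defaultdict(float)
    for topic, score in identify_topics(query, kb).items():
        links, _ = kb[topic]
        for linked in links:
            weights[linked.lower()] += score

    original = [(word, 1.0) for word in query.lower().split()]
    seen = {word for word, _ in original}
    extra = sorted(
        ((term, round(weight, 3)) for term, weight in weights.items() if term not in seen),
        key=lambda pair: (-pair[1], pair[0]),  # highest weight first, then alphabetical
    )
    return original + extra[:max_terms]


if __name__ == "__main__":
    for term, weight in augment_query("jaguar", KNOWLEDGE_BASE):
        print(f"{term}: {weight}")
```

Because the result is a plain list of term and weight pairs, a user interface could expose it for editing before the search runs, which mirrors the user control over augmented terms and weights mentioned in the abstract.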