Word sense disambiguation in webpages. Developing a program capable to disambiguate words with a website text as context
This master's thesis investigated automatic methods of Word Sense Disambiguation (WSD) in HTML pages. The hypothesis was that HTML documents provide various disambiguation cues which are not normally present in general text, and which can enhance the quality of WSD. We tested several existing natural language processing toolkits which provide general WSD services, and compared these to our novel algorithms, which were designed to take advantage of the HTML cues. The findings showed that our new algorithms outperformed state-of-the-art general WSD implementations. In addition, our algorithm could provide a ranked list of potential disambiguations, which is useful in an example use case where users “tag” keywords in a web page with the help of the disambiguating algorithm.
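
To make the idea of HTML cues and ranked disambiguations concrete, the following is a minimal illustrative sketch and not the algorithm developed in the thesis. It assumes a simple gloss-overlap scheme in the spirit of Lesk, in which context words found in the page title, headings and meta keywords count more than body text. The weights, the toy sense inventory and the names `CueCollector` and `rank_senses` are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: words in these elements count more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0, "h3": 2.0}
META_WEIGHT = 3.0   # assumed weight for <meta name="keywords">
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class CueCollector(HTMLParser):
    """Collects context words, weighted by the enclosing HTML element."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open tags
        self.weights = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if (attrs.get("name") or "").lower() == "keywords":
                for word in tokenize(attrs.get("content") or ""):
                    self.weights[word] += META_WEIGHT
            return  # <meta> is a void element and is never closed
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the highest weight among all currently open elements.
        weight = max(
            (TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self.stack),
            default=BODY_WEIGHT,
        )
        for word in tokenize(data):
            self.weights[word] += weight


def rank_senses(html, senses):
    """Return (sense, score) pairs sorted from best to worst match."""
    collector = CueCollector()
    collector.feed(html)
    collector.close()
    scores = {
        sense: sum(collector.weights[w] for w in set(tokenize(gloss)))
        for sense, gloss in senses.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy page and toy sense inventory, invented for this example.
PAGE = """<html><head><title>Jaguar cars for sale</title>
<meta name="keywords" content="car, dealer, engine"></head>
<body><h1>Used Jaguar models</h1><p>The jaguar is fast.</p></body></html>"""

SENSES = {
    "jaguar_animal": "large wild cat native to american rainforests",
    "jaguar_car": "british luxury car brand with a powerful engine",
}

ranking = rank_senses(PAGE, SENSES)
```

On this toy page the car sense ranks first, because "car" and "engine" appear in the meta keywords. A ranked list of this kind is what a tagging interface could present to the user as suggestions; a realistic system would draw its senses and glosses from a lexical resource rather than a hand-written dictionary.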