Extracting geographical semantics from online news articles
Abstract
Many news articles on the web feature geographical locations as significant elements. For the most part, these locations are not available in a machine-interpretable format: a machine can read the text of an article but cannot derive an understanding of its content. This project aims to find techniques for detecting and extracting locations from the plain text of online news articles. The project is limited to articles written in Norwegian and published in the county of Hordaland. Methods from design science are used for both development and evaluation. A prototype is implemented as a proof-of-concept system in the Clojure programming language. Through text analysis, the prototype is able to find mentions of locations in articles. The prototype system has been made available as an open source project and as a Clojure library.