Towards Building Knowledge Graphs from Natural Language Text with Open Relation Extraction and Semantic Disambiguation

Johnsen, Dag Vegard Kollstrøm

Master thesis

Open

master thesis (559.6Kb)

Permanent link

http://hdl.handle.net/1956/19999

Publication date

2019-03-23

Collections

Department of Information Science and Media Studies [858]

Abstract

Natural language text, from messages on social media to articles in newspapers, constitutes a significant portion of the content available on the Web. These texts are readable by humans, but cannot easily be used for advanced queries and reasoning by machines. Thus, the automated conversion of natural language text into a formal representation that is machine-readable is an important goal. The extraction of knowledge graphs from text is of particular importance in the context of the Semantic Web and Linked Open Data initiatives. This thesis describes the exploratory, example-driven development of an approach to knowledge graph extraction from natural language texts through the use of Open Relation Extraction systems, which are capable of extracting facts from texts in the form of relational triples in an efficient, domain-independent manner. The intuition is that these triples can be disambiguated and converted into machine-readable statements. This approach is partially implemented and in turn qualitatively assessed on the text domain of the lead paragraphs of newspaper articles, which express facts about notable entities. Solutions are discussed for many of the problems discovered through the implementation and assessment. The results indicate that Open Relation Extraction shows promise as an underlying technique for knowledge graph extraction from natural language text.
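
The intuition stated in the abstract, that an extracted relational triple can be disambiguated and converted into a machine-readable statement, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Triple` type, the `to_ntriple` function, the two lookup tables, and the `example.org` URIs are all hypothetical, and a simple dictionary lookup stands in for a real semantic disambiguation step. The output uses the N-Triples line format for RDF statements.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A relational triple as an Open Relation Extraction system might emit it."""
    subject: str
    relation: str
    obj: str


# Hypothetical lookup tables: a stand-in for real entity and relation
# disambiguation. The URIs are placeholders, not an actual vocabulary.
ENTITY_URIS = {
    "Bergen": "http://example.org/entity/Bergen",
    "Norway": "http://example.org/entity/Norway",
}
RELATION_URIS = {
    "is located in": "http://example.org/property/locatedIn",
}


def to_ntriple(triple: Triple) -> Optional[str]:
    """Map each part of the triple to a URI and format one N-Triples line.

    Returns None when any part cannot be disambiguated, so that
    unresolved triples are dropped rather than guessed at.
    """
    s = ENTITY_URIS.get(triple.subject)
    p = RELATION_URIS.get(triple.relation)
    o = ENTITY_URIS.get(triple.obj)
    if s is None or p is None or o is None:
        return None
    return f"<{s}> <{p}> <{o}> ."


if __name__ == "__main__":
    print(to_ntriple(Triple("Bergen", "is located in", "Norway")))
```

Returning `None` for unresolved triples is one possible design choice; it favours precision over coverage, and a fuller system might instead keep such triples for later resolution.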

Publisher

The University of Bergen