Ontology Based Information Extraction in the License Domain

Nilsen, Ken-Thomas

dc.contributor.author	Nilsen, Ken-Thomas	eng
dc.date.accessioned	2015-08-11T07:35:26Z
dc.date.available	2015-08-11T07:35:26Z
dc.date.issued	2015-06-01
dc.date.submitted	2015-06-01	eng
dc.identifier.uri	http://hdl.handle.net/1956/10258
dc.description.abstract	All computer users need to deal with End User License Agreements (EULAs). Every time we install software or sign up for a web service, we are expected to read and accept such a legal agreement. For most users this is only a slightly annoying step in the process, and we have been conditioned over many years simply to accept these texts unthinkingly. These texts are often long and filled with legal jargon, and hence almost impossible for an interested layperson to understand. In this thesis I have explored the use of common natural language processing (NLP) and knowledge extraction techniques in the domain of EULAs and license agreements. My project has included the development of an artifact that uses these techniques and then makes the extracted data available through semantic technology. It extracts document structure, named entities, binary relations and definitions. I have built a classifier that uses topic modeling to categorize binary relations: the topics found by the topic model are used by the classifier to decide which topic a given binary relation belongs to. I have also experimented with text search in ontologies to try to find the realization of a given binary relation in a specified ontology. The artifact is run on a specific EULA, and I evaluate the knowledge extracted by each of the techniques investigated. I have not tried to find the best existing implementation of each technique, but have instead evaluated the kind of data extracted and the specific needs that arise in the domain of licenses. The extraction and representation of the license's structure was a success, and I have used that extraction as the basis for a vocabulary that describes my extracted data. All extractions are linked directly back to the place in the text where they were extracted. This is because of the role legal documents play in a judicial system: as the text decides the outcome in court, it is important to keep a reference back to the source document.
Because of this, my system can be viewed as a system that semantically enriches a text, but without reasoning about higher levels of knowledge. I conclude that extracting knowledge using common NLP and knowledge extraction tools is feasible and opens up research into its use in document summarization and in facilitating comprehension of such legal texts. I also conclude that my classifier for binary relations performs weakly, but I list a set of changes and prerequisites that would warrant further experimentation. Finally, I conclude that special steps will be needed in the construction of our ontologies for my approach of using their built-in comments and labels to be viable.	en_US
dc.format.extent	1695826 bytes	eng
dc.format.mimetype	application/pdf	eng
dc.language.iso	eng	eng
dc.publisher	The University of Bergen	eng
dc.rights	Copyright the Author. All rights reserved	eng
dc.subject	EULA	eng
dc.subject	license agreements	eng
dc.subject	semantic technology	eng
dc.subject	binary relations	eng
dc.subject	document structure	eng
dc.subject	named entities	eng
dc.subject	ontology	eng
dc.subject	semantic web	eng
dc.title	Ontology Based Information Extraction in the License Domain	eng
dc.type	Master thesis	en_US
dc.description.version	publishedVersion
dc.description.localcode	INFO390
dc.description.localcode	MASV-INFO
dc.subject.nus	735115	eng
fs.subjectcode	INFO390

Files in this item

Name: 135278341.pdf
Size: 1.617Mb
Format: PDF


This item appears in the following Collection(s)

Department of Information Science and Media Studies
