
Ontology Based Information Extraction in the License Domain

Nilsen, Ken-Thomas
Master thesis
Published version
135278341.pdf (1.617Mb)
URI
http://hdl.handle.net/1956/10258
Date
2015-06-01
Collections
  • Department of Information Science and Media Studies [738]
Abstract
All computer users need to deal with End User License Agreements (EULAs). Every time we install software or sign up for a web service, we are expected to read and accept such a legal agreement. For most users this is only a slightly annoying step in the process, and many years of conditioning have taught us to accept these texts unwittingly. The texts are often long and full of legal jargon, which makes them almost impossible for an interested layperson to understand. In this thesis I have explored the use of common natural language processing (NLP) and knowledge extraction techniques in the domain of EULAs and license agreements. My project has included the development of an artifact that uses these techniques and then makes the extracted data available through semantic technology. It extracts document structure, named entities, binary relations and definitions. I have built a classifier that uses topic modeling to classify binary relations: the topics found by the topic model are used by the classifier to decide which topic a given binary relation belongs to. I have also experimented with text search in ontologies to try to find the realization of a given binary relation in a specified ontology. The artifact is run on a specific EULA, and I evaluate the knowledge extracted by each of the techniques investigated. I have not tried to find the best existing implementation of each technique; instead I have evaluated the kind of data extracted and the specific needs that arise in the license domain. The extraction and representation of the license's structure was a success, and I have used that extraction as the basis for a vocabulary that describes my extracted data. Every extraction is linked directly back to the text it was extracted from. This is because of the role legal documents play in the judicial system: since the text decides the outcome in court, it is important to keep a reference back to the source document.
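The abstract stresses that every extraction is tied back to the exact text it came from. As an illustration only, here is a minimal Python sketch of that idea applied to definitions: each match records character offsets so the original wording can always be recovered. The `Definition` type, the `extract_definitions` function and the `"Term" means ...` pattern are hypothetical assumptions, not the artifact's actual implementation, and real EULAs phrase definitions in many other ways.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    definition: str
    start: int  # character offset where the defining sentence begins in the source
    end: int    # exclusive end offset, so source[start:end] is the exact sentence


# Hypothetical pattern (an assumption): matches sentences such as
# '"Software" means the computer program you install.'
DEFINITION_RE = re.compile(r'"(?P<term>[^"]+)"\s+means\s+(?P<body>[^.]+)\.')


def extract_definitions(text):
    """Return every definition found in text, each anchored to its source span."""
    return [
        Definition(m.group("term"), m.group("body").strip(), m.start(), m.end())
        for m in DEFINITION_RE.finditer(text)
    ]


eula = ('This Agreement is binding. "Software" means the program you install. '
        '"Licensee" means you.')
for d in extract_definitions(eula):
    # Slicing the source with the stored offsets reproduces the original wording.
    print(d.term, "->", d.definition, "|", eula[d.start:d.end])
```

Slicing the source with a result's `start` and `end` returns the original sentence verbatim, which is what keeps the extraction anchored to the legal text. The pattern stops at the first period, so a definition containing an abbreviation such as "Inc." would be cut short.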
Because of this, my system can be viewed as a system that semantically enriches a text, but without reasoning about higher levels of knowledge. I conclude that extracting knowledge using common NLP and knowledge extraction tools is feasible, and that it opens up research into their use in document summarization and in facilitating comprehension of such legal texts. I also conclude that my classifier for binary relations performs weakly, but I list a set of changes and prerequisites that would warrant further experimentation. Finally, I conclude that for my experiment with using the built-in comments and labels of an ontology to be viable, special steps must be taken when constructing the ontologies.
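To make the topic-based classification of binary relations more concrete, here is a minimal sketch assuming a topic model (for example LDA) has already produced a word distribution for each topic. It scores the words of a relation under each topic and picks the highest-scoring one, a naive-Bayes-style rule with a uniform prior over topics. The topic names, word probabilities, smoothing constant and function names are invented for illustration and are not taken from the thesis.

```python
import math

# Hypothetical topic-model output (an assumption): each topic maps words to
# probabilities. Topic names and values are invented for this example.
TOPICS = {
    "payment": {"pay": 0.30, "fee": 0.25, "refund": 0.20, "price": 0.15},
    "usage": {"install": 0.30, "copy": 0.25, "use": 0.25, "device": 0.10},
    "termination": {"terminate": 0.35, "breach": 0.25, "end": 0.15},
}

SMOOTHING = 1e-4  # probability assumed for words a topic does not contain


def topic_log_score(words, topic):
    """Log-likelihood of the words under one topic's word distribution."""
    return sum(math.log(topic.get(w, SMOOTHING)) for w in words)


def classify_relation(subject, predicate, obj, topics=TOPICS):
    """Assign a (subject, predicate, object) relation to its most likely topic."""
    # Lower-case and split every part of the relation into individual words.
    words = [w.lower() for part in (subject, predicate, obj) for w in part.split()]
    # Uniform prior over topics, so the best topic is the highest log-likelihood.
    return max(topics, key=lambda name: topic_log_score(words, topics[name]))
```

For example, `classify_relation("Licensee", "may install", "the Software")` returns `"usage"`, since "install" is the only word in the relation that any topic assigns a real probability to. A classifier this simple depends heavily on the quality of the topics, which is one reason such an approach may perform weakly on a single short license.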
Publisher
The University of Bergen

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 
