FILT - Filtering Indexed Lucene Triples

Stuhr, Magnus

Master thesis

Open

96685564.pdf (1.600Mb)

Permanent link

http://hdl.handle.net/1956/5893

Publication date

2012-05-31

Collections

Department of Information Science and Media Studies

Abstract

The Resource Description Framework (RDF) is the W3C-recommended standard for data on the Semantic Web, and the SPARQL Protocol and RDF Query Language (SPARQL) is the query language that retrieves RDF triples by subject, predicate, or object. RDF data often contain valuable information that can only be queried through filter functions. SPARQL queries can include filter clauses to define specific data criteria, such as full-text searches, numerical filtering, and constraints and relationships between data resources. The downside of SPARQL filter queries, however, is that they frequently execute slowly. Because SPARQL filter queries can retrieve information that non-filter SPARQL queries cannot, decreasing their execution time will greatly enhance the efficiency of the SPARQL query language. This thesis presents FILT (Filtering Indexed Lucene Triples), a SPARQL filter query processing engine for conventional triplestores, built on top of the Apache Lucene framework for storing and retrieving indexed documents. The objective of FILT was to decrease the execution time of SPARQL filter queries. This was evaluated by benchmarking FILT against the Joseki triplestore, focusing on two use cases: SPARQL regular expression filtering in medical data, and SPARQL numerical/logical filtering of geo-coordinates in geographical locations.
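To make the two benchmark use cases concrete, the following is a minimal Python sketch of what a regular expression filter and a numerical geo-coordinate filter compute when evaluated naively over a list of triples. It is not FILT's implementation and does not use Lucene; the sample triples, the prefixed names (`ex:`, `rdfs:label`, `geo:lat`, `geo:long`), and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical sample triples (subject, predicate, object); not data from the thesis.
TRIPLES = [
    ("ex:drug1", "rdfs:label", "Aspirin 500 mg"),
    ("ex:drug2", "rdfs:label", "Ibuprofen 200 mg"),
    ("ex:bergen", "geo:lat", 60.39),
    ("ex:bergen", "geo:long", 5.32),
    ("ex:oslo", "geo:lat", 59.91),
    ("ex:oslo", "geo:long", 10.75),
]


def regex_filter(triples, predicate, pattern):
    """Subjects whose string object matches `pattern`, case-insensitively.

    Roughly the effect of a SPARQL clause such as FILTER regex(?o, pattern, "i").
    Every triple is scanned, which is the cost an index is meant to avoid.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        s for s, p, o in triples
        if p == predicate and isinstance(o, str) and rx.search(o)
    ]


def bounding_box(triples, lat_min, lat_max, long_min, long_max):
    """Subjects whose latitude and longitude both fall inside the given box.

    Roughly the effect of a numerical/logical filter such as
    FILTER(?lat >= lat_min && ?lat <= lat_max && ?long >= long_min && ?long <= long_max).
    """
    lats = {s: o for s, p, o in triples if p == "geo:lat"}
    longs = {s: o for s, p, o in triples if p == "geo:long"}
    return sorted(
        s for s in lats
        if s in longs
        and lat_min <= lats[s] <= lat_max
        and long_min <= longs[s] <= long_max
    )
```

Both functions visit every triple on every call. That linear scan is the kind of work that an inverted or numeric index over the triple values could replace with a lookup.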

Publisher

The University of Bergen