FILT - Filtering Indexed Lucene Triples
Master thesis
Permanent link
http://hdl.handle.net/1956/5893
Publication date
2012-05-31
Abstract
The Resource Description Framework (RDF) is the W3C-recommended standard for data on the semantic web, and the SPARQL Protocol and RDF Query Language (SPARQL) is the query language used to retrieve RDF triples by subject, predicate, or object. RDF data often contain valuable information that can only be queried through filter functions. SPARQL queries can include filter clauses that define specific data criteria, such as full-text searches, numerical filtering, and constraints and relationships between data resources. The downside of SPARQL filter queries is that they are frequently slow to execute. Because filter queries can retrieve information that non-filter SPARQL queries cannot, decreasing their execution time would make SPARQL considerably more efficient in practice. This thesis presents FILT (Filtering Indexed Lucene Triples), a SPARQL filter query processing engine for conventional triplestores, built on top of Apache Lucene, a framework for storing and retrieving indexed documents. The objective of FILT was to decrease the query execution time of SPARQL filter queries. This was evaluated by benchmarking FILT against the Joseki triplestore in two use cases: SPARQL regular-expression filtering of medical data, and SPARQL numerical/logical filtering of geo-coordinates of geographical locations.
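To make the two kinds of filter named in the abstract concrete, here is a minimal Python sketch. It is not FILT's or Joseki's implementation. The triples, the prefixes (`ex:`, `rdfs:`, `geo:`), the query strings, and the function names are all assumptions invented for illustration. The two functions emulate the filters by a naive linear scan over an in-memory list, and Python's `re` module only approximates the regular-expression flavor that SPARQL specifies.

```python
import re

# Hypothetical toy data: (subject, predicate, object) triples, invented for illustration.
TRIPLES = [
    ("ex:drug1", "rdfs:label", "Aspirin tablets"),
    ("ex:drug2", "rdfs:label", "Ibuprofen gel"),
    ("ex:bergen", "geo:lat", 60.39),
    ("ex:bergen", "geo:long", 5.32),
    ("ex:oslo", "geo:lat", 59.91),
    ("ex:oslo", "geo:long", 10.75),
]

# Illustrative SPARQL queries (prefix declarations omitted) that the functions below emulate.
REGEX_QUERY = """
SELECT ?s WHERE {
  ?s rdfs:label ?label .
  FILTER regex(?label, "aspirin", "i")
}
"""

NUMERIC_QUERY = """
SELECT ?s WHERE {
  ?s geo:lat ?lat ; geo:long ?long .
  FILTER (?lat > 60.0 && ?lat < 61.0 && ?long > 5.0 && ?long < 6.0)
}
"""


def regex_filter(triples, predicate, pattern):
    """Return subjects whose string object for `predicate` matches `pattern` (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        s for s, p, o in triples
        if p == predicate and isinstance(o, str) and rx.search(o)
    ]


def bounding_box_filter(triples, lat_range, long_range):
    """Return subjects whose geo:lat and geo:long both fall strictly inside the given ranges."""
    lat = {s: o for s, p, o in triples if p == "geo:lat"}
    lon = {s: o for s, p, o in triples if p == "geo:long"}
    return sorted(
        s for s in lat.keys() & lon.keys()
        if lat_range[0] < lat[s] < lat_range[1]
        and long_range[0] < lon[s] < long_range[1]
    )


if __name__ == "__main__":
    print(regex_filter(TRIPLES, "rdfs:label", "aspirin"))
    print(bounding_box_filter(TRIPLES, (60.0, 61.0), (5.0, 6.0)))
```

Both functions test every candidate value in turn. That per-value work is the kind of cost an index over the literal values is meant to avoid.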