W3C

SPARQL New Features and Rationale

W3C Working Draft 2 July 2009

This version:
http://www.w3.org/TR/2009/WD-sparql-features-20090702/
Latest version:
http://www.w3.org/TR/sparql-features/
Editors:
Kjetil Kjernsmo, Computas AS
Alexandre Passant, DERI Galway at the National University of Ireland, Galway, Ireland


Abstract

SPARQL is a query language for RDF data on the Semantic Web with formally defined meaning. This document is a simple introduction to the new features of the language, including an explanation of its differences with respect to the previous SPARQL Query Language Recommendation [SPARQL/Query 1.0]. It also presents the requirements that have motivated the design of the main new features, and their rationale from a theoretical and implementation perspective.

Status of this Document

May Be Superseded

This is a First Public Working Draft of a feature requirements document for the continued SPARQL language development. This document is expected to change in response to public input and working group decisions.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This Working Draft has undergone several changes since the version of 02 December 2008.

Comments are solicited

The SPARQL Working Group seeks public feedback on this Working Draft. Please send your comments to [email protected] (public archive). If possible, please offer specific changes to the text that would address your concern.

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.



1 Introduction

This document provides an overview of the main new features of SPARQL and their rationale. It describes an update to SPARQL that adds several new features agreed on by the SPARQL WG. These language features were determined based on real applications and on user and tool-developer experience.

1.1 List of Features

The following features have been agreed by the SPARQL WG. These features have been grouped into Required and Time-permitting features as follows.

Required features
Time-permitting features

1.2 Goals and structure of the document

In the remainder of this document we will present the new features according to the nomenclature agreed by the Working Group:

Each feature is described in a common pattern as follows:

Motivations
a brief sentence explaining why the new feature was added
Description
a more complete description of the feature
Existing implementation(s)
a list of existing implementations for the proposed feature and example syntax used in the implementation
Related discussions
links to related discussions of the WG regarding the feature (mainly issues raised)
Status
the status of the feature, i.e. either required or time-permitting.

This current working draft details only the required features, but a motivation and a description are also provided for the time-permitting features.

2 SPARQL/Query 1.1

2.1 Aggregate functions

2.1.1 Motivations

Aggregate functions allow operations such as counting and numerical min/max/average by operating over columns of results. SPARQL does not currently support them, so applications need additional scripting to parse query results and obtain this information, e.g. the number of triples that match a particular pattern. Hence, a language extension is needed.

2.1.2 Description

In SPARQL/Query 1.0 (original SPARQL), query patterns yield a solution set (effectively a table of solutions) from which certain columns are projected and returned as the result of the query. Aggregates provide the ability to partition a solution set into one or more groups based on rows that share specified values, and then to create a new solution set which contains one row per aggregated group. Each solution in this new aggregate solution set may contain either variables whose values are constant throughout the group or aggregate functions that can be applied to the rows in a group to yield a single value. Common aggregate functions include COUNT, SUM, MIN, and MAX.

Aggregate functions are commonly required to perform a slew of application and data-analysis tasks, such as:

Applications can typically take a SPARQL/Query 1.0 solution set and calculate aggregate values themselves. Enabling SPARQL engines to calculate aggregates, however, results in moving work from the application to the SPARQL engine, and will usually result in significantly smaller solution sets being returned to the application.
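
As an illustration of the work that currently falls to the application, the following is a minimal Python sketch of client-side aggregation over a solution set. The rows, the variable names and the aggregate helper are hypothetical and are not taken from any implementation; the sketch only shows the grouping step that an aggregate-aware engine would perform itself.

```python
from collections import defaultdict

def aggregate(solutions, group_var, value_var):
    """Group solution rows by group_var and compute COUNT, SUM, MIN, MAX and AVG of value_var."""
    groups = defaultdict(list)
    for row in solutions:
        # Rows lacking either variable are skipped in this sketch.
        if group_var in row and value_var in row:
            groups[row[group_var]].append(row[value_var])
    result = []
    for key in sorted(groups):
        values = groups[key]
        # One output row per group, as described for aggregate solution sets.
        result.append({
            group_var: key,
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        })
    return result

# Hypothetical solution set, as a SPARQL/Query 1.0 engine might return it.
solutions = [
    {"type": "Widget", "value": 10},
    {"type": "Widget", "value": 30},
    {"type": "Gadget", "value": 5},
]

summary = aggregate(solutions, "type", "value")
for row in summary:
    print(row)
```

Note that the application has to receive all three rows in order to produce two, which is the transfer cost that engine-side aggregation is meant to avoid.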

2.1.3 Existing implementation

The following systems are known by the WG at the time of publication to support one or more aggregate functions:

Several aggregate functions are widely implemented, and implementations tend to project aggregate results using syntax such as the following:

SELECT COUNT(?person) AS ?alices
WHERE {
  ?person :name "Alice" .
}

This query returns the number of times a triple of the form _ :name "Alice" appears in the source data.

SELECT AVG(?value) AS ?average
WHERE {
  ?good a :Widget ;
        :value ?value .
}

2.1.4 Related discussions

Related issues raised by the WG:

2.1.5 Status

This feature is considered as Required by the WG.

2.2 Subqueries

2.2.1 Motivations

It is sometimes necessary to nest the results of a query within another query. Currently, this requires running a first query, parsing its results with dedicated scripts, and then launching a second query. The Subquery feature would allow such nesting within a single SPARQL query.

2.2.2 Description

In SPARQL/Query 1.0 (original SPARQL), nesting the result of one query into another requires dedicated script(s) that run separate queries. For instance, to identify all the people that Alice knows and a single name for each of them, a script such as the following is needed (in PHP, assuming that the do_query function runs a SPARQL query and returns the results as an array of PHP objects):

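$query = "
SELECT ?person WHERE {
 :Alice :knows ?person .
}";
$res = do_query($query);
foreach ($res as $r) {
  // Substitute each person found by the first query into the second one
  // (this assumes that the bound value is an IRI).
  $person = $r->person->value;
  $query = "SELECT ?name WHERE {
    <$person> foaf:name ?name .
  } LIMIT 1";
  $names = do_query($query);
}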
  

The Subquery feature will provide a way to nest the results of a query within another query. That feature could be used, for instance, in the following use cases:

The query form of subqueries has not yet been decided by the WG (see issues below).

2.2.3 Existing implementation(s)

The following implementations are known by the WG at the time of publication to provide a way to run subqueries:

For instance, the following query is possible in ARQ, and is equivalent to the script mentioned before.

SELECT ?person ?name WHERE {
  :Alice foaf:knows ?person .
  { 
    SELECT ?name WHERE { 
      ?person foaf:name ?name 
    } LIMIT 1 
  }
}

2.2.4 Related discussions

Related issues raised by the WG:

2.2.5 Status

This feature is considered as Required by the WG.

2.3 Negation

2.3.1 Motivations

In SPARQL/Query 1.0 (original SPARQL), Negation by failure is possible by combining OPTIONAL, FILTER and !BOUND. However, such queries are difficult to write and can be a burden when learning and using SPARQL. Users have therefore requested dedicated language constructs for expressing negation, and the WG agrees.

2.3.2 Description

Various tasks, such as data validation and social network analysis, require checking whether certain triples do or do not exist in the graph. Checking for the absence of triples is a form of negation called Negation by failure: it tests whether a pattern fails to match the data, not whether the statement is false under the open-world assumption. It is already possible in SPARQL/Query 1.0 (original SPARQL) using OPTIONAL, FILTER and !BOUND(). The following query retrieves the ?name of each ?x for which no foaf:knows value exists, i.e. the names of people who do not know anyone:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?x foaf:givenName  ?name .
        OPTIONAL { ?x foaf:knows ?who } .
        FILTER (!BOUND(?who))
}
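
The effect of this OPTIONAL/FILTER/!BOUND combination can be sketched procedurally. The following Python fragment is only an illustration under stated assumptions: the graph is a hypothetical list of (subject, predicate, object) tuples with prefixed names written as plain strings, and the function name is invented for the example.

```python
GIVEN_NAME = "foaf:givenName"
KNOWS = "foaf:knows"

# Hypothetical graph, written as (subject, predicate, object) tuples.
triples = [
    ("ex:alice", GIVEN_NAME, "Alice"),
    ("ex:bob", GIVEN_NAME, "Bob"),
    ("ex:alice", KNOWS, "ex:bob"),
]

def names_of_people_who_know_no_one(graph):
    """Return ?name for every ?x that has a givenName and no matching foaf:knows triple."""
    # Subjects for which the OPTIONAL pattern would bind ?who.
    knowers = {s for (s, p, o) in graph if p == KNOWS}
    # Keep only the solutions in which ?who stays unbound, i.e. the pattern failed to match.
    return sorted(o for (s, p, o) in graph if p == GIVEN_NAME and s not in knowers)

print(names_of_people_who_know_no_one(triples))
```

The test is purely about what the given data contains: a person is reported because no foaf:knows triple was found for them, not because the data asserts that they know no one.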

Yet, this is not very intuitive to write and learn for users nor does it cater for efficient implementations of negation. Hence, the Negation feature to be included will provide the support for testing the absence of a match to a query pattern. Negation can be used in the following use cases:

The feature would introduce a new operator into the algebra or a new function for filters. Existing queries do not use these operators and are therefore unaffected.

2.3.3 Existing implementation(s)

The following implementations are known by the WG at the time of publication to support Negation by failure:

The following example uses SeRQL's MINUS syntax to find all people that do not know anyone, in a similar way to the previous query:

SELECT x
  FROM {x} foaf:givenName {name}
MINUS
SELECT x
  FROM {x} foaf:givenName {name} ;
    foaf:knows {who}
USING NAMESPACE foaf = <http://xmlns.com/foaf/0.1/>

The following example uses the UNSAID syntax:

    
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?x
  WHERE { ?x foaf:givenName ?name
          UNSAID { ?x foaf:knows ?who }
        }
  

2.3.4 Related discussions

Related issues raised by the WG:

2.3.5 Status

This feature is considered as Required by the WG.

2.4 Project expressions

2.4.1 Motivations

Users need to be able to return the values of expressions over result bindings, rather than just RDF terms in the store.

2.4.2 Description

In SPARQL/Query 1.0 (original SPARQL), projection queries (SELECT queries) may only project out variables bound in the query. Because variables can only be bound via triple pattern matching, there is no way to project out values that are not matched in the underlying RDF data set. Projecting expressions represents the ability for SPARQL SELECT queries to project any SPARQL expression, rather than only variables. A projected expression might be a variable, a constant URI, a constant literal, or an arbitrary expression (including function calls) on variables and constants. Functions could include both SPARQL built-in functions and extension functions supported by an implementation.
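
The idea can be sketched outside SPARQL as follows. This Python fragment is a hedged illustration only: the project helper, the rows and the column names are hypothetical, and each projected column is modelled as a function of a solution row.

```python
def project(solutions, expressions):
    """Return new rows holding one column per (name, function) pair in expressions."""
    return [{name: fn(row) for name, fn in expressions.items()} for row in solutions]

# Hypothetical solution set produced by triple pattern matching.
solutions = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 12},
]

projected = project(solutions, {
    "name": lambda row: row["name"],        # a plain variable
    "over18": lambda row: row["age"] > 18,  # an expression over a variable and a constant
    "source": lambda row: "example",        # a constant literal
})
print(projected)
```

The over18 and source columns hold values that do not occur in the matched data, which is what projection of variables alone cannot express.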

There are many use cases that motivate the ability to project expressions rather than just variables in SPARQL queries. In general, the motivation is to return values that do not occur in the graphs that comprise a query's RDF data set. Specific examples include:

TODO: More mention should be made of the connection with subqueries, as the two can be used together to answer many use cases.

2.4.3 Existing implementation(s)

The following systems are known by the WG at the time of publication to support some uses of project expressions:

We wish to find names, and whether the person is over 18.

 SELECT ?name (?age > 18) AS ?over18
 WHERE {
   ?person :name ?name ;
           :age ?age .
 }

As another example, we wish to find the full name of everyone who is interested in trees.

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>

 PREFIX : <http://www.example.org/>
 
 SELECT fn:string-join(?givenName, ' ', ?surname) AS ?fullName
 WHERE {
   ?person foaf:givenname ?givenName ;
           foaf:surname ?surname ;
           foaf:interest :trees .
 }

This example has made use of a concatenation function from XPath-Functions. Which functions will be available for value construction in SPARQL is an open issue that will be dealt with on a time-permitting basis.

To return an RDF graph in which the given and family names are concatenated into a full name, we can use a query similar to the previous SELECT example as a subquery, together with a project expression, like this:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 CONSTRUCT { ?x foaf:name ?fullName }
 WHERE {
   { SELECT ?x fn:string-join(?gn, " ", ?sn) AS ?fullName
     WHERE { ?x foaf:givenname ?gn ; foaf:surname ?sn . } }
 }

TODO: It should be established whether any implementations support both subqueries and fn:string-join()

2.4.4 Related discussions

The WG has noted that project expressions

2.4.5 Status

This feature is considered as Required by the WG.

2.5 Query language syntax

2.5.1 Motivation

Certain limitations of the SPARQL/Query 1.0 language syntax cause unnecessary barriers for learning and using SPARQL.

2.5.2 Description

Time-permitting, the SPARQL Working Group will consider extending SPARQL/Query's syntax to include:

2.5.5 Status

This feature is considered as time-permitting only by the WG.

2.6 Property paths

2.6.1 Motivation

Many classes of query over RDF graphs require searching data structures that are hierarchical and involve arbitrary-length paths through the graphs. Examples include:

2.6.2 Description

SPARQL/Query 1.0 can express queries over fixed-length paths within RDF graphs. SPARQL/Query 1.0 can also express queries over arbitrary but bounded-length paths via repeated UNION constructs. SPARQL/Query 1.0 cannot express queries that require traversing hierarchical structures via unbounded, arbitrary-length paths.
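
To make the limitation concrete, the following Python sketch shows the kind of unbounded traversal that an application currently has to perform outside the query language. It is an assumption-laden illustration, not a proposed semantics: the graph is a hypothetical list of (subject, predicate, object) tuples and the reachable helper is invented for the example.

```python
from collections import deque

def reachable(graph, start, predicate):
    """Return every node reachable from start via one or more predicate edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for (s, p, o) in graph:
            # Follow each matching edge once; the seen set also guards against cycles.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

# Hypothetical class hierarchy of unknown depth.
triples = [
    ("ex:Poodle", "rdfs:subClassOf", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
]
print(sorted(reachable(triples, "ex:Poodle", "rdfs:subClassOf")))
```

A UNION-based SPARQL/Query 1.0 query would need one branch per path length, so it can only cover hierarchies whose maximum depth is known in advance.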

Time-permitting, the SPARQL Working Group will define the syntax and semantics of property paths, a mechanism for expressing arbitrary-length paths of predicates within SPARQL triple patterns.

2.6.5 Status

This feature is considered as time-permitting only by the WG.

2.7 Commonly Used SPARQL Functions

2.7.1 Motivation

Many SPARQL implementations support functions beyond those required by the SPARQL/Query 1.0 specification. There is little to no interoperability between the names and semantics of these functions for common tasks such as string manipulation.

2.7.2 Description

Time-permitting, the SPARQL WG will define URIs and semantics for a set of functions commonly supported by existing SPARQL implementations.

See Working Group issue: ISSUE-2 - http://www.w3.org/2009/sparql/tracker/issues/2

2.7.5 Status

This feature is considered as time-permitting only by the WG.

2.8 Basic Federated Query

2.8.1 Motivation

SPARQL is a concise query language to retrieve and join information from multiple RDF graphs via a single query. In many cases, the different RDF graphs are stored behind distinct SPARQL endpoints.

2.8.2 Description

Federated query is the ability to take a query and provide solutions based on information from many different sources. It is a hard problem in its most general form and is the subject of continuing (and continuous) research. A building block is the ability of one query to issue a query against another SPARQL endpoint during query execution.

Time-permitting, the SPARQL Working Group will define the syntax and semantics for handling a basic class of federated queries in which the SPARQL endpoints to use in executing portions of the query are explicitly given by the query author.
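
One step that any such federation involves is combining the solutions obtained locally with those returned by the remote endpoint. The following Python sketch illustrates that join under stated assumptions: both solution sets are hypothetical lists of dictionaries, no network access is performed, and the join helper is invented for the example rather than taken from any implementation.

```python
def join(left, right):
    """Join two solution sets: merge every pair of rows that agree on all shared variables."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[v] == r[v] for v in shared):
                # Compatible rows are merged into a single solution.
                joined.append({**l, **r})
    return joined

# Hypothetical solutions from the local store and from a remote endpoint.
local_rows = [
    {"person": "ex:alice", "name": "Alice"},
    {"person": "ex:bob", "name": "Bob"},
]
remote_rows = [
    {"person": "ex:alice", "interest": "ex:trees"},
]
print(join(local_rows, remote_rows))
```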

2.8.5 Status

This feature is considered as time-permitting only by the WG.

3 Service description

3.1 Motivations

Given the variety of SPARQL implementations, and differences in datasets and extension functions, a method of discovering a SPARQL endpoint's capabilities and summary information of its data in a machine-readable way is needed.

3.2 Description

Many SPARQL implementations support a variety of SPARQL extensions (many proposed here for standardization), extension functions (for use in FILTERs), and different entailment regimes. Moreover, the differences in datasets provided by SPARQL endpoints are often hard to grasp without some existing knowledge of the underlying data. This proposal suggests that these differences may be described by the endpoints themselves, detailing both (1) the capabilities of the endpoint and (2) the data contained in the endpoint.

The Service description feature can be used in the following use cases:

3.3 Existing implementation(s)

The following services are known by the WG at the time of publication to support the Service description feature:

The following service description is an example of what is returned when querying an endpoint powered by RDF::Query with the about=1 HTTP parameter, e.g. http://example.org/sparql?about=1.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <http://darq.sf.net/dose/0.1#> .
@prefix saddle: <http://www.w3.org/2005/03/saddle/#> .
@prefix sparql: <http://kasei.example/2008/04/sparql#> .
@prefix void: <http://rdfs.org/ns/void#> .
[] a sd:Service ;
  rdfs:label "SPARQL Endpoint for example.org" ;
  sd:url <http://example.org/sparql> ;
  sd:totalTriples 12729 ;
  saddle:queryLanguage [ rdfs:label "SPARQL" ; saddle:spec <http://www.w3.org/TR/rdf-sparql-query/> ] ;
  saddle:queryLanguage [ rdfs:label "RDQL" ; saddle:spec <http://www.w3.org/Submission/RDQL/> ] ;
  saddle:resultFormat [
    rdfs:label "SPARQL Query Results XML" ;
    saddle:mediaType "application/sparql-results+xml" ;
    saddle:spec <http://www.w3.org/TR/rdf-sparql-XMLres/>
  ] ;
  saddle:resultFormat [
    rdfs:label "RDF/XML" ;
    saddle:mediaType "application/rdf+xml" ;
    saddle:spec <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  ] ;
  saddle:resultFormat [
    rdfs:label "SPARQL Query Results JSON" ;
    saddle:mediaType "application/sparql-results+json" ;
    saddle:spec <http://www.w3.org/TR/rdf-sparql-json-res/>
  ] ;
  
  sparql:extensionFunction <java:com.hp.hpl.jena.query.function.library.sha1sum> ;
  sparql:extensionFunction <java:com.ldodds.sparql.Distance> ;

  sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/service> ;
  sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/unsaid> ;
  sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/federate_bindings> .

3.4 Related discussions

The serviceDescription issue was previously postponed by the DAWG.

3.5 Status

This feature is considered as Required by the WG.

4 Update

The Working Group has resolved to specify a SPARQL/Update language, but may also pursue HTTP-based graph update via the protocol. This issue is orthogonal to the SPARQL/Update language. Whether or not there will be a concrete mapping between SPARQL/Update and HTTP-based graph update is currently under discussion in the Working Group.

4.1 SPARQL/Update 1.0 Language

4.1.1 Motivations

To change an RDF graph (either adding, updating or removing statements as well as adding statements from one graph to another or to the default graph of a triple store) one would currently have to use a programming language and one of several APIs. In other query languages, notably SQL, there are mechanisms to change the data in the database. To allow RDF graphs to be manipulated the same way and avoid using third-party APIs, a language extension is needed.

4.1.2 Description

This feature is a language extension to express updates to an RDF graph or to an RDF store. It follows SPARQL in both style and detail, which reduces the learning curve for developers and lowers implementation costs.

The following facilities are expected to be provided by the SPARQL/Update 1.0 language:

4.1.3 Existing implementation(s)

The [SPARUL] Member Submission, which contains several examples, has been widely implemented and is considered a starting point for the present work.

The following two examples illustrate some of the features:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{ <http://example/book3> dc:title    "A new book" ;
                         dc:creator  "A.N.Other" .
}
  
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE { ?book ?p ?v }
WHERE
  { ?book dc:date ?date .
    FILTER ( ?date < "2000-01-01T00:00:00"^^xsd:dateTime )
    ?book ?p ?v
  }
  

The following systems are known by the WG to support the [SPARUL] Member Submission at the time of publication:

4.1.4 Related discussions

Related issues raised by the WG:

4.1.5 Status

This feature is considered as Required by the WG.

4.2 Protocol Enhancements for Update

4.2.1 Motivations

By making it possible to update an RDF graph using RESTful HTTP methods, it becomes possible to use either a SPARQL endpoint or a plain Web server to update RDF data.

4.2.2 Description

It should be possible to manipulate RDF graphs using HTTP verbs, notably PUT, POST and DELETE. This way, clients do not need to know the SPARQL language in order to update graphs when it is not needed.
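
As a rough illustration, the following Python sketch builds (without sending) such requests using the standard urllib library. Everything specific in it is an assumption made for the example: the graph URL is hypothetical, text/turtle is assumed as the media type of the payload, and the graph_request helper does not correspond to any protocol detail decided by the WG.

```python
from urllib.request import Request

def graph_request(method, graph_url, turtle=None):
    """Build (but do not send) an HTTP request that manipulates the graph at graph_url."""
    if method not in ("PUT", "POST", "DELETE"):
        raise ValueError("unsupported method: " + method)
    # The RDF payload, if any, is sent as the request body.
    data = turtle.encode("utf-8") if turtle is not None else None
    # Assumed media type for the payload; a real service may expect another one.
    headers = {"Content-Type": "text/turtle"} if data is not None else {}
    return Request(graph_url, data=data, headers=headers, method=method)

# Hypothetical graph URL and payload.
GRAPH_URL = "http://example.org/graphs/books"
PAYLOAD = (
    '<http://example/book3> '
    '<http://purl.org/dc/elements/1.1/title> "A new book" .'
)

put_request = graph_request("PUT", GRAPH_URL, PAYLOAD)
delete_request = graph_request("DELETE", GRAPH_URL)
print(put_request.get_method(), put_request.full_url)
print(delete_request.get_method(), delete_request.full_url)
```

The client only needs an HTTP library and an RDF serialization; no SPARQL query or update string is constructed.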

4.2.3 Existing implementation(s)

The following systems are known by the WG at the time of publication to support a RESTful update protocol.

4.2.4 Related discussions

4.2.5 Status

This feature is under discussion in the WG.

5 BGP extensions for entailment regimes

5.1 Motivation

Many software systems that support entailment regimes such as OWL dialects and RDF Schema extend the semantics of SPARQL Basic Graph Pattern matching to apply to entailments other than simple entailment. The formal semantics of these SPARQL/Query extensions are not standardized, and query writers cannot currently be guaranteed interoperable behavior when working with multiple query engines that extend SPARQL with the same entailment regime.

5.2 Description

SPARQL/Query 1.0 defines a mechanism to adapt SPARQL to entailment regimes beyond simple entailment by providing necessary conditions on re-defining the meaning of SPARQL Basic Graph Pattern matching. Time-permitting, the SPARQL WG will use the existing framework to define the semantics of SPARQL queries for one or more of these entailment frameworks:

5.5 Status

This feature is considered as time-permitting only by the WG.

6 Acknowledgments

The editors would like to thank the SPARQL Working Group for their valuable input for this document.

7 References

[SPARQL/Query 1.0]
SPARQL Query Language for RDF. Eric Prud'hommeaux, Andy Seaborne. W3C Recommendation 15 January 2008. http://www.w3.org/TR/rdf-sparql-query/
[SPARUL]
SPARQL Update - A language for updating RDF graphs. Andy Seaborne, Geetha Manjunath, Chris Bizer, John Breslin, Souripriya Das, Ian Davis, Steve Harris, Kingsley Idehen, Olivier Corby, Kjetil Kjernsmo, Benjamin Nowack. W3C Member Submission 15 July 2008. http://www.w3.org/Submission/2008/SUBM-SPARQL-Update-20080715/
[SeRQL]
The SeRQL query language (revision 3.0). Aduna B.V., 2002-2008. http://www.openrdf.org/doc/sesame2/users/ch09.html
[XPath-Functions]
XQuery 1.0 and XPath 2.0 Functions and Operators, A. Malhotra, J. Melton, N. Walsh (Editors), W3C Recommendation, World Wide Web Consortium, 23 January 2007, http://www.w3.org/TR/2007/REC-xpath-functions-20070123/.