Copyright © 2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This is a First Public Working Draft of a feature requirements document for the continued development of the SPARQL language. This document is expected to change in response to public input and Working Group decisions.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This Working Draft has undergone several changes since the version of 02 December 2008.
The SPARQL Working Group seeks public feedback on this Working Draft. Please send your comments to [email protected] (public archive). If possible, please offer specific changes to the text that would address your concern.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document provides an overview of the main new features of SPARQL and their rationale. It describes an update to SPARQL that adds several new features agreed by the SPARQL WG. These language features were selected based on real applications and on the experience of users and tool developers.
The following features have been agreed by the SPARQL WG. These features have been grouped into Required and Time-permitting features as follows.
In the remainder of this document we will present the new features according to the nomenclature agreed by the Working Group:
Each feature is described in a common pattern as follows:
This Working Draft details only the required features, but a motivation and a description are also provided for the time-permitting features.
Aggregate functions allow operations such as counting, numerical min/max/average and so on, by operating over columns of results. SPARQL/Query 1.0 does not provide them, so applications need additional scripting to parse query results and compute such information, e.g. the number of triples that match a particular pattern. Hence, a language extension is needed.
In SPARQL/Query 1.0 (original SPARQL), query patterns yield a solution set (effectively a table of solutions) from which certain columns are projected and returned as the result of the query. Aggregates provide the ability to partition a solution set into one or more groups based on rows that share specified values, and then to create a new solution set which contains one row per aggregated group. Each solution in this new aggregate solution set may contain either variables whose values are constant throughout the group, or aggregate functions that are applied to the rows in a group to yield a single value. Common aggregate functions include COUNT, SUM, MIN, and MAX.
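The grouping step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definition of SPARQL aggregate semantics: a solution set is modelled as a list of dicts, and the function name `group_and_aggregate` and the `dept`/`salary` sample data are hypothetical.

```python
from collections import defaultdict

# Assumption: a solution is a dict from variable name (without '?') to value,
# and a solution set is a list of such dicts.


def group_and_aggregate(solutions, group_vars, aggregates):
    """Partition solutions on the values of group_vars; return one row per group.

    aggregates maps an output variable to (input variable, function), where the
    function is applied to the list of that variable's values within a group.
    """
    groups = defaultdict(list)
    for sol in solutions:
        key = tuple(sol[v] for v in group_vars)
        groups[key].append(sol)

    result = []
    for key, rows in groups.items():
        # Grouping variables are constant throughout the group.
        out = dict(zip(group_vars, key))
        for out_var, (in_var, func) in aggregates.items():
            # Only rows in which the input variable is bound contribute.
            values = [r[in_var] for r in rows if in_var in r]
            out[out_var] = func(values)
        result.append(out)
    return result


# Hypothetical solution set, e.g. from a pattern binding ?dept and ?salary.
solutions = [
    {"dept": "sales", "salary": 100},
    {"dept": "sales", "salary": 300},
    {"dept": "it", "salary": 200},
]

rows = group_and_aggregate(
    solutions,
    ["dept"],
    {
        "n": ("salary", len),        # COUNT
        "total": ("salary", sum),    # SUM
        "lowest": ("salary", min),   # MIN
        "highest": ("salary", max),  # MAX
    },
)
for row in rows:
    print(row)
```

The sketch does not decide what MIN or MAX should return for a group in which the input variable is never bound (Python's `min` raises an error on an empty list); that is a question for the language definition, not for this illustration.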
Aggregate functions are commonly required to perform a wide range of application and data-analysis tasks, such as:
Applications can typically take a SPARQL/Query 1.0 solution set and calculate aggregate values themselves. Enabling SPARQL engines to calculate aggregates, however, results in moving work from the application to the SPARQL engine, and will usually result in significantly smaller solution sets being returned to the application.
The following systems are known by the WG at the time of publication to support one or more aggregate functions:
Several aggregate functions are widely implemented, and implementations typically project out results as in the following examples:
SELECT COUNT(?person) AS ?alices WHERE { ?person :name "Alice" . }
This returns the number of triples of the form _ :name "Alice" in the source data.
SELECT AVG(?value) AS ?average WHERE { ?good a :Widget ; :value ?value . }
This returns the average :value of all resources of type :Widget.
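As an informal check of what the two example queries compute, the following Python sketch evaluates them by hand over a small, hypothetical triple list. Prefixed names are kept as plain strings and no SPARQL engine is involved; the function names are illustrative only.

```python
# Hypothetical sample data as (subject, predicate, object) tuples.
triples = [
    (":p1", ":name", "Alice"),
    (":p2", ":name", "Alice"),
    (":p3", ":name", "Bob"),
    (":w1", "a", ":Widget"), (":w1", ":value", 10),
    (":w2", "a", ":Widget"), (":w2", ":value", 20),
    (":g1", "a", ":Gadget"), (":g1", ":value", 99),
]


def count_alices(data):
    # SELECT COUNT(?person) AS ?alices WHERE { ?person :name "Alice" . }
    persons = [s for (s, p, o) in data if p == ":name" and o == "Alice"]
    return {"alices": len(persons)}


def average_widget_value(data):
    # SELECT AVG(?value) AS ?average
    # WHERE { ?good a :Widget ; :value ?value . }
    widgets = {s for (s, p, o) in data if p == "a" and o == ":Widget"}
    values = [o for (s, p, o) in data if s in widgets and p == ":value"]
    # This sketch assumes at least one widget; it does not handle empty input.
    return {"average": sum(values) / len(values)}


print(count_alices(triples))
print(average_widget_value(triples))
```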
Related issues raised by the WG:
This feature is considered as Required by the WG.
It is sometimes necessary to nest the results of a query within another query. Currently this requires retrieving the results of a first query, parsing them with dedicated scripts, and then issuing the second query. The Subquery feature would allow such nesting within a single SPARQL query.
In SPARQL/Query 1.0 (original SPARQL), nesting the result of a first query into another one requires dedicated script(s) that run separate queries. For instance, to identify all the people that Alice knows and a single name for each of them, the following script is needed (in PHP, assuming that the do_query function runs a SPARQL query and returns the results as an array of PHP objects):
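$query = "SELECT ?person WHERE { :Alice foaf:knows ?person . }";
$res = do_query($query);
$names = array();
foreach ($res as $r) {
    $person = $r->person->value;
    // Substitute the current person's URI into the second query, then run it
    $query = "SELECT ?name WHERE { <$person> foaf:name ?name . } LIMIT 1";
    $res2 = do_query($query);
    if (count($res2) > 0) {
        $names[$person] = $res2[0]->name->value;
    }
}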
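The nested-loop evaluation that the script performs (one inner query per result of the outer query) can be sketched in Python over a small, hypothetical in-memory triple list. The helper `match` and the data are illustrative assumptions, not part of any SPARQL library.

```python
# Hypothetical data standing in for the endpoint queried through do_query().
triples = [
    (":Alice", "foaf:knows", ":Bob"),
    (":Alice", "foaf:knows", ":Carol"),
    (":Bob", "foaf:name", "Bob"),
    (":Bob", "foaf:name", "Robert"),
    (":Carol", "foaf:name", "Carol"),
]


def match(data, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a variable."""
    return [t for t in data
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def people_alice_knows_with_one_name(data):
    results = []
    # Outer query: SELECT ?person WHERE { :Alice foaf:knows ?person . }
    for _, _, person in match(data, ":Alice", "foaf:knows"):
        # Inner query, run once per outer row with ?person substituted:
        # SELECT ?name WHERE { <person> foaf:name ?name . } LIMIT 1
        for _, _, name in match(data, person, "foaf:name")[:1]:
            results.append((person, name))
    return results


print(people_alice_knows_with_one_name(triples))
```

Without an ORDER BY, LIMIT 1 picks an arbitrary name; in this sketch the list order of the data decides which one.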
The Subquery feature will provide a way to nest the results of a query within another query. That feature could be used, for instance, in the following use cases:
The query form of subqueries has not yet been decided by the WG (see issues below).
The following implementations are known by the WG at the time of publication to provide a way to run subqueries:
For instance, the following query is possible in ARQ, and is equivalent to the script mentioned before.
SELECT ?person ?name WHERE { :Alice foaf:knows ?person . { SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 1 } }
Related issues raised by the WG:
This feature is considered as Required by the WG.
In SPARQL/Query 1.0 (original SPARQL), Negation by failure is possible by combining OPTIONAL, FILTER and !BOUND. It is, however, difficult to write and can be a burden for learning and using SPARQL. Hence, users have requested dedicated language constructs for expressing negation, and the WG agrees that they are desirable.
Various tasks, such as data validation and social network analysis, can require checking whether certain triples do or do not exist in the graph. Checking the absence of triples is a form of negation called Negation by failure (because it checks that a pattern does not match the data, not that the corresponding statements do not hold, which could not be concluded under the open-world assumption). It is already possible in SPARQL/Query 1.0 (original SPARQL) using FILTER, OPTIONAL and !BOUND(), as follows (to retrieve the ?name of each ?x for which no foaf:knows value exists, i.e. to identify the names of people who do not know anyone):
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?x foaf:givenName ?name . OPTIONAL { ?x foaf:knows ?who } . FILTER (!BOUND(?who)) }
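The OPTIONAL/FILTER/!BOUND idiom can be sketched in Python over hypothetical data, next to a direct absence test of the kind a dedicated negation construct would express. Both functions are illustrative assumptions, not engine code.

```python
# Hypothetical data: Alice knows Bob, Bob knows nobody.
triples = [
    (":a", "foaf:givenName", "Alice"),
    (":b", "foaf:givenName", "Bob"),
    (":a", "foaf:knows", ":b"),
]


def names_via_optional_bound(data):
    """Mimic OPTIONAL { ?x foaf:knows ?who } FILTER (!BOUND(?who))."""
    solutions = []
    for x, p, name in data:
        if p != "foaf:givenName":
            continue
        whos = [o for (s, q, o) in data if s == x and q == "foaf:knows"]
        if whos:
            # OPTIONAL matched: one extended solution per ?who value.
            solutions.extend({"x": x, "name": name, "who": w} for w in whos)
        else:
            # OPTIONAL did not match: keep the solution with ?who unbound.
            solutions.append({"x": x, "name": name})
    # FILTER (!BOUND(?who)) keeps only the solutions where ?who is unbound.
    return [s["name"] for s in solutions if "who" not in s]


def names_via_direct_test(data):
    """The same result as a direct absence test on the foaf:knows pattern."""
    return [name for (x, p, name) in data
            if p == "foaf:givenName"
            and not any(s == x and q == "foaf:knows" for (s, q, _) in data)]


print(names_via_optional_bound(triples))
print(names_via_direct_test(triples))
```

The first function has to build and then discard the extended solutions, which illustrates why the idiom is indirect to read and not obviously efficient to evaluate.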
However, this idiom is neither intuitive for users to write and learn, nor well suited to efficient implementations of negation. Hence, the Negation feature to be included will provide support for testing the absence of a match to a query pattern. Negation can be used in the following use cases:
The feature would introduce a new operator into the algebra or a new function for filters. Existing queries do not use these operators and are therefore unaffected.
The following implementations are known by the WG at the time of publication to support Negation by failure:
UNSAID keyword, which was proposed during the first SPARQL WG but not addressed at that time
MINUS operator
NOT EXISTS operator (UNSAID being an alias for it)
NOT EXISTS operator to identify tables without a given record
The following example uses SeRQL's MINUS syntax to find all people (x) who do not know anyone, in a similar way to the previous query:
SELECT x FROM {x} foaf:givenName {name} MINUS SELECT x FROM {x} foaf:givenName {name} ; foaf:knows {who} USING NAMESPACE foaf = <http://xmlns.com/foaf/0.1/>
The following example uses the UNSAID syntax:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?x WHERE { ?x foaf:givenName ?name UNSAID { ?x foaf:knows ?who } }
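The two styles shown above can be contrasted in a short Python sketch over hypothetical data: the SeRQL form evaluates two queries and subtracts one result from the other, while the UNSAID form tests each outer solution against the negated pattern with ?x already bound. The function names are illustrative only.

```python
# Hypothetical data: :a knows :b, :b knows nobody.
triples = [
    (":a", "foaf:givenName", "Alice"),
    (":b", "foaf:givenName", "Bob"),
    (":a", "foaf:knows", ":b"),
]


def subjects_with(data, predicate):
    return {s for (s, p, o) in data if p == predicate}


def minus_style(data):
    """SeRQL-style MINUS: evaluate both SELECTs, then subtract the results."""
    left = subjects_with(data, "foaf:givenName")
    right = left & subjects_with(data, "foaf:knows")
    return left - right


def not_exists_style(data):
    """UNSAID / NOT EXISTS style: keep each ?x of the outer pattern for which
    the negated pattern has no match with ?x already bound."""
    knows = subjects_with(data, "foaf:knows")
    return {x for x in subjects_with(data, "foaf:givenName") if x not in knows}


print(minus_style(triples), not_exists_style(triples))
```

The two forms agree on this data; whether they coincide in general depends on the exact semantics the WG eventually chooses for each operator.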
Related issues raised by the WG:
This feature is considered as Required by the WG.
The ability to return the values of expressions over result bindings, rather than just RDF terms in the store.
In SPARQL/Query 1.0 (original SPARQL), projection queries (SELECT queries) may only project out variables bound in the query. Because variables can only be bound via triple pattern matching, there is no way to project out values that are not matched in the underlying RDF data set. Projecting expressions represents the ability for SPARQL SELECT queries to project any SPARQL expression, rather than only variables. A projected expression might be a variable, a constant URI, a constant literal, or an arbitrary expression (including function calls) on variables and constants. Functions could include both SPARQL built-in functions and extension functions supported by an implementation.
There are many use cases that motivate the ability to project expressions rather than just variables in SPARQL queries. In general, the motivation is to return values that do not occur in the graphs that comprise a query's RDF data set. Specific examples include:
TODO: More mention should be made of the connection with subqueries, as the two can be used together to answer many use cases.
The following systems are known by the WG at the time of publication to support some uses of project expressions:
The following query finds each person's name and whether that person is over 18:
SELECT ?name (?age > 18) AS ?over18 WHERE { ?person :name ?name ; :age ?age . }
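What projecting an expression means for the query above can be sketched in Python: each output row is computed by evaluating every projection against one solution, whether that projection is a plain variable or an expression. The data and the `project` helper are hypothetical.

```python
# Hypothetical solutions of { ?person :name ?name ; :age ?age . }
solutions = [
    {"person": ":p1", "name": "Alice", "age": 30},
    {"person": ":p2", "name": "Bob", "age": 12},
]


def project(solutions, projections):
    """Evaluate each projection (output variable -> function of a solution)
    once per solution."""
    return [{var: expr(sol) for var, expr in projections.items()}
            for sol in solutions]


rows = project(solutions, {
    "name": lambda s: s["name"],        # projecting a plain variable
    "over18": lambda s: s["age"] > 18,  # projecting the expression (?age > 18)
})
print(rows)
```

The value of ?over18 does not occur anywhere in the data; it exists only as the result of evaluating the expression, which is exactly what SPARQL/Query 1.0 projection cannot return.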
As another example, the following query finds the full name of everyone who is interested in trees:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX fn: <http://www.w3.org/2005/xpath-functions#> PREFIX : <http://www.example.org/> SELECT fn:concat(?givenName, " ", ?surname) AS ?fullName WHERE { ?person foaf:givenname ?givenName ; foaf:surname ?surname ; foaf:interest :trees . }
This example has made use of a concatenation function from XPath-Functions. Which functions will be available for value construction in SPARQL is an open issue that will be dealt with on a time-permitting basis.
To return an RDF graph in which the given and family names are concatenated into a full name, a query similar to the previous SELECT example can be used as a subquery, together with a project expression:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX fn: <http://www.w3.org/2005/xpath-functions#> CONSTRUCT { ?x foaf:name ?fullName } WHERE { { SELECT ?x fn:concat(?gn, " ", ?sn) AS ?fullName WHERE { ?x foaf:givenname ?gn ; foaf:surname ?sn . } } }
TODO: It should be established whether any implementations support both subqueries and fn:concat()
The WG has noted that project expressions
This feature is considered as Required by the WG.
Certain limitations of the SPARQL/Query 1.0 language syntax cause unnecessary barriers for learning and using SPARQL.
Time-permitting, the SPARQL Working Group will consider extending SPARQL/Query's syntax to include:
This feature is considered as time-permitting only by the WG.
Many classes of query over RDF graphs require searching data structures that are hierarchical and involve arbitrary-length paths through the graphs. Examples include:
SPARQL/Query 1.0 can express queries over fixed-length paths within RDF graphs. SPARQL/Query 1.0 can also express queries over arbitrary but bounded-length paths via repeated UNION constructs. SPARQL/Query 1.0 cannot express queries that require traversing hierarchical structures via unbounded, arbitrary-length paths.
Time-permitting, the SPARQL Working Group will define the syntax and semantics of property paths, a mechanism for expressing arbitrary-length paths of predicates within SPARQL triple patterns.
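The difference between bounded UNION chains and arbitrary-length paths can be illustrated with a Python sketch that follows a predicate any number of times (one or more steps) using breadth-first search. Since the WG has not yet defined property path syntax or semantics, this is only an illustration of the traversal a path query would need; the data and function name are hypothetical.

```python
from collections import deque

# Hypothetical class hierarchy of unknown depth.
triples = [
    (":Cat", "rdfs:subClassOf", ":Mammal"),
    (":Dog", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
    (":Animal", "rdfs:subClassOf", ":LivingThing"),
]


def reachable(data, start, predicate):
    """Nodes reachable from start via one or more predicate edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in data:
            # The seen check also guarantees termination on cyclic data.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


print(reachable(triples, ":Cat", "rdfs:subClassOf"))
```

No fixed number of UNION branches is needed: the loop continues until no new nodes are found, however deep the hierarchy is.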
This feature is considered as time-permitting only by the WG.
Many SPARQL implementations support functions beyond those required by the SPARQL/Query 1.0 specification. There is little to no interoperability between the names and semantics of these functions for common tasks such as string manipulation.
Time-permitting, the SPARQL WG will define URIs and semantics for a set of functions commonly supported by existing SPARQL implementations.
See Working Group issue: ISSUE-2 - http://www.w3.org/2009/sparql/tracker/issues/2
This feature is considered as time-permitting only by the WG.
SPARQL is a concise query language to retrieve and join information from multiple RDF graphs via a single query. In many cases, the different RDF graphs are stored behind distinct SPARQL endpoints.
Federated query is the ability to take a query and provide solutions based on information from many different sources. It is a hard problem in its most general form and is the subject of continuing (and continuous) research. A building block is the ability to have one query be able to issue a query on another SPARQL endpoint during query execution.
Time-permitting, the SPARQL Working Group will define the syntax and semantics for handling a basic class of federated queries in which the SPARQL endpoints to use in executing portions of the query are explicitly given by the query author.
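The building block described above (sending part of a query to an explicitly named endpoint and joining the answers with local solutions) can be sketched in Python. The request URL follows the SPARQL Protocol convention of passing the query in a `query` parameter; the endpoint URL, the data and the `fake_remote` stand-in are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def protocol_get_url(endpoint, query):
    """Build an HTTP GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def join(left, right):
    """Join two solution lists on the variables they share."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def fake_remote(endpoint, query):
    """Hypothetical stand-in for a remote endpoint. A real client would send
    protocol_get_url(endpoint, query) and parse the returned results."""
    return [{"person": ":Bob", "mbox": "mailto:bob@example.org"}]


# Local solutions, e.g. from { :Alice foaf:knows ?person }.
local = [{"person": ":Bob"}, {"person": ":Carol"}]

remote_endpoint = "http://remote.example/sparql"  # hypothetical endpoint
remote_query = "SELECT ?person ?mbox WHERE { ?person foaf:mbox ?mbox }"
combined = join(local, fake_remote(remote_endpoint, remote_query))
print(protocol_get_url(remote_endpoint, remote_query))
print(combined)
```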
This feature is considered as time-permitting only by the WG.
Given the variety of SPARQL implementations, and differences in datasets and extension functions, a method of discovering a SPARQL endpoint's capabilities and summary information of its data in a machine-readable way is needed.
Many SPARQL implementations support a variety of SPARQL extensions (many proposed here for standardization), extension functions (for use in FILTERs), and different entailment regimes. Moreover, the differences in datasets provided by SPARQL endpoints are often hard to grasp without some existing knowledge of the underlying data. This proposal suggests that these differences may be described by the endpoints themselves, detailing both (1) the capabilities of the endpoint and (2) the data contained in the endpoint.
The Service description feature can be used in the following use cases:
The following services are known by the WG at the time of publication to support the Service description feature:
"X-Endpoint-Description:" and the URI given is relative to the endpoint: /description.
The following service description is an example of what is returned when querying an endpoint powered by RDF::Query with the about=1 HTTP parameter, e.g. http://example.org/sparql?about=1.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <http://darq.sf.net/dose/0.1#> .
@prefix saddle: <http://www.w3.org/2005/03/saddle/#> .
@prefix sparql: <http://kasei.example/2008/04/sparql#> .
@prefix void: <http://rdfs.org/ns/void#> .

[] a sd:Service ;
   rdfs:label "SPARQL Endpoint for example.org" ;
   sd:url <http://example.org/sparql> ;
   sd:totalTriples 12729 ;
   saddle:queryLanguage [ rdfs:label "SPARQL" ; saddle:spec <http://www.w3.org/TR/rdf-sparql-query/> ] ;
   saddle:queryLanguage [ rdfs:label "RDQL" ; saddle:spec <http://www.w3.org/Submission/RDQL/> ] ;
   saddle:resultFormat [ rdfs:label "SPARQL Query Results XML" ; saddle:mediaType "application/sparql-results+xml" ; saddle:spec <http://www.w3.org/TR/rdf-sparql-XMLres/> ] ;
   saddle:resultFormat [ rdfs:label "RDF/XML" ; saddle:mediaType "application/rdf+xml" ; saddle:spec <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ] ;
   saddle:resultFormat [ rdfs:label "SPARQL Query Results JSON" ; saddle:mediaType "application/sparql-results+json" ; saddle:spec <http://www.w3.org/TR/rdf-sparql-json-res/> ] ;
   sparql:extensionFunction <java:com.hp.hpl.jena.query.function.library.sha1sum> ;
   sparql:extensionFunction <java:com.ldodds.sparql.Distance> ;
   sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/service> ;
   sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/unsaid> ;
   sparql:sparqlExtension <http://kasei.example/2008/04/sparql-extension/federate_bindings> .
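The two discovery conventions mentioned above (a relative URI advertised in an X-Endpoint-Description header, and an about=1 request parameter) can be sketched in Python. They are conventions of particular implementations, not a standard; the function names are hypothetical, and the code only builds URLs without performing any HTTP request.

```python
from urllib.parse import urlencode, urljoin


def description_from_header(endpoint, header_value):
    """Resolve the URI given in an X-Endpoint-Description header, which may be
    relative to the endpoint URL."""
    return urljoin(endpoint, header_value.strip())


def description_via_about(endpoint):
    """URL requesting a description with the about=1 parameter."""
    return endpoint + "?" + urlencode({"about": "1"})


print(description_from_header("http://example.org/sparql", "/description"))
print(description_via_about("http://example.org/sparql"))
```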
The serviceDescription issue was previously postponed by the DAWG.
This feature is considered as Required by the WG.
The Working Group has resolved to specify a SPARQL/Update language, but may also pursue an HTTP-based graph update via the protocol. This issue is orthogonal to the SPARQL/Update language. Whether or not there will be a concrete mapping between SPARQL/Update and HTTP-based graph update is currently under discussion in the Working Group.
To change an RDF graph (either adding, updating or removing statements as well as adding statements from one graph to another or to the default graph of a triple store) one would currently have to use a programming language and one of several APIs. In other query languages, notably SQL, there are mechanisms to change the data in the database. To allow RDF graphs to be manipulated the same way and avoid using third-party APIs, a language extension is needed.
This feature is a language extension to express updates to an RDF graph or to an RDF store. It follows SPARQL in both style and detail, which reduces the learning curve for developers and reduces implementation costs.
The following facilities are expected to be provided by the SPARQL/Update 1.0 language:
The [SPARUL] Member Submission, which contains several examples, has been widely implemented and is considered a starting point for the present work.
The following two examples illustrate some of the features:
PREFIX dc: <http://purl.org/dc/elements/1.1/> INSERT DATA { <http://example/book3> dc:title "A new book" ; dc:creator "A.N.Other" . }
PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> DELETE { ?book ?p ?v } WHERE { ?book dc:date ?date . FILTER ( ?date < "2000-01-01T00:00:00"^^xsd:dateTime ) ?book ?p ?v }
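The effect of the two update examples can be sketched in Python over an in-memory set of triples. The data are hypothetical, Python `datetime` values stand in for xsd:dateTime literals, and the sketch assumes that the WHERE clause is fully evaluated before any triple is removed.

```python
from datetime import datetime

BOOK1 = "<http://example/book1>"
BOOK2 = "<http://example/book2>"

# Hypothetical store: a set of (subject, predicate, object) tuples.
store = {
    (BOOK1, "dc:title", "Old book"),
    (BOOK1, "dc:date", datetime(1999, 5, 1)),
    (BOOK2, "dc:title", "Recent book"),
    (BOOK2, "dc:date", datetime(2005, 3, 1)),
}


def insert_data(store, triples):
    """INSERT DATA: add ground triples to the store."""
    return store | set(triples)


def delete_books_before(store, cutoff):
    """DELETE { ?book ?p ?v }
    WHERE { ?book dc:date ?date . FILTER (?date < cutoff) ?book ?p ?v }"""
    # WHERE: ?book must have a dc:date earlier than the cutoff ...
    old_books = {s for (s, p, o) in store if p == "dc:date" and o < cutoff}
    # ... and ?p ?v range over every triple of each such ?book.
    to_delete = {(s, p, o) for (s, p, o) in store if s in old_books}
    # Instantiate the DELETE template with every solution, then remove.
    return store - to_delete


after = delete_books_before(store, datetime(2000, 1, 1))
print(sorted(after, key=str))
```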
The following systems are known by the WG to support the [SPARUL] Member Submission at the time of publication:
Related issues raised by the WG:
This feature is considered as Required by the WG.
Supporting updates to RDF graphs through RESTful HTTP methods makes it possible to use either a SPARQL endpoint or a plain Web server to update RDF data.
It should be possible to manipulate RDF graphs using HTTP verbs, notably PUT, POST and DELETE. This way, clients do not need to know the SPARQL language to update graphs when its expressiveness is not needed.
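One possible mapping of HTTP verbs onto whole-graph operations is sketched below in Python. The mapping itself (PUT replaces a graph, POST merges triples into it, DELETE removes it, GET retrieves it) is an assumption made for illustration, since the WG has not decided on a protocol; the `GraphStore` class is hypothetical and handles no real HTTP traffic.

```python
class GraphStore:
    """Hypothetical in-memory store mapping graph URIs to sets of triples."""

    def __init__(self):
        self.graphs = {}

    def handle(self, method, graph_uri, triples=()):
        """Apply an HTTP-style method to one graph and return its new content."""
        if method == "GET":
            pass  # retrieval only; nothing changes
        elif method == "PUT":
            # Replace the whole graph with the request body.
            self.graphs[graph_uri] = set(triples)
        elif method == "POST":
            # Merge the request body into the existing graph.
            self.graphs.setdefault(graph_uri, set()).update(triples)
        elif method == "DELETE":
            # Remove the graph entirely.
            self.graphs.pop(graph_uri, None)
        else:
            raise ValueError("unsupported method: " + method)
        return set(self.graphs.get(graph_uri, set()))


store = GraphStore()
g = "http://example.org/graphs/people"  # hypothetical graph URI
store.handle("PUT", g, {(":a", "foaf:name", "Alice")})
store.handle("POST", g, {(":b", "foaf:name", "Bob")})
print(store.handle("GET", g))
```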
The following systems are known by the WG at the time of publication to support a RESTful update protocol.
This feature is under discussion in the WG.
Many software systems that support entailment regimes such as OWL dialects and RDF Schema extend the semantics of SPARQL Basic Graph Pattern matching to apply to entailments other than simple entailment. The formal semantics of these SPARQL/Query extensions are not standardized, and query writers cannot currently be guaranteed interoperable behavior when working with multiple query engines that extend SPARQL with the same entailment regime.
SPARQL/Query 1.0 defines a mechanism to adapt SPARQL to entailment regimes beyond simple entailment by providing necessary conditions on re-defining the meaning of SPARQL Basic Graph Pattern matching. Time-permitting, the SPARQL WG will use the existing framework to define the semantics of SPARQL queries for one or more of these entailment frameworks:
This feature is considered as time-permitting only by the WG.
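How an entailment regime changes the answers to a Basic Graph Pattern can be illustrated with a Python sketch that applies just two RDFS rules (rdfs11: subClassOf is transitive; rdfs9: instances inherit the types of superclasses) before matching. This forward-chaining approach is only one way to realize such answers and covers only a tiny fragment of RDFS; it is not the WG's framework, and the data and names are hypothetical.

```python
def rdfs_closure(data):
    """Saturate a triple set with rules rdfs11 and rdfs9 only."""
    closure = set(data)
    while True:
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if p2 != "rdfs:subClassOf" or b != c:
                    continue
                if p1 == "rdfs:subClassOf":
                    new.add((a, "rdfs:subClassOf", d))  # rdfs11
                elif p1 == "rdf:type":
                    new.add((a, "rdf:type", d))         # rdfs9
        if new <= closure:
            return closure
        closure |= new


def instances_of(data, cls):
    """Match the basic graph pattern { ?x rdf:type cls }."""
    return {s for (s, p, o) in data if p == "rdf:type" and o == cls}


# Hypothetical data.
data = {
    (":tom", "rdf:type", ":Cat"),
    (":Cat", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
}

simple = instances_of(data, ":Animal")                  # simple entailment
entailed = instances_of(rdfs_closure(data), ":Animal")  # with the two rules
print(simple, entailed)
```

Under simple entailment the pattern has no match, while with the two rules :tom is returned; two engines that silently differ on such rules give different answers to the same query, which is the interoperability problem described above.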
The editors would like to thank the SPARQL Working Group for their valuable input for this document.