Abstract
Provenance is information about entities, activities, and people
involved in producing a piece of data or thing, which can be used
to form assessments about its quality, reliability or trustworthiness.
PROV-DM is the conceptual data model that forms a basis for the W3C
provenance (PROV) family of specifications.
PROV-DM distinguishes core structures, forming the essence of provenance information, from
extended structures catering for more specific uses of provenance.
PROV-DM is organized in six components, respectively dealing with:
(1) entities and activities, and the time at which they were created, used, or ended;
(2) derivations of entities from entities;
(3) agents bearing responsibility for entities that were generated and activities that happened;
(4) a notion of bundle, a mechanism to support provenance of provenance;
(5) properties to link entities that refer to the same thing; and
(6) collections forming a logical structure for their members.
To provide examples of the PROV data model, the PROV notation (PROV-N) is introduced: aimed at human consumption, PROV-N allows serializations of PROV
instances to be created in a compact manner. PROV-N facilitates the
mapping of the PROV data model to concrete syntax, and is used as the basis for a
formal semantics of PROV. The purpose of this document is to define the PROV-N notation.
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Last Call
This is a Last Call Working Draft. The design is not expected to change significantly, going forward, and now is the key time for external review.
This specification identifies one feature at risk: the expression Mention (section 3.5.3) might be removed from PROV if implementation experience reveals problems with supporting this construct.
PROV Family of Specifications
This document is part of the PROV family of specifications, a set of specifications defining various aspects that are necessary to achieve the vision of interoperable
interchange of provenance information in heterogeneous environments such as the Web. The specifications are:
- PROV-DM, the PROV data model for provenance;
- PROV-CONSTRAINTS, a set of constraints applying to the PROV data model;
- PROV-N, a notation for provenance aimed at human consumption (this document);
- PROV-O, the PROV ontology, an OWL2 ontology allowing the mapping of PROV to RDF;
- PROV-AQ, the mechanisms for accessing and querying provenance;
- PROV-PRIMER, a primer for the PROV data model.
How to read the PROV Family of Specifications
- The primer is the entry point to PROV offering an introduction to the provenance model.
- The Linked Data and Semantic Web community should focus on PROV-O defining PROV classes and properties specified in an OWL2 ontology. For further details, PROV-DM and PROV-CONSTRAINTS specify the constraints applicable to the data model, and its interpretation.
- Developers seeking to retrieve or publish provenance should focus on PROV-AQ.
- Readers seeking to implement other PROV serializations
should focus on PROV-DM and PROV-CONSTRAINTS. PROV-O and PROV-N offer examples of mapping to RDF and text, respectively.
This document was published by the Provenance Working Group as a Last Call Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to [email protected]. The Last Call period ends 18 September 2012. All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1. Introduction
Provenance is a record that describes the people,
institutions, entities, and activities involved in producing,
influencing, or delivering a piece of data or a thing in the world. Two
companion specifications respectively define PROV-DM, a data model for
provenance that allows provenance descriptions to be expressed [PROV-DM], and a set of constraints that provenance descriptions are expected to satisfy [PROV-CONSTRAINTS].
1.1 Purpose of this Document and Target Audience
A key goal of PROV is the specification of a machine-processable data model for provenance. However, communicating provenance between humans is also important when teaching, illustrating, formalizing, and discussing provenance-related issues.
With these two requirements in mind, this document introduces PROV-N, the PROV notation, a syntax designed to write instances of the PROV data model according to the following design principles:
- Technology independence. PROV-N provides a simple syntax that can be mapped to several technologies.
- Human readability. PROV-N follows a functional syntax style that is meant to be easily human-readable so it can be used in illustrative examples, such as those presented in the PROV document suite.
- Formality. PROV-N is defined through a formal grammar amenable to be used with parser generators.
PROV-N has several known uses:
- It is the notation used in the examples found in [PROV-DM], as well as in the definition of PROV constraints [PROV-CONSTRAINTS];
- It is a source language for the encoding of PROV data model instances into a variety of target languages, including amongst others RDF [PROV-RDF] and XML [PROV-XML];
- It provides the basis for a formal semantics of the PROV data model [PROV-SEM], in which an interpretation is given to each element of the PROV-N language.
This document introduces the PROV-N grammar along with examples of its usage.
Its target audience is twofold:
- Developers of provenance management applications, as well as implementors of new PROV data model encodings, and thus in particular of PROV-N parsers. These readers may be interested in the entire structure of the grammar, starting from the top-level nonterminal bundle.
- Readers of the [PROV-DM] and [PROV-CONSTRAINTS] documents who are interested in the details of the formal language underpinning the notation used in the examples and in the definition of the constraints. Those readers may find the expression nonterminal a useful entry point into the grammar.
1.2 Structure of this Document
This document is structured as follows.
Section 2 provides general considerations about the PROV-N grammar.
Section 3 presents the grammar of all expressions of the language grouped according to the PROV data model components.
Section 4 defines the grammar of top-level bundles, a house-keeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations.
Section 5 defines the extensibility mechanism for the PROV-N notation.
Section 6 defines the media type for the PROV-N notation.
1.3 Notational Conventions
The key words "must", "must not", "required", "shall", "shall
not", "should", "should not", "recommended", "may", and
"optional" in this document are to be interpreted as described in
[RFC2119].
The following namespace prefixes are used throughout this document.
Table 1: Prefixes and namespaces used in this specification
prefix | namespace URI | definition |
prov | http://www.w3.org/ns/prov# | The PROV namespace (see Section 3.7.4) |
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema Namespace [XMLSCHEMA11-2] |
(others) | (various) | All other namespace prefixes are used in examples only. In particular, URIs starting with "http://example.com" represent some application-dependent URI [URI] |
2. General grammar considerations
2.1 Functional-style Syntax
PROV-N adopts a functional-style syntax consisting of a predicate name and an ordered list of terms.
All PROV data model types have an identifier. Furthermore, some expressions also admit additional elements that further characterize them.
The following expression should be read as "entity e1".
entity(e1)
The following expression should be read as "activity a2, which occurred between 2011-11-16T16:00:00 and 2011-11-16T16:00:01".
activity(a2, 2011-11-16T16:00:00, 2011-11-16T16:00:01)
All PROV data model relations involve two primary elements, the subject and the object, in this order. Furthermore, some expressions also admit additional elements that further characterize them.
The following expression should be read as "e2 was derived from e1". Here e2 is the subject, and e1 is the object.
wasDerivedFrom(e2, e1)
The following expression expands the above derivation relation by providing
additional elements: the optional activity a, the generation g2, and the usage u1:
wasDerivedFrom(e2, e1, a, g2, u1)
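As a rough illustration of the functional style, the Python sketch below splits a flat expression into its predicate name and its ordered list of terms, and reads the first two terms of a relation as subject and object. The helper `split_expression` is hypothetical and deliberately limited: it does not handle the optional identifier separator, attribute lists, or quoted strings.

```python
import re

# Predicate name followed by a parenthesized, comma-separated list of terms.
_FLAT_EXPRESSION = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*\((.*)\)\s*$")


def split_expression(text):
    """Split a flat PROV-N expression into (predicate, ordered list of terms).

    Hypothetical helper: it does not handle the ';' identifier separator,
    attribute lists in square brackets, or commas inside quoted strings.
    """
    match = _FLAT_EXPRESSION.match(text)
    if match is None:
        raise ValueError(f"not a flat PROV-N expression: {text!r}")
    predicate, body = match.groups()
    terms = [term.strip() for term in body.split(",")] if body.strip() else []
    return predicate, terms


predicate, terms = split_expression("wasDerivedFrom(e2, e1, a, g2, u1)")
subject, obj = terms[0], terms[1]  # relations list the subject first, then the object
print(predicate, subject, obj, terms[2:])  # wasDerivedFrom e2 e1 ['a', 'g2', 'u1']
```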
2.2 EBNF Grammar
The grammar is specified using a subset of the Extended Backus-Naur
Form (EBNF) notation, as defined in Extensible Markup Language (XML) 1.1
[XML11], section 6, Notation.
The text below provides an introduction to the EBNF notation used in
this document.
EBNF specifies a series of production rules (production).
A production rule in the grammar defines a nonterminal symbol expr using the following form:
expr ::= term
Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter.
A production rule in the grammar defines a terminal symbol <TERMINAL> using the following form:
<TERMINAL> ::= term
Within the term on the right-hand side of a rule, the following
terms are used to match strings of one or more characters:
- expr: matches production for nonterminal symbol expr
- TERMINAL: matches production for terminal symbol TERMINAL
- "abc": matches the literal string inside the quotes.
- (term)?: optional, matches term or nothing.
- (term)+: matches one or more occurrences of term.
- (term)*: matches zero or more occurrences of term.
- (term | term): matches one of the two terms.
Where suitable, the PROV-N grammar reuses production and terminal names of the SPARQL grammar [RDF-SPARQL-QUERY].
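The EBNF operators behave much like regular-expression operators. As a loose illustration only, the Python sketch below hand-translates a simplified activity rule, assumed here to have the form activity "(" identifier ( "," time "," time )? ")", into a regular expression: (term)? becomes an optional non-capturing group and (term | term) becomes an alternation. The patterns ID, TIME, and TIME_OR_MARKER are simplified assumptions, not the actual PROV-N terminals, and attribute lists are ignored.

```python
import re

# Simplified stand-ins for the real terminals (assumptions for this sketch).
ID = r"[A-Za-z0-9_]+(?::[A-Za-z0-9_]+)?"         # e.g. a1 or ex:a10
TIME = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"    # e.g. 2011-11-16T16:00:00
TIME_OR_MARKER = rf"(?:{TIME}|-)"                 # EBNF (term | term)

# Assumed rule: activity "(" ID ( "," time "," time )? ")", where the
# parenthesized group followed by ? corresponds to EBNF (term)?.
ACTIVITY = re.compile(
    rf"activity\(\s*{ID}(?:\s*,\s*{TIME_OR_MARKER}\s*,\s*{TIME_OR_MARKER})?\s*\)"
)

for text in ["activity(ex:a10)",
             "activity(ex:a10, -, 2011-11-16T16:00:00)",
             "activity(ex:a10, 2011-11-16T16:00:00)"]:
    print(text, bool(ACTIVITY.fullmatch(text)))  # expected: True, True, False
```

The last input is rejected because the optional group must supply both time terms or neither.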
2.3 Main Productions
Two productions are entry points to the grammar.
The production expression provides the structure for the core expressions of PROV-N.
Each of the symbols included in expression, e.g., entityExpression, activityExpression, etc., corresponds to one concept (e.g., Entity, Activity, etc.) of the PROV data model.
Alternatively, the production rule bundle provides the overall structure of PROV-N descriptions. It is a wrapper for some namespace declarations and a set of expressions, such that the text of each element matches the corresponding expression production.
2.4 Optional terms in expressions
Some terms in an expression may be optional. For example:
wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1)
In a derivation expression, the activity, generation, and usage are optional terms. They are specified in the first derivation, but not in the second.
activity(a2, 2011-11-16T16:00:00, 2011-11-16T16:00:01)
activity(a1)
The start and end times of an activity are optional. They are specified in the first expression, but not in the second.
The general rule for optional terms is that, if none of them are used in the expression, they are simply omitted, resulting in a simpler expression as in the examples above.
However, it may be the case that only some of the optional terms are omitted. Because the position of the terms in the expression matters, an additional marker must be used to indicate that a particular term is not available. The symbol '-' is used for this purpose.
In the first expression below, all optional terms are specified. However, in the second and third, only one optional term is specified, forcing the use of the marker for the missing terms.
wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1, -, -, u1)
wasDerivedFrom(e2, e1, a, -, -)
Note that the more succinct form is just shorthand for a complete expression with all the markers specified:
activity(a1)
activity(a1, -, -)
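The expansion from the succinct form to the complete form can be sketched in Python as follows. The arity table covers only two predicates and is a hypothetical assumption of this sketch. It also assumes, as the examples in this section suggest, that a predicate's optional terms are either all written (possibly as '-') or all omitted, so a partially written list is rejected rather than padded.

```python
MARKER = "-"

# Hypothetical arity table: (number of required terms, total number of terms),
# excluding the optional identifier and the attribute list.
ARITY = {
    "activity": (1, 3),        # identifier, start time, end time
    "wasDerivedFrom": (2, 5),  # generated entity, used entity, activity, generation, usage
}


def expand(predicate, terms):
    """Return the complete term list, writing '-' for every omitted optional term."""
    required, total = ARITY[predicate]
    if len(terms) == total:
        return list(terms)
    if len(terms) == required:
        return list(terms) + [MARKER] * (total - required)
    raise ValueError(f"{predicate} takes {required} or {total} terms, got {len(terms)}")


# activity(a1) is shorthand for activity(a1, -, -)
print(expand("activity", ["a1"]))              # ['a1', '-', '-']
print(expand("wasDerivedFrom", ["e2", "e1"]))  # ['e2', 'e1', '-', '-', '-']
```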
2.5 Identifiers and attributes
Almost all expressions defined in the grammar include an identifier (see Section 3.7.1 for the full syntax of identifiers). Most expressions
can also include a set of attribute-value pairs, delimited by square brackets. Identifiers are optional except for Entities, Activities, and Agents. Identifiers are always the first term in any expression. Optional identifiers must be followed by a semi-colon ';', whereas required identifiers are followed by a regular comma ','. This makes it possible to omit an optional identifier entirely without any ambiguity. Also, if the set of attribute-value pairs is present, it is always the last term in any expression.
Derivation has an optional identifier. In the first expression, the identifier is not available, while it is explicit in the second. The third example shows that one can optionally indicate the missing identifier using the - marker. This is equivalent to the first expression.
wasDerivedFrom(e2, e1)
wasDerivedFrom(d; e2, e1)
wasDerivedFrom(-; e2, e1)
Lack of attributes can be equivalently expressed by omitting the list, or by using an empty list.
The first and second activity expressions have no attributes, and are equivalent.
The third activity expression has two attributes.
activity(ex:a1)
activity(ex:a1, [])
activity(ex:a1, [ex:param1="a", ex:param2="b"])
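The placement rules for identifiers and attributes can be sketched as a small serializer. The function `serialize` is hypothetical: required identifiers (entities, activities, agents) are passed as the first positional term, an optional relation identifier is written before a ';', and attribute values are assumed to be already serialized PROV-N literals.

```python
def serialize(predicate, terms, identifier=None, attributes=None):
    """Write a PROV-N expression from its parts (illustrative sketch).

    terms:      positional terms; a required identifier is simply the first term.
    identifier: optional identifier of a relation, written before ';'.
    attributes: mapping from attribute name to an already-serialized value.
    """
    body = ", ".join(terms)
    if identifier is not None:
        body = f"{identifier}; {body}"
    if attributes:  # an empty mapping is equivalent to no attributes
        pairs = ", ".join(f"{name}={value}" for name, value in attributes.items())
        body = f"{body}, [{pairs}]"  # the attribute list is always the last term
    return f"{predicate}({body})"


print(serialize("wasDerivedFrom", ["e2", "e1"], identifier="d"))
# wasDerivedFrom(d; e2, e1)
print(serialize("activity", ["ex:a1"], attributes={"ex:param1": '"a"', "ex:param2": '"b"'}))
# activity(ex:a1, [ex:param1="a", ex:param2="b"])
```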
3. PROV-N Productions per Component
This section introduces grammar productions for each expression, followed by small examples of expressions illustrating the grammar. Strings conforming to the grammar are valid expressions in the PROV-N language.
3.1 Component 1: Entities and Activities
3.1.1 Entity
The following table summarizes how each constituent of a PROV-DM Entity maps to a PROV-N syntax element.
entity(tr:WD-prov-dm-20111215, [ prov:type="document" ])
Here
tr:WD-prov-dm-20111215 is the entity identifier, and
[ prov:type="document" ] groups the optional attributes, only one in this example, with their values.
entity(tr:WD-prov-dm-20111215)
Here, the optional attributes are absent.
3.1.2 Activity
The following table summarizes how each constituent of a PROV-DM Activity maps to a PROV-N syntax element.
activity(ex:a10, 2011-11-16T16:00:00, 2011-11-16T16:00:01, [prov:type="createFile"])
Here ex:a10 is the activity identifier, 2011-11-16T16:00:00 and 2011-11-16T16:00:01 are the optional start and end times for the activity, and [prov:type="createFile"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
activity(ex:a10)
activity(ex:a10, -, -)
activity(ex:a10, -, -, [prov:type="edit"])
activity(ex:a10, -, 2011-11-16T16:00:00)
activity(ex:a10, 2011-11-16T16:00:00, -)
activity(ex:a10, 2011-11-16T16:00:00, -, [prov:type="createFile"])
activity(ex:a10, [prov:type="edit"])
3.1.3 Generation
The following table summarizes how each constituent of a PROV-DM Generation maps to a PROV-N syntax element.
wasGeneratedBy(ex:g1; tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00, [ex:fct="save"])
Here ex:g1 is the optional generation identifier, tr:WD-prov-dm-20111215 is the identifier of the entity being generated,
ex:edit1 is the optional identifier of the generating activity, 2011-11-16T16:00:00 is the optional generation time, and [ex:fct="save"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasGeneratedBy(e2, a1, -)
wasGeneratedBy(e2, a1, 2011-11-16T16:00:00)
wasGeneratedBy(e2, a1, -, [ex:fct="save"])
wasGeneratedBy(e2, [ex:fct="save"])
wasGeneratedBy(ex:g1; e)
wasGeneratedBy(ex:g1; e, a, 2011-11-16T16:00:00)
Additional semantic rules (Section 3.7.5) apply to generationExpression.
3.1.4 Usage
The following table summarizes how each constituent of a PROV-DM Usage maps to a PROV-N syntax element.
used(ex:u1; ex:act2, ar3:0111, 2011-11-16T16:00:00, [ex:fct="load"])
Here ex:u1 is the optional usage identifier, ex:act2 is the identifier of the using activity,
ar3:0111 is the identifier of the entity being used,
2011-11-16T16:00:00 is the optional usage time, and [ex:fct="load"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
used(ex:act2)
used(ex:act2, ar3:0111, 2011-11-16T16:00:00)
used(a1,e1, -, [ex:fct="load"])
used(ex:u1; ex:act2, ar3:0111, -)
Additional semantic rules (Section 3.7.5) apply to usageExpression.
3.1.6 Start
The following table summarizes how each constituent of a PROV-DM Start maps to a PROV-N syntax element.
wasStartedBy(s; ex:act2, ex:trigger, ex:act1, 2011-11-16T16:00:00, [ex:param="a"])
Here s is the optional start identifier, ex:act2 is the identifier of the started activity,
ex:trigger is the optional identifier for the entity that triggered the activity start,
ex:act1 is the optional identifier for the activity that generated the (possibly unspecified) entity ex:trigger,
2011-11-16T16:00:00 is the optional start time, and [ex:param="a"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasStartedBy(ex:act2, -, ex:act1, -)
wasStartedBy(ex:act2, -, ex:act1, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, -, -, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, [ex:param="a"])
wasStartedBy(s; ex:act2, e, ex:act1, 2011-11-16T16:00:00)
Additional semantic rules (Section 3.7.5) apply to startExpression.
3.1.7 End
The following table summarizes how each constituent of a PROV-DM End maps to a PROV-N syntax element.
wasEndedBy(s; ex:act2, ex:trigger, ex:act3, 2011-11-16T16:00:00, [ex:param="a"])
Here s is the optional end identifier,
ex:act2 is the identifier of the ending activity,
ex:trigger is the optional identifier of the entity that triggered the activity end,
ex:act3 is the optional identifier for the activity that generated the (possibly unspecified) entity ex:trigger,
2011-11-16T16:00:00 is the optional end time, and [ex:param="a"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasEndedBy(ex:act2, ex:trigger, -, -)
wasEndedBy(ex:act2, ex:trigger, -, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, -, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, -, 2011-11-16T16:00:00, [ex:param="a"])
wasEndedBy(e; ex:act2)
wasEndedBy(e; ex:act2, ex:trigger, -, 2011-11-16T16:00:00)
Additional semantic rules (Section 3.7.5) apply to endExpression.
3.1.8 Invalidation
The following table summarizes how each constituent of a PROV-DM Invalidation maps to a PROV-N syntax element.
wasInvalidatedBy(ex:i1; tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00, [ex:fct="save"])
Here ex:i1 is the optional invalidation identifier, tr:WD-prov-dm-20111215 is the identifier of the entity being invalidated,
ex:edit1 is the optional identifier of the invalidating activity, 2011-11-16T16:00:00 is the optional invalidation time, and [ex:fct="save"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasInvalidatedBy(tr:WD-prov-dm-20111215, ex:edit1, -)
wasInvalidatedBy(tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasInvalidatedBy(e2, a1, -, [ex:fct="save"])
wasInvalidatedBy(e2, -, -, [ex:fct="save"])
wasInvalidatedBy(ex:i1; tr:WD-prov-dm-20111215, ex:edit1, -)
Additional semantic rules (Section 3.7.5) apply to invalidationExpression.
3.2 Component 2: Derivations
3.2.1 Derivation
The following table summarizes how each constituent of a PROV-DM Derivation maps to a PROV-N syntax element.
wasDerivedFrom(d; e2, e1, a, g2, u1, [ex:comment="a righteous derivation"])
Here
d is the optional derivation identifier,
e2 is the identifier for the entity being derived,
e1 is the identifier of the entity from which e2 is derived,
a is the optional identifier of the activity which used/generated the entities,
g2 is the optional identifier of the generation,
u1 is the optional identifier of the usage,
and [ex:comment="a righteous derivation"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasDerivedFrom(e2, e1)
wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1, -, g2, u1)
wasDerivedFrom(e2, e1, a, -, u1)
wasDerivedFrom(e2, e1, a, g2, -)
wasDerivedFrom(e2, e1, a, -, -)
wasDerivedFrom(e2, e1, -, -, u1)
wasDerivedFrom(e2, e1, -, -, -)
wasDerivedFrom(d; e2, e1, a, g2, u1)
wasDerivedFrom(-; e2, e1, a, g2, u1)
3.2.2 Revision
PROV-N provides no dedicated syntax for Revision. Instead, a Revision must be expressed as a derivationExpression with attribute prov:type='prov:Revision'.
3.2.3 Quotation
PROV-N provides no dedicated syntax for Quotation. Instead, a Quotation must be expressed as a derivationExpression with attribute prov:type='prov:Quotation'.
wasDerivedFrom(quoteId1; ex:blockQuote,ex:blog, ex:act1, ex:g, ex:u,
[ prov:type='prov:Quotation' ])
Here, the derivation is provided with a prov:type attribute and value prov:Quotation.
3.2.4 Primary Source
PROV-N provides no dedicated syntax for PrimarySource. Instead, a PrimarySource must be expressed as a derivationExpression with attribute prov:type='prov:PrimarySource'.
wasDerivedFrom(src1; ex:e1, ex:e2, ex:act, ex:g, ex:u,
[ prov:type='prov:PrimarySource' ])
Here, the derivation is provided with a prov:type attribute and value prov:PrimarySource.
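Because Revision, Quotation, and PrimarySource are all written as derivations distinguished only by prov:type, a generator only needs to attach the right attribute. The Python helper below is a hypothetical sketch of that idea and is not part of any PROV library; the optional activity, generation, and usage are passed together as one tuple or omitted together.

```python
# prov:type values named in this section; PROV-N has no dedicated syntax for them.
DERIVATION_SUBTYPES = {
    "Revision": "prov:Revision",
    "Quotation": "prov:Quotation",
    "PrimarySource": "prov:PrimarySource",
}


def typed_derivation(kind, generated, used, optionals=None, identifier=None):
    """Write a Revision, Quotation or PrimarySource as a typed wasDerivedFrom.

    optionals: (activity, generation, usage), using '-' for unknown terms,
               or None to omit all three.
    """
    prov_type = DERIVATION_SUBTYPES[kind]
    terms = [generated, used]
    if optionals is not None:
        terms += list(optionals)
    head = f"{identifier}; " if identifier is not None else ""
    return f"wasDerivedFrom({head}{', '.join(terms)}, [prov:type='{prov_type}'])"


print(typed_derivation("Quotation", "ex:blockQuote", "ex:blog",
                       ("ex:act1", "ex:g", "ex:u"), identifier="quoteId1"))
# wasDerivedFrom(quoteId1; ex:blockQuote, ex:blog, ex:act1, ex:g, ex:u, [prov:type='prov:Quotation'])
```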
3.3 Component 3: Agents, Responsibility, and Influence
3.3.1 Agent
The following table summarizes how each constituent of a PROV-DM Agent maps to a PROV-N syntax element.
PROV-N provides no dedicated syntax for Person, Organization, SoftwareAgent. Instead, a Person, an Organization, or a SoftwareAgent must be expressed as an agentExpression with attribute prov:type='prov:Person', prov:type='prov:Organization', or prov:type='prov:SoftwareAgent', respectively.
agent(ag4, [ prov:type='prov:Person', ex:name="David" ])
Here ag4 is the agent identifier, and
[ prov:type='prov:Person', ex:name="David" ] are optional attributes.
In the next example, the optional attributes are omitted.
agent(ag4)
3.3.2 Attribution
The following table summarizes how each constituent of a PROV-DM Attribution maps to a PROV-N syntax element.
wasAttributedTo(id; e, ag, [ex:license='cc:attributionURL' ])
Here id is the optional attribution identifier, e is an entity identifier,
ag is the identifier of the agent to whom the entity is ascribed,
and [ex:license='cc:attributionURL' ] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasAttributedTo(e, ag)
wasAttributedTo(e, ag, [ex:license='cc:attributionURL' ])
3.3.3 Association
The following table summarizes how each constituent of a PROV-DM Association maps to a PROV-N syntax element.
PROV-N provides no dedicated syntax for Plan. Instead, a Plan must be expressed as an entityExpression with attribute prov:type='prov:Plan'.
wasAssociatedWith(ex:agas; ex:a1, ex:ag1, ex:e1, [ex:param1="a", ex:param2="b"])
Here ex:agas is the optional association identifier,
ex:a1 is an activity identifier,
ex:ag1 is the optional identifier of the agent associated with the activity,
ex:e1 is the optional identifier of the plan used by the agent in the context of the activity,
and [ex:param1="a", ex:param2="b"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
wasAssociatedWith(ex:a1, -, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1, [ex:param1="a", ex:param2="b"])
wasAssociatedWith(a; ex:a1, -, ex:e1)
Additional semantic rules (Section 3.7.5) apply to associationExpression.
3.3.4 Delegation
The following table summarizes how each constituent of a PROV-DM Delegation maps to a PROV-N syntax element.
actedOnBehalfOf(del1; ag2, ag1, a, [prov:type="contract"])
Here del1 is the optional delegation identifier,
ag2 is the identifier for the delegate agent,
ag1 is the identifier of the responsible agent,
a is the optional identifier of the activity for which the delegation link holds,
and [prov:type="contract"] are optional attributes.
The remaining examples show cases where some of the optionals are omitted.
actedOnBehalfOf(ag1, ag2)
actedOnBehalfOf(ag1, ag2, a)
actedOnBehalfOf(ag1, ag2, -, [prov:type="delegation"])
actedOnBehalfOf(ag2, ag3, a, [prov:type="contract"])
actedOnBehalfOf(r; ag2, ag3, a, [prov:type="contract"])
3.3.5 Influence
The following table summarizes how each constituent of a PROV-DM Influence maps to a PROV-N syntax element.
wasInfluencedBy(id;e2,e1,[ex:param="a"])
Here
id is the optional influence identifier,
e2 is an entity identifier,
e1 is the identifier for an ancestor entity that e2 is influenced by,
and [ex:param="a"] is the optional set of attributes.
The remaining examples show cases where some of the optionals are omitted.
wasInfluencedBy(e2,e1)
wasInfluencedBy(e2,e1,[ex:param="a"])
wasInfluencedBy(id; e2,e1)
3.4 Component 4: Bundles
3.4.1 Bundle Constructor
Named bundles cannot be nested because namedBundle is not an expression, and therefore cannot occur inside another namedBundle.
Named bundles are self-contained: each identifier occurring in a named bundle, including the bundle identifier itself, must be interpreted with respect to the namespace declarations of that bundle. In other words, for every identifier with a prefix p within a named bundle, there must be a namespace declaration for p in this named bundle; for every identifier without prefix, there must be a default namespace declaration in this named bundle.
bundle ex:author-view
prefix ex <http://example.org/>
agent(ex:Paolo, [ prov:type='prov:Person' ])
agent(ex:Simon, [ prov:type='prov:Person' ])
//...
endBundle
Here ex:author-view is the name of the bundle.
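The self-containment rule can be checked mechanically once the identifiers of a bundle are known. The Python sketch below reads the prefix and default declarations of one bundle and reports qualified names that those declarations cannot resolve. It assumes the identifiers have already been extracted (it does not parse expressions), and it treats prov and xsd as implicitly declared because the example above uses prov: without a declaration; both are assumptions of this sketch.

```python
import re

# Assumption of this sketch: prov and xsd need no declaration inside the bundle.
PREDECLARED = {"prov", "xsd"}

_PREFIX_DECL = re.compile(r"^\s*prefix\s+(\S+)\s+<([^>]*)>")
_DEFAULT_DECL = re.compile(r"^\s*default\s+<([^>]*)>")
_UNESCAPED_COLON = re.compile(r"(?<!\\):")  # a '\:' inside a local part is not a separator


def read_declarations(lines):
    """Collect the prefix and default namespace declarations of one bundle."""
    prefixes, default = {}, None
    for line in lines:
        decl = _PREFIX_DECL.match(line)
        if decl:
            prefixes[decl.group(1)] = decl.group(2)
            continue
        decl = _DEFAULT_DECL.match(line)
        if decl:
            default = decl.group(1)
    return prefixes, default


def unresolved(names, prefixes, default):
    """Return the qualified names that the bundle's own declarations cannot resolve."""
    missing = []
    for name in names:
        parts = _UNESCAPED_COLON.split(name, maxsplit=1)
        if len(parts) == 2:  # prefixed name: the prefix must be declared
            if parts[0] not in prefixes and parts[0] not in PREDECLARED:
                missing.append(name)
        elif default is None:  # unprefixed name: needs a default namespace
            missing.append(name)
    return missing


bundle = [
    "bundle ex:author-view",
    "prefix ex <http://example.org/>",
    "agent(ex:Paolo, [ prov:type='prov:Person' ])",
    "agent(ex:Simon, [ prov:type='prov:Person' ])",
    "endBundle",
]
prefixes, default = read_declarations(bundle)
print(unresolved(["ex:author-view", "ex:Paolo", "prov:Person", "foo:x", "b"], prefixes, default))
# ['foo:x', 'b']
```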
3.4.2 Bundle Type
When described, a Bundle must be expressed as an entityExpression with attribute prov:type='prov:Bundle'.
3.5 Component 5: Alternate Entities
3.5.1 Alternate
The following table summarizes how each constituent of a PROV-DM Alternate maps to a PROV-N syntax element.
alternateOf(tr:WD-prov-dm-20111215,ex:alternate-20111215)
Here tr:WD-prov-dm-20111215 is an alternate of ex:alternate-20111215.
3.5.2 Specialization
The following table summarizes how each constituent of a PROV-DM Specialization maps to a PROV-N syntax element.
specializationOf(tr:WD-prov-dm-20111215,tr:prov-dm)
Here tr:WD-prov-dm-20111215 is a specialization of tr:prov-dm.
3.5.3 Mention
Note: This feature is "at risk" and may be removed from this specification based on feedback. Please send feedback to [email protected].
The expression Mention might be removed from PROV if implementation experience reveals problems with supporting this construct.
The following table summarizes how each constituent of a PROV-DM Mention maps to a PROV-N syntax element.
mention(ex:report1_as_in_b1, ex:report1, ex:b1)
Here
ex:report1_as_in_b1 is an entity identifier,
ex:report1 is an entity identifier,
and ex:b1 is the identifier for a bundle.
3.6 Component 6: Collections
3.6.1 Collection
PROV-N provides no dedicated syntax for Collection and EmptyCollection. Instead, a Collection or an EmptyCollection must be expressed as an entityExpression with attribute prov:type='prov:Collection' or prov:type='prov:EmptyCollection', respectively.
3.6.2 Membership
The following table summarizes how each constituent of a PROV-DM Membership maps to a PROV-N syntax element.
hadMember(c, e1) // c contained e1
hadMember(c, e2) // c contained e2
Here
c is the identifier for the collection whose membership is stated, and
e1 and e2 are the entities that are members of collection c.
3.7 Further Expressions
This section defines further expressions of PROV-N.
3.7.1 Identifier
Various kinds of identifiers are used in productions.
A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name.
The PROV data model stipulates that a qualified name can be mapped to an IRI
by concatenating the IRI associated with the prefix and the local part. This section provides the exact details of this procedure for qualified names defined by PROV-N.
A qualified name's prefix is optional. If a prefix occurs in a
qualified name, the prefix must refer to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name
belongs to the default namespace.
A PROV-N qualified name (production QUALIFIED_NAME) has a more permissive syntax than XML's QName [XML-NAMES] and SPARQL PrefixedName [RDF-SPARQL-QUERY].
A QUALIFIED_NAME consists of a prefix and a local part. Prefixes follow the production PN_PREFIX defined by SPARQL [RDF-SPARQL-QUERY]. Local parts have to be conformant with PN_LOCAL, which extends the original SPARQL PN_LOCAL definition by allowing further characters (see PN_CHARS_OTHERS):
- an extra set of characters commonly encountered in IRIs;
- %-escaped characters (see PERCENT) to be interpreted as per Section 3.1, Mapping of IRIs to URIs, in [RFC3987];
- and \-escaped characters (see PN_CHARS_ESC).
Given that '=' (equal), ''' (single quote), '(' (left bracket), ')' (right bracket), ',' (comma), ':' (colon), ';' (semi-colon), '"' (double quote), '[' (left square bracket), and ']' (right square bracket) are used by the PROV notation as delimiters, they are not allowed in local parts.
Instead, among those characters, those that are permitted in SPARQL IRI_REF are also allowed in PN_LOCAL if they are escaped by the '\' (backslash character) as per production PN_CHARS_ESC. Furthermore, '.' (dot), ':' (colon), and '-' (hyphen) can also be \-escaped.
A PROV-N qualified name QUALIFIED_NAME can be mapped to a valid IRI [IRI] by concatenating the namespace IRI denoted by its prefix PN_PREFIX with its local name PN_LOCAL, whose \-escaped characters have been unescaped by dropping the character '\' (backslash).
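The mapping rule can be sketched in Python as follows. The function `to_iri` is hypothetical; it splits the name at the first colon that is not \-escaped, looks up the prefix (or the default namespace when there is no prefix), and drops each escaping backslash. %-escapes are left untouched, and the sketch does not validate the characters of the local part.

```python
import re

_UNESCAPED_COLON = re.compile(r"(?<!\\):")  # separator between prefix and local part
_BACKSLASH_ESCAPE = re.compile(r"\\(.)")    # '\' followed by the escaped character


def to_iri(qualified_name, prefixes, default=None):
    """Map a PROV-N qualified name to an IRI (illustrative sketch)."""
    parts = _UNESCAPED_COLON.split(qualified_name, maxsplit=1)
    if len(parts) == 2:
        prefix, local = parts
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix: {prefix!r}")
        namespace = prefixes[prefix]
    else:
        (local,) = parts
        if default is None:
            raise KeyError("no default namespace declared")
        namespace = default
    # Unescape by dropping each backslash; %-escapes are kept as they are.
    return namespace + _BACKSLASH_ESCAPE.sub(r"\1", local)


ns = {"ex": "http://example.org/"}
print(to_iri("ex:foo?a\\=1", ns))                               # http://example.org/foo?a=1
print(to_iri("\\-", ns, default="http://example.org/default"))  # http://example.org/default-
```

The second call reproduces the \-escaped usage identifier from the examples below.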
Examples of articles on the BBC Web site seen as entities.
bundle
prefix bbc <http://www.bbc.co.uk/>
prefix bbcNews <http://www.bbc.co.uk/news/>
entity(bbc:) // bbc site itself
entity(bbc:news/) // bbc news
entity(bbc:news/world-asia-17507976) // a given news article
entity(bbcNews:) // an alternative way of referring to the bbc news site
endBundle
Examples of entities with declared and default namespace.
bundle
default <http://example.org/2/>
prefix ex <http://example.org/1/>
entity(ex:a) // corresponds to IRI http://example.org/1/a
entity(ex:a/) // corresponds to IRI http://example.org/1/a/
entity(ex:a/b) // corresponds to IRI http://example.org/1/a/b
entity(b) // corresponds to IRI http://example.org/2/b
entity(ex:1234) // corresponds to IRI http://example.org/1/1234
entity(4567) // corresponds to IRI http://example.org/2/4567
entity(c/) // corresponds to IRI http://example.org/2/c/
entity(ex:/) // corresponds to IRI http://example.org/1//
endBundle
Examples of \-escaped characters.
bundle
prefix ex <http://example.org/>
default <http://example.org/default>
entity(ex:foo?a\=1) // corresponds to IRI http://example.org/foo?a=1
entity(ex:\-) // corresponds to IRI http://example.org/-
entity(ex:?fred\=fish%20soup) // corresponds to IRI http://example.org/?fred=fish%20soup
used(-;a1,e1,-) // identifier not specified for usage
used(\-;a1,e1,-) // usage identifier corresponds to http://example.org/default-
endBundle
Note: The productions for the terminals QUALIFIED_NAME and PN_PREFIX are conflicting. Indeed, for a tokenizer operating independently of the parse tree, abc matches both QUALIFIED_NAME and PN_PREFIX. In the context of a namespaceDeclaration, a tokenizer should give preference to the production PN_PREFIX.
3.7.2 Attribute
The reserved attributes in the PROV namespace are the following.
Their meaning is explained by [PROV-DM] (see Section 5.7.2: Attribute).
- prov:label
- prov:location
- prov:role
- prov:type
- prov:value
3.7.3 Literal
In production datatype, the QUALIFIED_NAME is used to denote a PROV data type [PROV-DM].
The nonterminals STRING_LITERAL, INT_LITERAL, and QUALIFIED_NAME_LITERAL are syntactic sugar for quoted strings with datatype xsd:string, xsd:int, and prov:QUALIFIED_NAME, respectively.
A Literal may also be an IRI-typed string (with datatype xsd:anyURI); such an IRI has no specific interpretation in the context of PROV.
Note: The productions for the terminals QUALIFIED_NAME and INT_LITERAL are conflicting. Indeed, for a tokenizer operating independently of the parse tree, 1234 matches both INT_LITERAL and QUALIFIED_NAME (local name without prefix). In the context of a convenienceNotation, a tokenizer should give preference to the production INT_LITERAL.
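The two tokenizer notes (abc versus PN_PREFIX, 1234 versus INT_LITERAL) amount to a context-sensitive preference rule. The following Python sketch shows one way a lexer could apply it once the enclosing production is known. The function name, the context strings, and the regular expressions are assumptions of this sketch; the patterns are deliberately much narrower than the normative PROV-N productions.

```python
import re

# Simplified token patterns (an assumption of this sketch: the normative
# PROV-N productions admit many more characters).
PATTERNS = {
    "INT_LITERAL": re.compile(r"-?[0-9]+"),
    "PN_PREFIX": re.compile(r"[A-Za-z](?:[\w.-]*[\w-])?"),
    "QUALIFIED_NAME": re.compile(r"(?:[A-Za-z][\w.-]*:)?[\w./?=%-]*"),
}


def classify(token: str, context: str) -> str:
    """Choose a terminal for a token, using the production being parsed.

    `context` names the enclosing production, e.g. "namespaceDeclaration"
    or "convenienceNotation"; any other value means no special preference."""
    if not token:
        raise ValueError("empty token")
    matches = {name for name, pattern in PATTERNS.items() if pattern.fullmatch(token)}
    if not matches:
        raise ValueError(f"unrecognized token {token!r}")
    if context == "namespaceDeclaration" and "PN_PREFIX" in matches:
        return "PN_PREFIX"
    if context == "convenienceNotation" and "INT_LITERAL" in matches:
        return "INT_LITERAL"
    if "QUALIFIED_NAME" in matches:
        return "QUALIFIED_NAME"
    return sorted(matches)[0]
```

Outside the two special contexts, the sketch falls back to QUALIFIED_NAME, so 1234 used as an identifier is read as an unprefixed qualified name.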
The following examples illustrate convenience notations.
The following two expressions are strings; if no datatype is specified, the datatype is xsd:string.
"abc" %% xsd:string
"abc"
The following pairs of expressions denote the same integers. For convenience, numbers, expressed as digits optionally preceded by a minus sign, can occur without quotes.
"1234" %% xsd:int
1234
"-1234" %% xsd:int
-1234
The following two expressions are qualified names. Values of type qualified name can be conveniently expressed within single quotes.
"ex:value" %% prov:QUALIFIED_NAME
'ex:value'
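The convenience notations can be read as shorthand for an explicit pair of lexical form and datatype. The Python sketch below, with a hypothetical normalize function and deliberately simplified regular expressions, expands each unquoted or single-quoted form accordingly; following the rule that INT_LITERAL is sugar for xsd:int, bare numbers are mapped to xsd:int. Language-tagged strings such as "bonjour"@fr, and the unescaping of escape sequences inside strings, are outside the scope of this sketch.

```python
import re
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str   # lexical form; escape sequences are left as written
    datatype: str  # datatype as a qualified name


_INT = re.compile(r"-?[0-9]+")
_QNAME_LITERAL = re.compile(r"'([^']*)'")
_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"\s*(?:%%\s*(\S+))?')


def normalize(text: str) -> Literal:
    """Expand a PROV-N literal into its explicit (lexical form, datatype) pair."""
    text = text.strip()
    if _INT.fullmatch(text):             # 1234 -> "1234" %% xsd:int
        return Literal(text, "xsd:int")
    m = _QNAME_LITERAL.fullmatch(text)   # 'ex:value' -> "ex:value" %% prov:QUALIFIED_NAME
    if m:
        return Literal(m.group(1), "prov:QUALIFIED_NAME")
    m = _STRING.fullmatch(text)          # "abc", optionally followed by %% datatype
    if m:
        return Literal(m.group(1), m.group(2) or "xsd:string")
    raise ValueError(f"unsupported literal form: {text!r}")
```

Comparing the normalized pairs is then enough to check that a quoted form and its convenience form denote the same value.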
The following examples are, respectively, the string "abc", the string "bonjour" in French, the integer number 1, and the IRI "http://example.org/foo".
"abc"
"bonjour"@fr
"1" %% xsd:integer
"http://example.org/foo" %% xsd:anyURI
3.7.3.1 Reserved Type Values
The reserved type values in the PROV namespace are the following.
Their meaning is defined in [PROV-DM] (see Section 5.7.2.4: prov:type).
- prov:Bundle
- prov:Collection
- prov:EmptyCollection
- prov:Organization
- prov:Person
- prov:Plan
- prov:PrimarySource
- prov:Quotation
- prov:Revision
- prov:SoftwareAgent
The agent ag is a person (type: prov:Person), whereas the entity
pl is a plan (type: prov:Plan).
agent(ag, [ prov:type='prov:Person' ])
entity(pl, [ prov:type='prov:Plan' ])
3.7.3.2 Time Values
Time instants are defined according to xsd:dateTime [XMLSCHEMA11-2].
The third argument in the following usage expression is a time instant, namely 4pm on 2011-11-16.
used(ex:act2, ar3:0111, 2011-11-16T16:00:00)
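Because time values use the xsd:dateTime lexical form, common forms can be read with standard date-time parsers. The Python sketch below (the helper name parse_time is hypothetical) relies on datetime.fromisoformat, which accepts forms such as 2011-11-16T16:00:00 and numeric offsets such as +01:00. It is not a full xsd:dateTime parser: a trailing 'Z' is only accepted from Python 3.11 onwards, and features such as years beyond 9999 or the hour 24 are not handled.

```python
from datetime import datetime


def parse_time(text: str) -> datetime:
    """Parse a PROV-N time value written in (a common subset of) xsd:dateTime."""
    return datetime.fromisoformat(text)


used_at = parse_time("2011-11-16T16:00:00")  # the time argument of the usage above
print(used_at.hour)  # 16, i.e. 4pm
```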
3.7.4 Namespace Declaration
A namespaceDeclaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration belongs to this namespace.
A defaultNamespaceDeclaration consists of a namespace. Every qualified name without a prefix in the scope of this declaration belongs to this namespace. The scope of a declaration is specified as follows:
A set of namespace declarations namespaceDeclarations must not re-declare the same prefix.
A namespace declaration namespaceDeclaration must not declare the prefixes prov and xsd (see Table 1 for their IRIs).
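The two constraints on namespace declarations are simple enough to check mechanically. The following Python sketch is illustrative only: the function name and the representation of declarations as (prefix, IRI) pairs are assumptions, and the default namespace declaration, which has no prefix, is left out.

```python
from typing import Iterable, List, Tuple

RESERVED_PREFIXES = {"prov", "xsd"}  # predefined prefixes (see Table 1)


def check_namespace_declarations(decls: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the rule violations in one set of prefix declarations.

    `decls` holds (prefix, namespace IRI) pairs in declaration order."""
    errors: List[str] = []
    seen = set()
    for prefix, _iri in decls:
        if prefix in RESERVED_PREFIXES:
            errors.append(f"prefix {prefix!r} must not be declared")
        elif prefix in seen:
            errors.append(f"prefix {prefix!r} is re-declared")
        seen.add(prefix)
    return errors
```

An empty result means that the declarations satisfy both rules.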
The following example declares three namespaces: one default namespace, and two with the explicit prefixes ex1 and ex2.
bundle
default <http://example.org/0/>
prefix ex1 <http://example.org/1/>
prefix ex2 <http://example.org/2/>
...
endBundle
In the following example, a toplevel bundle declares a default namespace, and the occurrence of e001 directly in the toplevel bundle refers to that namespace. A nested named bundle also declares a default namespace, but with a different IRI. In that named bundle, the occurrences of e001, including the bundle name, refer to the default namespace declared in the named bundle.
bundle
default <http://example.org/1/>
entity(e001) // IRI: http://example.org/1/e001
bundle e001 // IRI: http://example.org/2/e001
default <http://example.org/2/>
entity(e001) // IRI: http://example.org/2/e001
endBundle
endBundle
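The scoping behaviour in this example can be reproduced with a small resolver in which each named bundle, including its own name, is resolved only against its own declarations, so nothing is inherited from the toplevel bundle. The Bundle class and both functions below are a hypothetical in-memory model written for illustration; for brevity the sketch does not handle \-escapes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bundle:
    """Hypothetical model of a bundle, for illustration only."""
    name: Optional[str] = None      # None for the toplevel bundle
    default: Optional[str] = None   # default namespace IRI, if declared
    prefixes: Dict[str, str] = field(default_factory=dict)
    entities: List[str] = field(default_factory=list)
    bundles: List["Bundle"] = field(default_factory=list)  # nested named bundles


def resolve(qname: str, scope: Bundle) -> str:
    """Resolve a qualified name using only the declarations of `scope`."""
    prefix, sep, local = qname.partition(":")
    if sep:
        return scope.prefixes[prefix] + local
    if scope.default is None:
        raise KeyError(f"no default namespace in scope for {qname!r}")
    return scope.default + qname


def identifier_iris(bundle: Bundle) -> List[str]:
    """List entity IRIs; each named bundle and its name use the inner scope."""
    iris = [resolve(e, bundle) for e in bundle.entities]
    for nested in bundle.bundles:
        iris.append(resolve(nested.name, nested))
        iris.extend(identifier_iris(nested))
    return iris


inner = Bundle(name="e001", default="http://example.org/2/", entities=["e001"])
top = Bundle(default="http://example.org/1/", entities=["e001"], bundles=[inner])
print(identifier_iris(top))
```

A named bundle that uses an unprefixed name without declaring its own default namespace makes resolve raise an error rather than silently falling back to the toplevel default.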
3.7.5 Summary of additional semantic rules
Some of the grammar productions allow for expressions that are syntactically correct and yet, according to [PROV-DM], are not acceptable, because additional semantic rules are defined for those expressions.
The following table summarizes such expressions, along with examples of syntactically correct but unacceptable forms and the additional semantic rules.
Summary of additional semantic rules for grammar productions
| Production | Examples of syntactically correct expressions | Additional semantic rule |
| --- | --- | --- |
| Generation expression | wasGeneratedBy(e2, -, -), wasGeneratedBy(-; e2, -, -) | At least one of id, activity, time, and attributes must be present. |
| Usage expression | used(a2, -, -), used(-; a2, -, -) | At least one of id, entity, time, and attributes must be present. |
| Start expression | wasStartedBy(e2, -, -, -), wasStartedBy(-; e2, -, -, -) | At least one of id, trigger, starter, time, and attributes must be present. |
| End expression | wasEndedBy(e2, -, -, -), wasEndedBy(-; e2, -, -, -) | At least one of id, trigger, ender, time, and attributes must be present. |
| Invalidation expression | wasInvalidatedBy(e2, -, -), wasInvalidatedBy(-; e2, -, -) | At least one of id, activity, time, and attributes must be present. |
| Association expression | wasAssociatedWith(a, -, -), wasAssociatedWith(-; a, -, -) | At least one of id, agent, plan, and attributes must be present. |
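All rules in this table share one shape: besides the mandatory first argument, at least one optional element must be present. The Python sketch below encodes the table as a dictionary and checks that shape. It is a hypothetical helper, not part of PROV-N: a '-' marker is modeled as None, the mandatory first argument is not modeled, and an empty attribute list is treated as absent, which is an assumption of this sketch.

```python
from typing import Any, Mapping, Optional

# Optional elements of each expression, as listed in the table above.
OPTIONAL_ELEMENTS = {
    "wasGeneratedBy": ("activity", "time"),
    "used": ("entity", "time"),
    "wasStartedBy": ("trigger", "starter", "time"),
    "wasEndedBy": ("trigger", "ender", "time"),
    "wasInvalidatedBy": ("activity", "time"),
    "wasAssociatedWith": ("agent", "plan"),
}


def satisfies_rule(relation: str, ident: Optional[str] = None,
                   attributes: Optional[Mapping[str, Any]] = None,
                   **elements: Optional[str]) -> bool:
    """True if at least one of id, the optional elements, and attributes is present."""
    if ident is not None or attributes:
        return True
    return any(elements.get(name) is not None for name in OPTIONAL_ELEMENTS[relation])


print(satisfies_rule("wasGeneratedBy"))                  # wasGeneratedBy(e2, -, -): False
print(satisfies_rule("wasGeneratedBy", activity="a1"))   # wasGeneratedBy(e2, a1, -): True
```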
4. Toplevel Bundle
A toplevel bundle is a house-keeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations. A toplevel bundle forms a self-contained package of provenance descriptions for the purpose of exchanging them. A toplevel bundle may be used to package up PROV-N expressions in response to a request for the provenance of something [PROV-AQ].
Given its status as a house-keeping construct for exchanging provenance expressions, a toplevel bundle is not defined as a PROV-N expression (production expression).
A toplevel bundle's text matches the bundle production.
A toplevel bundle contains:
- namespace declarations;
- expressions;
- named bundles.
Thus, named bundles can occur inside a toplevel bundle.
Named bundles are self-contained: each identifier occurring in a named bundle, including the named bundle's identifier itself, must be interpreted with respect to the namespace declarations of that named bundle. In other words, named bundles should not inherit namespace declarations from the toplevel bundle.
The following bundle contains expressions related to the provenance of entity e2.
bundle
default <http://anotherexample.org/>
prefix ex <http://example.org/>
entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice",
ex:content="There was a lot of crime in London last month."])
activity(a1, 2011-11-16T16:05:00, -, [prov:type="edit"])
wasGeneratedBy(e2, a1, -, [ex:fct="save"])
wasAssociatedWith(a1, ag2, -, [prov:role="author"])
agent(ag2, [ prov:type='prov:Person', ex:name="Bob" ])
endBundle
This toplevel bundle could, for instance, be returned as the result of a query to a provenance store for the provenance of entity e2 [PROV-AQ].
5. Extensibility
The PROV data model is extensible by means of the attributes prov:type and prov:role, which allow subtyping of expressions. For some applications, novel syntax may also be convenient. Hence, the normative requirements are as follows.
- PROV-N compliant parsers must be able to parse expressions matching the extensibilityExpression production defined below.
- As PROV provides no definition for these expressions, PROV-compliant implementations may ignore these expressions.
- Extensions to PROV and PROV-N may specify more specific productions and interpretations for these expressions, which applications may choose to adopt.
Expressions compatible with the extensibilityExpression production follow a general form of functional syntax, in which the predicate must be a qualifiedName with a non-empty prefix.
Collections are sets of entities, whose membership can be expressed using the hadMember relation. The following example shows how one can express membership for an extension of Collections, namely sets of key-value pairs. The notation is a variation of that used for Collections membership, allowing multiple member elements to be declared, and in which the elements are pairs. The name of the relation is qualified with the extension-specific namespace http://example.org/dictionaries.
prefix dictExt <http://example.org/dictionaries>
dictExt:hadMembers(mId; d, {("k1",e1), ("k2",e2), ("k3",e3)}, [])
Note that the generic extensibilityExpression production above allows for alternative notations to be used for expressing membership, if the designers of the extensions so desire. Here is an alternate syntax that is consistent with the productions:
prefix dictExt <http://example.org/dictionaries>
dictExt:hadMembers(mid; d, dictExt:set(dictExt:pair("k1",e1),
dictExt:pair("k2",e2),
dictExt:pair("k3",e3)),
[dictExt:uniqueKeys="true"])
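Since a PROV-compliant implementation may ignore extension expressions, a consumer needs only to recognize their shape: a predicate that is a qualified name with a non-empty prefix, followed by an argument list. The Python sketch below detects that shape with a simplified regular expression and filters such expressions out. The function names and the pattern are assumptions of this sketch; a real implementation would rely on a parser built from the PROV-N grammar.

```python
import re
from typing import List, Optional

# A prefixed predicate followed by '(' (simplified; the real PN_PREFIX and
# PN_LOCAL productions allow more characters).
_EXTENSION_HEAD = re.compile(r"\s*([A-Za-z][\w.-]*):([\w.-]+)\s*\(")


def extension_predicate(expression: str) -> Optional[str]:
    """Return the prefixed predicate if the expression has the shape of an
    extensibilityExpression; return None otherwise (e.g. core expressions)."""
    m = _EXTENSION_HEAD.match(expression)
    return f"{m.group(1)}:{m.group(2)}" if m else None


def drop_extensions(expressions: List[str]) -> List[str]:
    """Keep core PROV expressions and ignore extension expressions."""
    return [e for e in expressions if extension_predicate(e) is None]
```

Applied to the first dictionary example, extension_predicate returns dictExt:hadMembers, while a core expression such as entity(e1) is kept by drop_extensions.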
A. Acknowledgements
This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole.
Members of the PROV Working Group at the time of publication of this document were:
Ilkay Altintas (Invited expert),
Reza B'Far (Oracle Corporation),
Khalid Belhajjame (University of Manchester),
James Cheney (University of Edinburgh, School of Informatics),
Sam Coppens (IBBT),
David Corsar (University of Aberdeen, Computing Science),
Stephen Cresswell (The National Archives),
Tom De Nies (IBBT),
Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland),
Simon Dobson (Invited expert),
Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)),
Kai Eckert (Invited expert),
Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER),
James Frew (Invited expert),
Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)),
Daniel Garijo (Universidad Politécnica de Madrid),
Yolanda Gil (Invited expert),
Ryan Golden (Oracle Corporation),
Paul Groth (Vrije Universiteit),
Olaf Hartig (Invited expert),
David Hau (National Cancer Institute, NCI),
Sandro Hawke (W3C/MIT),
Jörn Hees (German Research Center for Artificial Intelligence (DFKI) GmbH),
Ivan Herman (W3C/ERCIM),
Ralph Hodgson (TopQuadrant),
Hook Hua (Invited expert),
Trung Dong Huynh (University of Southampton),
Graham Klyne (University of Oxford),
Michael Lang (Revelytix, Inc.),
Timothy Lebo (Rensselaer Polytechnic Institute),
James McCusker (Rensselaer Polytechnic Institute),
Deborah McGuinness (Rensselaer Polytechnic Institute),
Simon Miles (Invited expert),
Paolo Missier (School of Computing Science, Newcastle University),
Luc Moreau (University of Southampton),
James Myers (Rensselaer Polytechnic Institute),
Vinh Nguyen (Wright State University),
Edoardo Pignotti (University of Aberdeen, Computing Science),
Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute),
Carl Reed (Open Geospatial Consortium),
Adam Retter (Invited Expert),
Christine Runnegar (Invited expert),
Satya Sahoo (Invited expert),
David Schaengold (Revelytix, Inc.),
Daniel Schutzer (FSTC, Financial Services Technology Consortium),
Yogesh Simmhan (Invited expert),
Stian Soiland-Reyes (University of Manchester),
Eric Stephan (Pacific Northwest National Laboratory),
Linda Stewart (The National Archives),
Ed Summers (Library of Congress),
Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)),
Ted Thibodeau (OpenLink Software Inc.),
Curt Tilmes (National Aeronautics and Space Administration),
Craig Trim (IBM Corporation),
Stephan Zednik (Rensselaer Polytechnic Institute),
Jun Zhao (University of Oxford),
Yuting Zhao (University of Aberdeen, Computing Science).
2.6 Comments
Comments in PROV-N take two forms:
- '//' comments, which start with "//" outside an IRI_REF or STRING_LITERAL; such comments continue to the end of line (marked by characters U+000D or U+000A) or end of file if there is no end of line after the comment marker.
- '/* ... */' comments, which start with "/*" and end with "*/", outside an IRI_REF or STRING_LITERAL.
Comments are treated as white space.
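The restriction that a comment marker only counts outside an IRI_REF or STRING_LITERAL matters because IRIs such as <http://www.bbc.co.uk/> contain "//". The Python sketch below removes only end-of-line '//' comments and keeps the line break. It assumes that '<' opens an IRI_REF and '"' opens a STRING_LITERAL wherever they appear outside those tokens, and that '\' escapes the next character inside strings; the function name is hypothetical.

```python
def strip_line_comments(text: str) -> str:
    """Remove '//' comments outside IRI_REF (<...>) and STRING_LITERAL ("...")."""
    out = []
    i, n = 0, len(text)
    in_iri = in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:  # keep an escaped character verbatim
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif in_iri:
            out.append(ch)
            if ch == ">":
                in_iri = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "<":
            in_iri = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of line (U+000D or U+000A) or end of input;
            # the line break itself is kept.
            while i < n and text[i] not in "\r\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Replacing each comment by nothing while keeping the line break is consistent with comments being treated as white space.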