PROV-N: The Provenance Notation

Abstract

PROV-DM, the PROV data model, is a data model for provenance that describes the entities, people and activities involved in producing a piece of data or thing. PROV-DM is structured in six components, dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) collections forming a logical structure for their members; (6) a simple annotation mechanism.

To provide examples of the PROV data model, the PROV notation (PROV-N) is introduced: aimed at human consumption, PROV-N allows serializations of PROV instances to be created in a compact manner. PROV-N facilitates the mapping of the PROV data model to concrete syntax, and is used as the basis for a formal semantics of PROV. The purpose of this document is to define the PROV-N notation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Specifications

This document is part of the PROV family of specifications, a set of specifications defining various aspects that are necessary to achieve the vision of interoperable interchange of provenance information in heterogeneous environments such as the Web. The specifications are:

PROV-DM, the PROV data model for provenance;
PROV-CONSTRAINTS, a set of constraints applying to the PROV data model;
PROV-N, a notation for provenance aimed at human consumption (this document);
PROV-O, the PROV ontology, an OWL-RL ontology allowing the mapping of PROV to RDF;
PROV-AQ, the mechanisms for accessing and querying provenance;
PROV-PRIMER, a primer for the PROV data model;
PROV-SEM, a formal semantics for the PROV data model;
PROV-XML, an XML schema for the PROV data model.

How to read the PROV Family of Specifications

The primer is the entry point to PROV, offering an introduction to the provenance model.
The Linked Data and Semantic Web community should focus on PROV-O, which defines PROV classes and properties in an OWL-RL ontology. For further details, PROV-DM and PROV-CONSTRAINTS specify the constraints applicable to the data model, and its interpretation. PROV-SEM provides a mathematical semantics.
The XML community should focus on PROV-XML, which defines an XML schema for PROV. Further details can also be found in PROV-DM, PROV-CONSTRAINTS, and PROV-SEM.
Developers seeking to retrieve or publish provenance should focus on PROV-AQ.
Readers seeking to implement other PROV serializations should focus on PROV-DM and PROV-CONSTRAINTS. PROV-O, PROV-N, PROV-XML offer examples of mapping to RDF, text, and XML, respectively.

First Public Working Draft

This is the first public release of the PROV-N document. Following feedback, the Working Group has decided to reorganize the PROV-DM document substantially, separating the data model from its constraints and from the notation used to illustrate it. The PROV-N release is synchronized with the release of the PROV-DM, PROV-O, PROV-PRIMER, and PROV-CONSTRAINTS documents.

This document was published by the Provenance Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org. All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

Provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing in the world. Two companion specifications respectively define PROV-DM, a data model for provenance that allows provenance descriptions to be expressed [PROV-DM], and a set of constraints that provenance descriptions are expected to satisfy [PROV-CONSTRAINTS].

1.1 Purpose of this Document and target audience

A key goal of PROV is the specification of a machine-processable data model for provenance. However, communicating provenance between humans is also important when teaching, illustrating, formalizing, and discussing provenance-related issues. With these two requirements in mind, this document introduces PROV-N, a syntax notation designed to write instances of the PROV data model according to the following design principles:

Technology independence. PROV-N provides a simple syntax that can be mapped to several technologies.
Human readability. PROV-N follows a functional syntax style that is meant to be easily human-readable so it can be used in illustrative examples, such as those presented in the PROV documents suite.
Formality. PROV-N is defined through a formal grammar amenable to use with parser generators.

PROV-N has several known uses:

It is the notation used in the examples found in [PROV-DM], as well as in the definition of PROV constraints [PROV-CONSTRAINTS];
It is a source language for the encoding of PROV data model instances into a variety of target languages, including amongst others RDF [PROV-RDF] and XML [PROV-XML];
It provides the basis for a formal semantics of the PROV data model [PROV-SEM], in which an interpretation is given to each element of the PROV-N language.

This document introduces the PROV-N grammar along with examples of its usage.

Its target audience is twofold:

Developers of provenance management applications, as well as implementors of new PROV data model encodings, and thus in particular of PROV-N parsers. These readers may be interested in the entire structure of the grammar, starting from the top level nonterminal container.
Readers of the [PROV-DM] and of [PROV-CONSTRAINTS] documents, who are interested in the details of the formal language underpinning the notation used in the examples and in the definition of the constraints. Those readers may find the expression nonterminal a useful entry point into the grammar.

1.2 Structure of this Document

This document defines a grammar using the Extended Backus-Naur Form (EBNF) notation. Its productions correspond to PROV data model types and relations.

It is structured as follows.

Section 2 provides the design rationale for the PROV Notation and defines the notation for the Extended Backus-Naur Form (EBNF) grammar used in this specification.

Section 3 presents the grammar of all expressions of the language grouped according to the PROV data model components.

Section 4 defines the grammar of containers, a house-keeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations.

Section 5 defines the grammar of accounts.

Section 6 defines the media type for the PROV-N notation.

1.3 Notational Conventions

The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in [RFC2119].

The following namespace prefixes are used throughout this document.

Table 1: Prefixes and Namespaces used in this specification
prefix	namespace uri	definition
prov	http://www.w3.org/ns/prov#	The PROV namespace (see Section 3.7.1)
xsd	http://www.w3.org/2001/XMLSchema#	XML Schema Namespace [XMLSCHEMA-2]
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#	The RDF namespace [RDF-CONCEPTS]
(others)	(various)	All other namespace prefixes are used in examples only. In particular, URIs starting with "http://example.com" represent some application-dependent URI [URI].

2. General grammar considerations

2.1 Functional-style Syntax

PROV-N adopts a functional-style syntax consisting of a predicate name and an ordered list of terms. All PROV data model relations involve two primary elements, the subject and the object, in this order. Furthermore, some expressions also admit additional elements that further characterize them.

The following expression should be read as "e2 was derived from e1". Here e2 is the subject, and e1 is the object.

wasDerivedFrom(e2, e1)

In the following expressions, the optional activity a and the generation and usage identifiers g2 and u1 have been added to further qualify the derivation, and start and end timestamps have been added to qualify activity a2:

wasDerivedFrom(e2, e1, a, g2, u1)
activity(a2, 2011-11-16T16:00:00, 2011-11-16T16:00:01)
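A processor for this functional style first separates the predicate name from its ordered list of terms. The Python sketch below (split_expression is a hypothetical helper, not part of any PROV toolkit) splits on top-level commas only, so commas inside [...], {...}, (...) or double-quoted strings stay within their term; escaped quotes inside strings are not handled.

```python
import re
from typing import List, Tuple


def split_expression(text: str) -> Tuple[str, List[str]]:
    """Split 'name(t1, t2, ...)' into the predicate name and its top-level terms."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*\((.*)\)\s*", text, re.DOTALL)
    if match is None:
        raise ValueError(f"not a functional-style expression: {text!r}")
    name, body = match.group(1), match.group(2)
    terms: List[str] = []
    current: List[str] = []
    depth = 0          # nesting level of (), [] and {}
    in_string = False  # inside a double-quoted string
    for ch in body:
        if in_string:
            current.append(ch)
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch in "([{":
            depth += 1
            current.append(ch)
        elif ch in ")]}":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A top-level comma ends the current term.
            terms.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    last = "".join(current).strip()
    if last:
        terms.append(last)
    return name, terms
```

For instance, split_expression("wasDerivedFrom(e2, e1, a, g2, u1)") returns ("wasDerivedFrom", ["e2", "e1", "a", "g2", "u1"]), with e2 as the subject and e1 as the object.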

2.2 EBNF Grammar

The grammar is specified using the Extended Backus-Naur Form (EBNF) notation.

Each production rule (or production, for short) in the grammar defines one non-terminal symbol E, in the following form:

E ::= term

Within the term on the right-hand side of a rule, the following terms are used to match strings of one or more characters:

E: matches term satisfying rule for symbol E.
'abc': matches the literal string inside the single quotes.
(term)?: optional term, matches term or nothing.
(term)+: matches one or more occurrences of term.
(term)*: matches zero or more occurrences of term.
(term | term): matches one of the two terms.

The grammar is centered on nonterminals for various types of expression. The main production is introduced below, as it reflects the rationale for the design of the entire grammar. Note that parser developers may use the top-level container nonterminal as a starting point instead.

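expression ::= ( entityExpression | activityExpression | generationExpression | usageExpression | startExpression | endExpression | invalidationExpression | communicationExpression | startByActivityExpression
  | agentExpression | attributionExpression | associationExpression | responsibilityExpression
  | derivationExpression | revisionExpression | quotationExpression | originalSourceExpression | traceExpression
  | alternateExpression | specializationExpression
  | derivationByInsertionFromExpression | derivationByRemovalFromExpression | membershipExpression
  | noteExpression | annotationExpression )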

Each nonterminal on the right-hand side of this production, i.e., entityExpression, activityExpression, etc., corresponds to one element (entity, activity, etc.) of the PROV data model.

A PROV-N document consists of a collection of expressions, wrapped in a container together with namespace declarations; the text of each element matches the corresponding expression production of the grammar.
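Each expression production in Section 3 begins with its own keyword (entity, wasGeneratedBy, and so on), so a parser can choose the production to apply from that keyword alone. The Python sketch below shows such a dispatch table; the names KEYWORD_TO_NONTERMINAL and nonterminal_for are hypothetical, and memberOf is used for membership, as in the examples of Section 3.5.3.

```python
import re

# Keyword of each expression in Section 3 -> the nonterminal that defines it.
KEYWORD_TO_NONTERMINAL = {
    # Component 1: entities and activities
    "entity": "entityExpression",
    "activity": "activityExpression",
    "wasGeneratedBy": "generationExpression",
    "used": "usageExpression",
    "wasStartedBy": "startExpression",
    "wasEndedBy": "endExpression",
    "wasInvalidatedBy": "invalidationExpression",
    "wasInformedBy": "communicationExpression",
    "wasStartedByActivity": "startByActivityExpression",
    # Component 2: agents and responsibility
    "agent": "agentExpression",
    "wasAttributedTo": "attributionExpression",
    "wasAssociatedWith": "associationExpression",
    "actedOnBehalfOf": "responsibilityExpression",
    # Component 3: derivations
    "wasDerivedFrom": "derivationExpression",
    "wasRevisionOf": "revisionExpression",
    "wasQuotedFrom": "quotationExpression",
    "hadOriginalSource": "originalSourceExpression",
    "tracedTo": "traceExpression",
    # Component 4: alternate entities
    "alternateOf": "alternateExpression",
    "specializationOf": "specializationExpression",
    # Component 5: collections
    "derivedByInsertionFrom": "derivationByInsertionFromExpression",
    "derivedByRemovalFrom": "derivationByRemovalFromExpression",
    "memberOf": "membershipExpression",
    # Component 6: annotations
    "note": "noteExpression",
    "hasAnnotation": "annotationExpression",
}


def nonterminal_for(text: str) -> str:
    """Return the expression nonterminal selected by the leading keyword."""
    # The keyword is the full run of letters before '(', so wasStartedBy and
    # wasStartedByActivity are told apart.
    match = re.match(r"\s*([A-Za-z]+)\s*\(", text)
    if match is None or match.group(1) not in KEYWORD_TO_NONTERMINAL:
        raise ValueError(f"not a PROV-N expression: {text!r}")
    return KEYWORD_TO_NONTERMINAL[match.group(1)]
```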

2.3 Optional terms in expressions

Some terms in an expression may be optional. For example:

wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1)

In a derivation expression, the activity, generation, and usage are optional terms. They are specified in the first derivation, but not in the second.

activity(a2, 2011-11-16T16:00:00, 2011-11-16T16:00:01)
activity(a1)

The start and end times of an activity are optional. They are specified in the first expression (for a2), but not in the second (for a1).

The general rule for optional terms is that, if none of them is used in an expression, they are simply omitted, resulting in a simpler expression as in the examples above.

However, it may be the case that only some of the optional terms are omitted. Because the position of the terms in the expression matters, an additional marker must be used to indicate that a particular term is not available. The symbol - is used for this purpose.

In the first expression below, all optional terms are specified. However, in the second, only the last one is specified, forcing the use of the marker for the missing terms. In the last, no marker is necessary because all remaining optional terms after a are missing.

wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1, -, -, u1)
wasDerivedFrom(e2, e1, a)

Note that the more succinct form is just shorthand for a complete expression with all the markers specified:

activity(a1)
activity(a1, -, -)
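Expanding the shorthand amounts to padding the omitted trailing optional terms with the - marker. The Python sketch below relies on a hypothetical table recording, for two of the expressions above, how many positional terms the full form has; it handles only expressions written without the optional expression identifier (for activity, the identifier is itself one of the three positional terms), and it keeps a trailing attribute list in last position.

```python
from typing import List

MARKER = "-"

# Hypothetical table: number of positional terms in the full form, not counting
# the optional expression identifier or the attribute list.
POSITIONAL_TERMS = {
    "activity": 3,        # identifier, start time, end time
    "wasDerivedFrom": 5,  # e2, e1, activity, generation, usage
}


def expand_shorthand(name: str, terms: List[str]) -> List[str]:
    """Make omitted trailing optional terms explicit with the '-' marker."""
    terms = list(terms)
    # Set a trailing attribute list aside so padding goes before it.
    attributes = [terms.pop()] if terms and terms[-1].startswith("[") else []
    missing = POSITIONAL_TERMS[name] - len(terms)
    if missing < 0:
        raise ValueError(f"too many positional terms for {name}: {terms}")
    return terms + [MARKER] * missing + attributes


def render_expression(name: str, terms: List[str]) -> str:
    """Write an expression back out in functional style."""
    return f"{name}({', '.join(terms)})"
```

For example, expand_shorthand("activity", ["a1"]) yields ["a1", "-", "-"], which render_expression turns into activity(a1, -, -).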

2.4 Identifiers and attributes

Most expressions defined in the grammar include the use of two terms: an identifier for the predicate, and a set of attribute-value pairs, delimited by square brackets. Both are optional (unless specified otherwise). By convention, the identifier is the first term in any expression, and the set of attribute-value pairs is the last.

Consistent with the convention on optional terms, the '-' marker can be used when the identifier is not available. Additionally, the grammar rules are defined in such a way that the optional identifier can be omitted altogether with no ambiguity arising.

Derivation has an optional identifier. In the first expression, the identifier is not available. It is explicit in the second, and marked by a - in the third.

wasDerivedFrom(e2, e1)
wasDerivedFrom(d, e2, e1)
wasDerivedFrom(-, e2, e1)

A distinction is made between expressions with no attributes, and expressions that include an empty list of attributes.

The first activity does not have any attributes. The second has an empty list of attributes. The third activity has two attributes.

activity(ex:a10)
activity(ex:a10, [])
activity(ex:a10, [ex:param1="a", ex:param2="b"])
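The distinction between a missing attribute list and an empty one is preserved naturally by a parser that returns None in the first case and an empty mapping in the second. The Python sketch below (parse_attributes is a hypothetical helper) accepts only prefixed attribute names with simple values, double-quoted strings or bare numbers, and keeps each value as its raw text; typed literals such as "cc:attributionURL" %% "xsd:QName" are outside its scope.

```python
import re
from typing import Dict, Optional

# One attribute-value pair: a prefixed attribute name, '=', and a value that is
# either a double-quoted string or a bare number (simplified subset).
_PAIR = re.compile(r'\s*([A-Za-z_][\w.-]*:[\w.-]+)\s*=\s*("[^"]*"|-?\d+(?:\.\d+)?)\s*')


def parse_attributes(term: Optional[str]) -> Optional[Dict[str, str]]:
    """None -> no attribute list; '[]' -> {}; otherwise name -> raw value text."""
    if term is None:
        return None
    text = term.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not an attribute list: {term!r}")
    inner = text[1:-1].strip()
    attributes: Dict[str, str] = {}
    if not inner:
        return attributes  # explicit empty list
    pos = 0
    while True:
        match = _PAIR.match(inner, pos)
        if match is None:
            raise ValueError(f"cannot parse attributes at: {inner[pos:]!r}")
        attributes[match.group(1)] = match.group(2)
        pos = match.end()
        if pos == len(inner):
            return attributes
        if inner[pos] != ",":
            raise ValueError(f"expected ',' at: {inner[pos:]!r}")
        pos += 1  # skip the separator; a trailing comma fails on the next match
```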

3. PROV-N Productions per Component

This section introduces grammar productions for each expression, followed by small examples illustrating the use of expressions in PROV-N. Strings conforming to the grammar are valid expressions in the PROV-N language.

3.1 Component 1: Entities and Activities

3.1.1 Entity

entityExpression ::= entity ( identifier optional-attribute-values )

optional-attribute-values ::= , [ attribute-values ]
attribute-values ::= attribute-value | attribute-value , attribute-values
attribute-value ::= attribute = Literal

entity(tr:WD-prov-dm-20111215, [ prov:type="document" ])

Here tr:WD-prov-dm-20111215 is the entity identifier, and [ prov:type="document" ] groups the optional attributes with their values.


entity(tr:WD-prov-dm-20111215)

Here the optional attributes are not used.
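In the other direction, producing entityExpression text from an identifier and an optional attribute mapping takes only a few lines. The Python sketch below (entity_expression is a hypothetical helper) writes every value as a double-quoted string without any escaping and emits no spaces inside the brackets, assuming, as the two spellings in this section's examples suggest, that such whitespace is not significant.

```python
from typing import Dict, Optional


def entity_expression(identifier: str, attributes: Optional[Dict[str, str]] = None) -> str:
    """Serialize an entity; None omits the attribute list, {} writes '[]'."""
    if attributes is None:
        return f"entity({identifier})"
    # Values are quoted verbatim: no escaping, no typed literals.
    pairs = ", ".join(f'{name}="{value}"' for name, value in attributes.items())
    return f"entity({identifier}, [{pairs}])"
```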

3.1.2 Activity

activityExpression ::= activity ( identifier , (time | - ) , (time | - ) optional-attribute-values )

activity(ex:a10, 2011-11-16T16:00:00, 2011-11-16T16:00:01, [prov:type="createFile"])

Here ex:a10 is the activity identifier, 2011-11-16T16:00:00 and 2011-11-16T16:00:01 are the optional start and end times for the activity, and [prov:type="createFile"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

activity(ex:a10)
activity(ex:a10, -, -)
activity(ex:a10, -, -, [prov:type="edit"])
activity(ex:a10, -, 2011-11-16T16:00:00)
activity(ex:a10, 2011-11-16T16:00:00, -)
activity(ex:a10, 2011-11-16T16:00:00, -, [prov:type="createFile"])
activity(ex:a10, [prov:type="edit"])

3.1.3 Generation

generationExpression ::= wasGeneratedBy ( ( identifier | - ) , eIdentifier , ( aIdentifier | - ) , ( time | - ) optional-attribute-values )

wasGeneratedBy(ex:g1, tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00,  [ex:fct="save"])

Here ex:g1 is the optional generation identifier, tr:WD-prov-dm-20111215 is the identifier of the entity being generated, ex:edit1 is the optional identifier of the generating activity, 2011-11-16T16:00:00 is the optional generation time, and [ex:fct="save"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, -)
wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasGeneratedBy(e2, a1, -, [ex:fct="save"])     
wasGeneratedBy(e2, -, -, [ex:fct="save"])     
wasGeneratedBy(ex:g1, tr:WD-prov-dm-20111215, ex:edit1, -)
wasGeneratedBy(-, tr:WD-prov-dm-20111215, ex:edit1, -)

Even though the production generationExpression allows for expressions wasGeneratedBy(e2, -, -) and wasGeneratedBy(-, e2, -, -), these expressions are not valid in PROV-N, since at least one of activity, time, or attributes must be present.
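The extra condition stated above can be checked after parsing. In the Python sketch below (helper names are hypothetical), an omitted term is passed as None and the - marker as the string "-". Whether an explicit empty attribute list [] satisfies the condition is not stated here; the sketch assumes that it does.

```python
from typing import Dict, Optional

MARKER = "-"


def present(term: Optional[str]) -> bool:
    """A term counts as absent when it was omitted (None) or written as '-'."""
    return term is not None and term != MARKER


def is_valid_generation(
    entity: str,
    activity: Optional[str] = None,
    time: Optional[str] = None,
    attributes: Optional[Dict[str, str]] = None,
) -> bool:
    """wasGeneratedBy needs its entity plus at least one of activity, time, attributes."""
    if not present(entity):
        raise ValueError("wasGeneratedBy requires an entity identifier")
    # attributes is None when no list was written; {} stands for '[]' (assumed present).
    return present(activity) or present(time) or attributes is not None
```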

3.1.4 Usage

usageExpression ::= used ( ( identifier | - ) , aIdentifier , eIdentifier , ( time | - ) optional-attribute-values )

used(ex:u1, ex:act2, ar3:0111, 2011-11-16T16:00:00, [ex:fct="load"])

Here ex:u1 is the optional usage identifier, ex:act2 is the identifier of the using activity, ar3:0111 is the identifier of the entity being used, 2011-11-16T16:00:00 is the optional usage time, and [ex:fct="load"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

used(ex:act2, ar3:0111, -)
used(ex:act2, ar3:0111, 2011-11-16T16:00:00)
used(a1,e1, -, [ex:fct="load"])
used(ex:u1, ex:act2, ar3:0111, -)
used(-, ex:act2, ar3:0111, -)

3.1.5 Start

startExpression ::= wasStartedBy ( ( identifier | - ) , aIdentifier , ( eIdentifier | - ) , ( time | - ) optional-attribute-values )

wasStartedBy(s, ex:act2, ar3:0111, 2011-11-16T16:00:00, [ex:param="a"])

Here s is the optional start identifier, ex:act2 is the identifier of the started activity, ar3:0111 is the identifier of the entity that triggered the activity start, 2011-11-16T16:00:00 is the optional start time, and [ex:param="a"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasStartedBy(ex:act2, ar3:0111, -)
wasStartedBy(ex:act2, ar3:0111, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, -, 2011-11-16T16:00:00)
wasStartedBy(ex:act2, -, -, [ex:param="a"])
wasStartedBy(s, ex:act2, ar3:0111, 2011-11-16T16:00:00)
wasStartedBy(-, ex:act2, ar3:0111, 2011-11-16T16:00:00)

Note: Even though the production startExpression allows for expressions wasStartedBy(a, -, -) and wasStartedBy(-, a, -, -), these expressions are not valid in PROV-N, since at least one of trigger, time, or attributes must be present.

3.1.6 End

endExpression ::= wasEndedBy ( ( identifier | - ) , aIdentifier , ( eIdentifier | - ) , ( time | - ) optional-attribute-values )

wasEndedBy(s, ex:act2, ex:trigger, 2011-11-16T16:00:00, [ex:param="a"])

Here s is the optional end identifier, ex:act2 is the identifier of the ended activity, ex:trigger is the identifier of the entity that triggered the activity end, 2011-11-16T16:00:00 is the optional end time, and [ex:param="a"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasEndedBy(ex:act2, ex:trigger, -)
wasEndedBy(ex:act2, ex:trigger, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, 2011-11-16T16:00:00)
wasEndedBy(ex:act2, -, 2011-11-16T16:00:00, [ex:param="a"])
wasEndedBy(e, ex:act2, ex:trigger, 2011-11-16T16:00:00)
wasEndedBy(-, ex:act2, ex:trigger, 2011-11-16T16:00:00)

Note: Even though the production endExpression allows for expressions wasEndedBy(a, -, -) and wasEndedBy(-, a, -, -), these expressions are not valid in PROV-N, since at least one of trigger, time, or attributes must be present.

3.1.7 Invalidation

invalidationExpression ::= wasInvalidatedBy ( ( identifier | - ) , eIdentifier , ( aIdentifier | - ) , ( time | - ) optional-attribute-values )

wasInvalidatedBy(ex:i1, tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00,  [ex:fct="save"])

Here ex:i1 is the optional invalidation identifier, tr:WD-prov-dm-20111215 is the identifier of the entity being invalidated, ex:edit1 is the optional identifier of the invalidating activity, 2011-11-16T16:00:00 is the optional invalidation time, and [ex:fct="save"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasInvalidatedBy(tr:WD-prov-dm-20111215, ex:edit1, -)
wasInvalidatedBy(tr:WD-prov-dm-20111215, ex:edit1, 2011-11-16T16:00:00)
wasInvalidatedBy(e2, a1, -, [ex:fct="save"])     
wasInvalidatedBy(e2, -, -, [ex:fct="save"])     
wasInvalidatedBy(ex:i1, tr:WD-prov-dm-20111215, ex:edit1, -)
wasInvalidatedBy(-, tr:WD-prov-dm-20111215, ex:edit1, -)

Even though the production invalidationExpression allows for expressions wasInvalidatedBy(e2, -, -) and wasInvalidatedBy(-, e2, -, -), these expressions are not valid in PROV-N, since at least one of activity, time, or attributes must be present.

3.1.8 Communication

communicationExpression ::= wasInformedBy ( ( identifier | - ) , aIdentifier , aIdentifier optional-attribute-values )

wasInformedBy(ex:inf1, ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

Here ex:inf1 is the optional communication identifier, ex:a1 is the identifier of the informed activity, ex:a2 is the identifier of the informant activity, and [ex:param1="a", ex:param2="b"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasInformedBy(ex:a1, ex:a2)
wasInformedBy(ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasInformedBy(i, ex:a1, ex:a2)
wasInformedBy(i, ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasInformedBy(-, ex:a1, ex:a2)
wasInformedBy(-, ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

3.1.9 Start by Activity

startByActivityExpression ::= wasStartedByActivity ( ( identifier | - ) , aIdentifier , aIdentifier optional-attribute-values )

wasStartedByActivity(s,ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

Here s is the optional start-by-activity identifier, ex:a1 is the identifier of the started activity, ex:a2 is the identifier of the activity that started ex:a1, and [ex:param1="a", ex:param2="b"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasStartedByActivity(ex:a1, ex:a2)
wasStartedByActivity(ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])
wasStartedByActivity(s,ex:a1, ex:a2)
wasStartedByActivity(-,ex:a1, ex:a2)
wasStartedByActivity(-,ex:a1, ex:a2, [ex:param1="a", ex:param2="b"])

3.2 Component 2: Agents and Responsibility

3.2.1 Agent

agentExpression ::= agent ( identifier optional-attribute-values )

agent(ag4, [ prov:type="prov:Person", ex:name="David" ])

Here ag4 is the agent identifier, and [ prov:type="prov:Person", ex:name="David" ] are optional attributes.

In the next example, the optional attributes are omitted.

agent(ag4)

3.2.2 Attribution

attributionExpression ::= wasAttributedTo ( ( identifier | - ) , eIdentifier , agIdentifier optional-attribute-values )

wasAttributedTo(id, e, ag, [ex:license="cc:attributionURL" %% "xsd:QName"])

Here id is the optional attribution identifier, e is an entity identifier, ag is the identifier of the agent to whom the entity is ascribed, and [ex:license="cc:attributionURL" %% "xsd:QName"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasAttributedTo(e, ag)
wasAttributedTo(e, ag, [ex:license="cc:attributionURL" %% "xsd:QName"])
wasAttributedTo(-,  e, ag, [ex:license="cc:attributionURL" %% "xsd:QName"])

3.2.3 Association

associationExpression ::= wasAssociatedWith ( ( identifier | - ) , aIdentifier , ( agIdentifier | - ) , ( eIdentifier | - ) optional-attribute-values )

wasAssociatedWith(ex:agas, ex:a1, ex:ag1, ex:e1, [ex:param1="a", ex:param2="b"])

Here ex:agas is the optional association identifier, ex:a1 is an activity identifier, ex:ag1 is the optional identifier of the agent associated with the activity, ex:e1 is the optional identifier of the plan used by the agent in the context of the activity, and [ex:param1="a", ex:param2="b"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasAssociatedWith(ex:a1, -, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1, -)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1)
wasAssociatedWith(ex:a1, ex:ag1, ex:e1, [ex:param1="a", ex:param2="b"])
wasAssociatedWith(a, ex:a1, -, ex:e1)
wasAssociatedWith(-, ex:a1, -, ex:e1)
wasAssociatedWith(-, ex:a1, ex:ag1, -)

Note: The production associationExpression allows for expressions wasAssociatedWith(a, -, -) and wasAssociatedWith(-, a, -, -). However, these expressions are not valid in PROV-N, because at least one of agent or plan must be present.

3.2.4 Responsibility

responsibilityExpression ::= actedOnBehalfOf ( ( identifier | - ) , agIdentifier , agIdentifier , ( aIdentifier | - ) optional-attribute-values )

actedOnBehalfOf(act1, ag1, ag2, a, [prov:type="contract"])

Here act1 is the optional responsibility identifier, ag1 is the identifier for the subordinate agent, ag2 is the identifier of the responsible agent, a is the optional identifier of the activity for which the responsibility link holds, and [prov:type="contract"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

actedOnBehalfOf(ag1, ag2, -)
actedOnBehalfOf(ag1, ag2, a)
actedOnBehalfOf(ag1, ag2, -, [prov:type="delegation"])
actedOnBehalfOf(ag2, ag3, a, [prov:type="contract"])
actedOnBehalfOf(r, ag2, ag3, a, [prov:type="contract"])
actedOnBehalfOf(-, ag1, ag2, -)

3.3 Component 3: Derivations

3.3.1 Derivation

derivationExpression ::= wasDerivedFrom ( ( identifier | - ) , eIdentifier , eIdentifier , ( aIdentifier | - ) , ( gIdentifier | - ) , ( uIdentifier | - ) optional-attribute-values )

wasDerivedFrom(d, e2, e1, a, g2, u1, [prov:comment="a righteous derivation"])

Here d is the optional derivation identifier, e2 is the identifier for the entity being derived, e1 is the identifier of the entity from which e2 is derived, a is the optional identifier of the activity that used e1 and generated e2, g2 is the optional identifier of the generation of e2, u1 is the optional identifier of the usage of e1, and [prov:comment="a righteous derivation"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasDerivedFrom(e2, e1)
wasDerivedFrom(e2, e1, a, g2, u1)
wasDerivedFrom(e2, e1, -, g2, u1)
wasDerivedFrom(e2, e1, a, -, u1)
wasDerivedFrom(e2, e1, a, g2, -)
wasDerivedFrom(e2, e1, a, -, -)
wasDerivedFrom(e2, e1, -, -, u1)
wasDerivedFrom(e2, e1, -, -, -)
wasDerivedFrom(d, e2, e1, a, g2, u1)
wasDerivedFrom(-, e2, e1, a, g2, u1)
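Once a derivation has been parsed, its positional terms can be given names. To keep the mapping from positions to roles unambiguous, the Python sketch below accepts only the fully written-out form with six positional terms, as in wasDerivedFrom(-, e2, e1, a, g2, u1), and turns each - marker into None; the Derivation class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MARKER = "-"


@dataclass
class Derivation:
    """Named fields of wasDerivedFrom(d, e2, e1, a, g2, u1) (hypothetical class)."""
    identifier: Optional[str]
    generated_entity: str
    used_entity: str
    activity: Optional[str]
    generation: Optional[str]
    usage: Optional[str]


def derivation_from_full_form(terms: List[str]) -> Derivation:
    """Build a Derivation from the six positional terms of the full form."""
    if len(terms) != 6:
        raise ValueError("expected: identifier, e2, e1, activity, generation, usage")
    values = [None if term == MARKER else term for term in terms]
    if values[1] is None or values[2] is None:
        raise ValueError("both entity identifiers are mandatory")
    return Derivation(*values)
```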

3.3.2 Revision

revisionExpression ::= wasRevisionOf ( ( identifier | - ) , eIdentifier , eIdentifier , ( agIdentifier | - ) optional-attribute-values )

wasRevisionOf(rev1, tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium, [prov:comment="??"] )

Here rev1 is the optional revision identifier, tr:WD-prov-dm-20111215 is the identifier of the revised entity, tr:WD-prov-dm-20111018 is the identifier of the original entity, w3:Consortium is the optional identifier of the agent involved in the revision, and [prov:comment="??"] are optional attributes.

The remaining examples show cases where some of the optionals are omitted.

wasRevisionOf(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)
wasRevisionOf(tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium)
wasRevisionOf(id,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, w3:Consortium)
wasRevisionOf(id,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)
wasRevisionOf(-,tr:WD-prov-dm-20111215, tr:WD-prov-dm-20111018, -)

3.3.3 Quotation

quotationExpression ::= wasQuotedFrom ( ( identifier | - ) , eIdentifier , eIdentifier , ( agIdentifier | - ) , ( agIdentifier | - ) optional-attribute-values )

wasQuotedFrom(quoteId1, ex:blockQuote,ex:blog,ex:Luc,ex:Paul,[])

Here quoteId1 is the optional quotation identifier, ex:blockQuote is the identifier of the entity that represents the quote (the partial copy), ex:blog is the identifier of the original entity being quoted, ex:Luc is the optional identifier of the agent who performs the quoting, ex:Paul is the optional identifier of the agent to whom the original entity is attributed, and [] is the (empty) optional set of attributes.

The remaining examples show cases where some of the optionals are omitted.

wasQuotedFrom(ex:blockQuote,ex:blog)
wasQuotedFrom(ex:blockQuote,ex:blog,ex:Luc,ex:Paul)
wasQuotedFrom(ex:blockQuote,ex:blog,-,ex:Paul)
wasQuotedFrom(ex:blockQuote,ex:blog,ex:Luc,ex:Paul,[])
wasQuotedFrom(ex:blockQuote,ex:blog, -, -)
wasQuotedFrom(id,ex:blockQuote,ex:blog,ex:Luc,ex:Paul)
wasQuotedFrom(-,ex:blockQuote,ex:blog,ex:Luc,-)

3.3.4 Original Source

originalSourceExpression ::= hadOriginalSource ( ( identifier | - ) , eIdentifier , eIdentifier optional-attribute-values )

hadOriginalSource(src1, ex:e1, ex:e2,[ex:param="a"])

Here src1 is the optional original source identifier, ex:e1 is the identifier of the derived entity, ex:e2 is the identifier of the original source entity, and [ex:param="a"] is the optional set of attributes.

The remaining examples show cases where some of the optionals are omitted.

hadOriginalSource(ex:e1, ex:e2)
hadOriginalSource(ex:e1, ex:e2,[ex:param="a"])
hadOriginalSource(-,ex:e1, ex:e2,[ex:param="a"])
hadOriginalSource(-,ex:e1, ex:e2)

3.3.5 Trace

traceExpression ::= tracedTo ( ( identifier | - ) , eIdentifier , eIdentifier optional-attribute-values )

tracedTo(id,e2,e1,[ex:param="a"])

Here id is the optional trace identifier, e2 is an entity identifier, e1 is the identifier for an ancestor entity that e2 depends on, and [ex:param="a"] is the optional set of attributes.

The remaining examples show cases where some of the optionals are omitted.

tracedTo(e2,e1)
tracedTo(e2,e1,[ex:param="a"])
tracedTo(-,e2,e1)

3.4 Component 4: Alternate Entities

3.4.1 Alternate

alternateExpression ::= alternateOf ( eIdentifier , eIdentifier )

alternateOf(tr:WD-prov-dm-20111215,ex:alternate-20111215)

Here tr:WD-prov-dm-20111215 is an alternate of ex:alternate-20111215.

3.4.2 Specialization

specializationExpression ::= specializationOf ( eIdentifier , eIdentifier )

specializationOf(tr:WD-prov-dm-20111215,tr:prov-dm)

Here tr:WD-prov-dm-20111215 is a specialization of tr:prov-dm.

3.5 Component 5: Collections

The grammar for collections may undergo minor syntactic changes, since it has not been implemented yet.

In the productions in this section, nonterminals keyValuePairs and keySet are defined as follows.

keyValuePairs ::= ( literal , eIdentifier ) | ( literal , eIdentifier ) , keyValuePairs

keySet ::= literal | literal , keySet

3.5.1 Insertion

derivationByInsertionFromExpression ::= derivedByInsertionFrom ( identifier , cIdentifier , cIdentifier , { keyValuePairs } optional-attribute-values )

 derivedByInsertionFrom(id, c1, c, {("k1", v1), ("k2", v2)}, [])

Here id is the optional insertion identifier, c1 is the identifier for the collection after the insertion, c is the identifier for the collection before the insertion, {("k1", v1), ("k2", v2)} is the set of key-value pairs that have been inserted into c, and [] is the optional (empty) set of attributes.

The remaining examples show cases where some of the optionals are omitted.

 derivedByInsertionFrom(c1, c, {("k1", v1), ("k2", v2)})  
 derivedByInsertionFrom(c1, c, {("k1", v1)})  
 derivedByInsertionFrom(c1, c, {("k1", v1), ("k2", v2)}, [])

3.5.2 Removal

derivationByRemovalFromExpression ::= derivedByRemovalFrom ( identifier , cIdentifier , cIdentifier , { keySet } optional-attribute-values )

 derivedByRemovalFrom(id, c3, c, {"k1", "k3"}, [])

Here id is the optional removal identifier, c3 is the identifier for the collection after the removal, c is the identifier for the collection before the removal, {"k1", "k3"} is the set of keys that have been removed from c, and [] is the optional (empty) set of attributes.

The remaining examples show cases where some of the optionals are omitted.

   derivedByRemovalFrom(c3, c1, {"k1", "k3"})               
   derivedByRemovalFrom(c3, c1, {"k1"})               
   derivedByRemovalFrom(c3, c1, {"k1", "k3"}, [])
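Reading a collection as a mapping from keys to entity identifiers, the insertion and removal expressions above relate the contents of the collection after the derivation to its contents before it. The Python sketch below (hypothetical helper names) shows one such reading; overwriting an existing key on insertion and silently ignoring absent keys on removal are assumptions of the sketch, not rules stated in this section.

```python
from typing import Dict, Iterable


def after_insertion(before: Dict[str, str], pairs: Dict[str, str]) -> Dict[str, str]:
    """Contents after derivedByInsertionFrom: a copy of `before` extended with `pairs`."""
    after = dict(before)
    after.update(pairs)  # assumption: an existing key is overwritten
    return after


def after_removal(before: Dict[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Contents after derivedByRemovalFrom: a copy of `before` without `keys`."""
    removed = set(keys)  # assumption: keys not in `before` are ignored
    return {key: value for key, value in before.items() if key not in removed}
```

Both helpers return new mappings and leave the "before" collection unchanged, mirroring the fact that each expression relates two distinct collection identifiers.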

3.5.3 Membership

membershipExpression ::= memberOf ( identifier , cIdentifier , { keyValuePairs } optional-attribute-values )

   memberOf(mid, c, {("k4", v4), ("k5", v5)}, [])

Here mid is the optional membership identifier, c is the identifier for the collection whose membership is stated, {("k4", v4), ("k5", v5)} is the set of key-value pairs that are members of c, and [] is the optional (empty) set of attributes.

The remaining examples show cases where some of the optionals are omitted.

   memberOf(c3, {("k4", v4), ("k5", v5)})
   memberOf(c3, {("k4", v4)})
   memberOf(c3, {("k4", v4), ("k5", v5)},[])

3.6 Component 6: Annotations

3.6.1 Note

noteExpression ::= note ( identifier optional-attribute-values )

note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])

3.6.2 Annotation

annotationExpression ::= hasAnnotation ( identifier , nIdentifier )

hasAnnotation(tr:WD-prov-dm-20111215,ex2:n1)

Here ex2:n1 is the identifier of a note annotating the entity identified by tr:WD-prov-dm-20111215.

3.7 Further Expressions

This section defines further expressions of PROV-N.

3.7.1 Namespace Declaration

namespaceDeclarations ::= ( defaultNamespaceDeclaration | namespaceDeclaration ) namespaceDeclaration*
namespaceDeclaration ::= prefix prefix IRI
defaultNamespaceDeclaration ::= default IRI

In PROV-N, the following prefixes are reserved:

prov denotes the PROV namespace with URI http://www.w3.org/ns/prov#
xsd denotes the XML Schema namespace with URI http://www.w3.org/2001/XMLSchema#.

A PROV-N document must not redeclare the prefixes prov and xsd.

The following example declares three namespaces, one default, and two with explicit prefixes ex1 and ex2.

container
  default <http://example.org/0/>
  prefix ex1 <http://example.org/1/>
  prefix ex2 <http://example.org/2/>
...
endContainer
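The declaration rules above can be checked mechanically. The following sketch assumes each declaration sits on its own line in the form used in the example (a simplification of the grammar, not a full PROV-N parser; in particular the prefix pattern only approximates NC_NAME). It builds a prefix table and rejects any attempt to redeclare the reserved prefixes prov and xsd. The function name read_declarations and the use of the empty string as the key for the default namespace are illustrative choices.

```python
import re
from typing import Dict, Iterable

# Reserved prefixes that a PROV-N document must not redeclare.
RESERVED = {
    "prov": "http://www.w3.org/ns/prov#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# Simplified line patterns for "default <IRI>" and "prefix name <IRI>".
DEFAULT_RE = re.compile(r"^default\s+<([^>]*)>$")
PREFIX_RE = re.compile(r"^prefix\s+([A-Za-z_][\w.-]*)\s+<([^>]*)>$")


def read_declarations(lines: Iterable[str]) -> Dict[str, str]:
    """Map each prefix to its namespace IRI; "" stands for the default namespace."""
    table: Dict[str, str] = dict(RESERVED)
    for raw in lines:
        line = raw.strip()
        if m := DEFAULT_RE.match(line):
            table[""] = m.group(1)
        elif m := PREFIX_RE.match(line):
            name, iri = m.groups()
            if name in RESERVED:
                raise ValueError(f"prefix {name!r} is reserved and must not be redeclared")
            table[name] = iri
        else:
            raise ValueError(f"not a namespace declaration: {line!r}")
    return table


decls = read_declarations([
    "default <http://example.org/0/>",
    "prefix ex1 <http://example.org/1/>",
    "prefix ex2 <http://example.org/2/>",
])
print(decls["ex1"])  # http://example.org/1/
```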

3.7.2 Identifier

A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local part. The PROV data model stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.

A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name refers to the default namespace.
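Once prefixes are resolved, the mapping described above is a plain string concatenation. The sketch below assumes the namespace declarations have already been collected into a dictionary, with the empty string as a hypothetical key for the default namespace; the function name to_iri is illustrative, and the sketch does not validate the prefix or local part against the productions that follow.

```python
from typing import Dict


def to_iri(qname: str, namespaces: Dict[str, str]) -> str:
    """Map a qualified name to an IRI: namespace IRI followed by the local part.

    `namespaces` maps declared prefixes to namespace IRIs; the key "" holds
    the default namespace, used when the name carries no prefix.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:  # no colon at all: unprefixed name, so use the default namespace
        prefix, local = "", qname
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + local


ns = {"": "http://example.org/2/", "ex": "http://example.org/1/"}
print(to_iri("ex:a/b", ns))  # http://example.org/1/a/b
print(to_iri("b", ns))       # http://example.org/2/b
```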

identifier ::= qualifiedName
eIdentifier ::= identifier (intended to denote an entity)
aIdentifier ::= identifier (intended to denote an activity)
agIdentifier ::= identifier (intended to denote an agent)
gIdentifier ::= identifier (intended to denote a generation)
uIdentifier ::= identifier (intended to denote a usage)
nIdentifier ::= identifier (intended to denote a note)
cIdentifier ::= identifier (intended to denote a collection)

qualifiedName ::= prefix : localPart | : localPart
prefix ::= a name without colon compatible with the NC_NAME production [XML-NAMES]
localPart ::= a name compatible with the reference production [RDFA-CORE]

A PROV qualified name has a more permissive syntax than XML's QName [XML-NAMES], since it allows any syntax for its local part provided that the concatenation with the namespace results in a valid IRI [IRI].

Examples of articles on the BBC Web site seen as entities.

container
  prefix bbc <http://www.bbc.co.uk/>
  prefix bbcNews <http://www.bbc.co.uk/news/>

  entity(bbc:)                          // bbc site itself
  entity(bbc:news/)                     // bbc news
  entity(bbc:news/world-asia-17507976)  // a given news article

  entity(bbcNews:)                      // an alternative way of referring to the bbc news site

endContainer

Examples of entities with declared and default namespace.

container
  default <http://example.org/2/>
  prefix ex <http://example.org/1/>

  entity(ex:a)     //  corresponds to IRI http://example.org/1/a
  entity(ex:a/)    //  corresponds to IRI http://example.org/1/a/
  entity(ex:a/b)   //  corresponds to IRI http://example.org/1/a/b
  entity(b)        //  corresponds to IRI http://example.org/2/b
  entity(ex:1234)  //  corresponds to IRI http://example.org/1/1234
  entity(4567)     //  corresponds to IRI http://example.org/2/4567
endContainer

Note: The productions for qualifiedName and prefix are conflicting. In the context of a namespaceDeclaration, a parser should give precedence to the production for prefix.

We need to explicitly disallow '-' as the first and only character of a local part, since a lone '-' marks an omitted optional term. Such a local part should instead be written in pct-encoded form (%2D) [RFC3987].

3.7.3 Attribute

attribute ::= qualifiedName

The reserved attributes in the PROV namespace are the following.

prov:label
prov:location
prov:role
prov:type
prov:value

3.7.4 Literal

Literal ::= typedLiteral | convenienceNotation
typedLiteral ::= quotedString %% datatype
datatype ::= qualifiedName listed in Table permitted-datatypes
convenienceNotation ::= stringLiteral | intLiteral
stringLiteral ::= quotedString
quotedString ::= a finite sequence of characters in which " (#x22) and \ (#x5C) occur only in pairs of the form \" (#x5C, #x22) and \\ (#x5C, #x5C), enclosed in a pair of " (#x22) characters
intLiteral ::= a finite-length non-empty sequence of decimal digits (#x30-#x39) with an optional leading negative sign (-)

The nonterminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.

In particular, a Literal may be an IRI-typed string (with datatype xsd:anyURI); such an IRI has no specific interpretation in the context of PROV.
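The desugaring just described can be sketched as a small function returning the lexical value and the datatype of a literal. This is an illustrative assumption about how a reader might process literals, not a normative algorithm: it tries intLiteral first, in line with the precedence given to intLiteral in this section, handles the two escapes allowed in quotedString, and does not check the datatype against the table of permitted datatypes. The name read_literal is hypothetical.

```python
import re
from typing import Tuple

INT_RE = re.compile(r"-?[0-9]+")
# A quotedString: " and \ may appear inside only as the escapes \" and \\.
QUOTED_RE = re.compile(r'"((?:[^"\\]|\\["\\])*)"')


def unescape(body: str) -> str:
    """Replace each escaped quote or escaped backslash by the bare character."""
    return re.sub(r'\\(["\\])', r"\1", body)


def read_literal(text: str) -> Tuple[str, str]:
    """Return (lexical value, datatype) for a literal, expanding the sugar."""
    text = text.strip()
    if INT_RE.fullmatch(text):        # intLiteral, tried first
        return text, "xsd:int"
    m = QUOTED_RE.match(text)
    if not m:
        raise ValueError(f"not a literal: {text!r}")
    value, rest = unescape(m.group(1)), text[m.end():].strip()
    if not rest:                      # stringLiteral
        return value, "xsd:string"
    if rest.startswith("%%"):         # typedLiteral
        return value, rest[2:].strip()
    raise ValueError(f"unexpected text after string: {rest!r}")


print(read_literal('"ex:Programmer" %% xsd:QName'))  # ('ex:Programmer', 'xsd:QName')
```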

Permitted datatypes in literals
xsd:decimal	xsd:double	xsd:dateTime
xsd:integer	xsd:float
xsd:nonNegativeInteger	xsd:string	rdf:XMLLiteral
xsd:nonPositiveInteger	xsd:normalizedString
xsd:positiveInteger	xsd:token
xsd:negativeInteger	xsd:language
xsd:long	xsd:Name
xsd:int	xsd:NCName
xsd:short	xsd:NMTOKEN
xsd:byte	xsd:boolean
xsd:unsignedLong	xsd:hexBinary
xsd:unsignedInt	xsd:base64Binary
xsd:unsignedShort	xsd:anyURI
xsd:unsignedByte	xsd:QName

Note: The productions for qualifiedName and intLiteral are conflicting. In the context of a Literal, a parser should give precedence to the production for intLiteral.

Wouldn't it be useful to introduce a literal for a qualified name? Currently, we have to write:

prov:type="ex:Programmer"  %% xsd:QName

to indicate that ex:Programmer should be interpreted as a qualified name (QName). Instead, we could have a notation such as

prov:type='ex:Programmer'

3.7.4.1 Reserved Type Values

The reserved type values in the PROV namespace are the following.

prov:Account
prov:SoftwareAgent
prov:Person
prov:Organization
prov:Plan
prov:Collection
prov:EmptyCollection

The agent ag is a person (type: prov:Person), whereas the entity pl is a plan (type: prov:Plan).

agent(ag,[prov:type="prov:Person" %% xsd:QName])
entity(pl,[prov:type="prov:Plan" %% xsd:QName])

3.7.4.2 Time Values

Time instants are defined according to xsd:dateTime [XMLSCHEMA-2].

The third argument in the following usage expression is a time instant, namely 4pm on 2011-11-16.

used(ex:act2, ar3:0111, 2011-11-16T16:00:00)
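Since time instants follow xsd:dateTime, a consumer written in Python might read them with the standard library, as in the sketch below. Python's datetime.fromisoformat accepts an ISO 8601 subset that overlaps with, but is not identical to, the xsd:dateTime lexical space, so this is a convenience for common values rather than a conforming xsd:dateTime parser; the helper name parse_time is hypothetical.

```python
from datetime import datetime


def parse_time(text: str) -> datetime:
    """Parse a time value such as the third argument of used(...) above.

    Assumption: values stay within the common ISO 8601 subset of xsd:dateTime
    (years 1 to 9999, no 24:00:00). A trailing "Z" is rewritten to "+00:00"
    because datetime.fromisoformat only accepts "Z" from Python 3.11 onwards.
    """
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


t = parse_time("2011-11-16T16:00:00")
print(t.isoformat(), t.tzinfo)  # 2011-11-16T16:00:00 None
```

A value without a timezone, like the one in the example, yields a naive datetime; adding an offset or "Z" yields a timezone-aware one.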

4. Expression Container

An expression container is a housekeeping construct of PROV-N capable of packaging up PROV-N expressions and namespace declarations. An expression container forms a self-contained package of provenance descriptions for the purpose of exchanging them. An expression container may be used to package up PROV-N expressions in response to a request for the provenance of something ([PROV-AQ]).

Given its status as a housekeeping construct for exchanging provenance expressions, an expression container is not defined as a PROV-N expression (that is, it does not match the production expression).

An expression container, written container decls exprs endContainer in PROV-N, contains:

namespaceDeclarations: a set decls of namespace declarations, declaring namespaces and associated prefixes, which can be used in attributes and identifiers occurring inside exprs;
expressions: a non-empty set of expressions exprs.

An expression container's text matches the expressionContainer production.

expressionContainer ::= container namespaceDeclarations expression+ endContainer
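For exchange, a producer has to emit the container keyword, the namespace declarations, the expressions and the closing endContainer keyword, in that order. The following sketch shows one way to do so. The function name write_container and its parameters are hypothetical, expressions are passed in as already-serialized PROV-N strings, and nothing is validated beyond the requirement that the set of expressions be non-empty.

```python
from typing import Dict, List, Optional


def write_container(expressions: List[str],
                    prefixes: Dict[str, str],
                    default: Optional[str] = None) -> str:
    """Wrap serialized PROV-N expressions and namespace declarations in a container.

    Raises ValueError when no expression is given, since a container's set of
    expressions is non-empty.
    """
    if not expressions:
        raise ValueError("a container needs at least one expression")
    lines = ["container"]
    if default is not None:
        lines.append(f"  default <{default}>")
    lines += [f"  prefix {p} <{iri}>" for p, iri in prefixes.items()]
    lines.append("")  # blank line between declarations and expressions
    lines += [f"  {e}" for e in expressions]
    lines.append("endContainer")
    return "\n".join(lines)


print(write_container(['entity(e2, [ prov:type="File" ])'],
                      {"ex": "http://example.org/"}))
```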

The following container contains expressions related to the provenance of entity e2.

container
  prefix ex <http://example.org/>

  entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", 
               ex:content="There was a lot of crime in London last month."])
  activity(a1, 2011-11-16T16:05:00, -,[prov:type="edit"])
  wasGeneratedBy(e2, a1, -, [ex:fct="save"])     
  wasAssociatedWith(a1, ag2, -, [prov:role="author"])
  agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])

endContainer

This container could for instance be returned as the result of a query to a provenance store for the provenance of entity e2 [PROV-AQ].

5. Account

The PROV data model introduces a notion of account, by which a set of provenance descriptions can be bundled up and named. The PROV data model assumes that mechanisms to implement accounts exist, but such mechanisms remain outside its scope; specific serializations may offer their own means of naming bundles of descriptions.

Since the primary motivation for PROV-N is to provide a notation aimed at human consumption, it is appropriate to introduce a notation for accounts, consisting of an account name and a bundle of expressions.

An account, written account(id, exprs) in PROV-N, contains:

id: an identifier that identifies this account;
expressions: a set exprs of expressions.

In PROV-N, an account's text matches the accountExpression production of the grammar.

accountExpression ::= account ( identifier , expression* )

It is also useful to package up one or more account expressions in an expression container for interchange purposes. Hence, expressionContainer is revised as follows.

expressionContainer ::= container namespaceDeclarations expression+ endContainer
| container namespaceDeclarations accountExpression+ endContainer

The following container

container
  prefix ex <http://example.org/>

  account(ex:acc1,...)
  account(ex:acc2,...)
endContainer

illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-N serialization of the provenance of something.

The following container

container
  prefix ex <http://example.org/>
  ...

  account(ex:acc1,
      entity(tr:WD-prov-dm-20111018, [ prov:type="pr:RecsWD" %% xsd:QName ])
      entity(tr:WD-prov-dm-20111215, [ prov:type="pr:RecsWD" %% xsd:QName ])
      ...
      wasAssociatedWith(ex:pub2, w3:Consortium, pr:rec-advance))

  account(ex:acc2,
      entity(ex:acc1, [prov:type="prov:Account" %% xsd:QName ])
      wasAttributedTo(ex:acc1, w3:Consortium))

endContainer

illustrates a first account, with identifier ex:acc1, containing expressions describing the provenance of the technical report tr:WD-prov-dm-20111215, and a second account ex:acc2, describing the provenance of the first. In account ex:acc2, ex:acc1 is the identifier of an entity of type prov:Account.

6. Media Type

The media type of PROV-N is text/prov-n. The content encoding of PROV-N content is UTF-8.

See "Register an Internet Media Type for a W3C Spec" (http://www.w3.org/2002/06/registering-mediatype) for the registration procedure.

A. Acknowledgements

WG membership to be listed here.

B. References

B.1 Normative references

[IRI]: M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRI). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[RDF-CONCEPTS]: Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RDFA-CORE]: Shane McCarron; et al. RDFa Core 1.1: Syntax and processing rules for embedding RDF through attributes. 13 March 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/2012/CR-rdfa-core-20120313/
[RFC2119]: S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC3987]: M. Dürst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[URI]: T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifiers (URI): generic syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt
[XML-NAMES]: Richard Tobin; et al. Namespaces in XML 1.0 (Third Edition). 8 December 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-xml-names-20091208/
[XMLSCHEMA-2]: Paul V. Biron; Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. 28 October 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

B.2 Informative references

[PROV-AQ]: Graham Klyne and Paul Groth (eds.); Luc Moreau, Olaf Hartig, Yogesh Simmhan, James Meyers, Timothy Lebo, Khalid Belhajjame, and Simon Miles. Provenance Access and Query. 2011, Working Draft. URL: http://www.w3.org/TR/prov-aq/
[PROV-CONSTRAINTS]: James Cheney, Paolo Missier, and Luc Moreau (eds.). Constraints of the Prov Data Model. 2012, Working Draft. URL: http://www.w3.org/TR/prov-constraints/
[PROV-DM]: Luc Moreau and Paolo Missier (eds.); Khalid Belhajjame, Reza B'Far, Stephen Cresswell, Yolanda Gil, Paul Groth, Graham Klyne, Jim McCusker, Simon Miles, James Myers, Satya Sahoo, and Curt Tilmes. PROV-DM: The PROV Data Model. 2012, Working Draft. URL: http://www.w3.org/TR/prov-dm/
[PROV-RDF]: James Cheney. PROV-RDF Mapping. 2012, Work in progress. URL: http://www.w3.org/2011/prov/wiki/ProvRDF
[PROV-SEM]: James Cheney. Formal Semantics Strawman. 2011, Work in progress. URL: http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
[PROV-XML]: James Cheney. PROV-XML Mapping. 2012, Work in progress. URL: http://www.w3.org/2011/prov/wiki/ProvXML

PROV-N: The Provenance Notation

W3C Working Draft 03 May 2012

Abstract

Status of This Document

PROV Family of Specifications

How to read the PROV Family of Specifications

First Public Working Draft

Table of Contents

1. Introduction

1.1 Purpose of this Document and target audience

1.2 Structure of this Document

1.3 Notational Conventions

2. General grammar considerations

2.1 Functional-style Syntax

2.2 EBNF Grammar

2.3 Optional terms in expressions

2.4 Identifiers and attributes

3. PROV-N Productions per Component

3.1 Component 1: Entities and Activities

3.1.1 Entity

3.1.2 Activity

3.1.3 Generation

3.1.4 Usage

3.1.5 Start

3.1.6 End

3.1.7 Invalidation

3.1.8 Communication

3.1.9 Start by Activity

3.2 Component 2: Agents and Responsibility

3.2.1 Agent

3.2.2 Attribution

3.2.3 Association

3.2.4 Responsibility

3.3 Component 3: Derivations

3.3.1 Derivation

3.3.2 Revision

3.3.3 Quotation

3.3.4 Original Source

3.3.5 Trace

3.4 Component 4: Alternate Entities

3.4.1 Alternate

3.4.2 Specialization

3.5 Component 5: Collections

3.5.1 Insertion

3.5.2 Removal

3.5.3 Membership

3.6 Component 6: Annotations

3.6.1 Note

3.6.2 Annotation

3.7 Further Expressions

3.7.1 Namespace Declaration

3.7.2 Identifier

3.7.3 Attribute

3.7.4 Literal

3.7.4.1 Reserved Type Values

3.7.4.2 Time Values

4. Expression Container

5. Account

6. Media Type

A. Acknowledgements

B. References

B.1 Normative references

B.2 Informative references