en W3C - Dataset Exchange Working Group The mission of the Dataset Exchange WG is to: 1. Maintain and revise the Data Catalog Vocabulary, DCAT, taking into account feature requests from the DCAT user community. 2. Define and publish guidance on the specification and use of application profiles when requesting and serving data on the Web. Sun, 02 Mar 2025 03:08:15 +0000 Laminas_Feed_Writer 2 (https://getlaminas.org) https://www.w3.org/groups/wg/dx/ Data Catalog Vocabulary (DCAT) - Version 3 is a W3C Recommendation <![CDATA[DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.]]> Thu, 22 Aug 2024 02:00:00 +0000 https://www.w3.org/news/2024/data-catalog-vocabulary-dcat-version-3-is-a-w3c-recommendation/ https://www.w3.org/news/2024/data-catalog-vocabulary-dcat-version-3-is-a-w3c-recommendation/ <![CDATA[news]]> <![CDATA[

The Dataset Exchange Working Group published Data Catalog Vocabulary (DCAT) - Version 3 as a W3C Recommendation. DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use. 

DCAT enables a publisher to describe datasets and data services in a catalog using a standard model and vocabulary that facilitates the consumption and aggregation of metadata from multiple catalogs. This can increase the discoverability of datasets and data services. It also makes it possible to have a decentralized approach to publishing data catalogs and makes federated search for datasets across catalogs in multiple sites possible using the same query mechanism and structure. Aggregated DCAT metadata can serve as a manifest file as part of the digital preservation process.
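As a minimal sketch of the standard model described above (all URIs and titles are invented for illustration; the class and property names are standard DCAT and Dublin Core terms), a catalog describing one dataset and its distribution might look like this in Turtle:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# A hypothetical catalog listing a single dataset
<https://example.org/catalog> a dcat:Catalog ;
    dct:title "Example Catalog"@en ;
    dcat:dataset <https://example.org/dataset/1> .

# The dataset, with one downloadable distribution
<https://example.org/dataset/1> a dcat:Dataset ;
    dct:title "Example Dataset"@en ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <https://example.org/files/1.csv> ;
        dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv>
    ] .
```

Because every catalog uses the same classes and properties, an aggregator can harvest such descriptions from many sites and query them uniformly.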

]]>

Data Catalog Vocabulary (DCAT) - Version 3 is a W3C Proposed Recommendation <![CDATA[DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.]]> Thu, 13 Jun 2024 08:15:00 +0000 https://www.w3.org/news/2024/data-catalog-vocabulary-dcat-version-3-is-a-w3c-proposed-recommendation/ https://www.w3.org/news/2024/data-catalog-vocabulary-dcat-version-3-is-a-w3c-proposed-recommendation/ <![CDATA[news]]> <![CDATA[

Today the Dataset Exchange Working Group published Data Catalog Vocabulary (DCAT) - Version 3 as a W3C Proposed Recommendation. DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use. 

DCAT enables a publisher to describe datasets and data services in a catalog using a standard model and vocabulary that facilitates the consumption and aggregation of metadata from multiple catalogs. This can increase the discoverability of datasets and data services. It also makes it possible to have a decentralized approach to publishing data catalogs and makes federated search for datasets across catalogs in multiple sites possible using the same query mechanism and structure. Aggregated DCAT metadata can serve as a manifest file as part of the digital preservation process.

]]>
W3C Invites Implementations of Data Catalog Vocabulary (DCAT) - Version 3 <![CDATA[DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.]]> Thu, 18 Jan 2024 03:00:00 +0000 https://www.w3.org/news/2024/w3c-invites-implementations-of-data-catalog-vocabulary-dcat-version-3/ https://www.w3.org/news/2024/w3c-invites-implementations-of-data-catalog-vocabulary-dcat-version-3/ <![CDATA[news]]> <![CDATA[

The Dataset Exchange Working Group invites implementations of the Data Catalog Vocabulary (DCAT) - Version 3 Candidate Recommendation Snapshot. DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

This document defines a major revision of the DCAT 2 vocabulary ([VOCAB-DCAT-2]) in response to use cases, requirements and community experience which could not be considered during the previous vocabulary development. This revision extends the DCAT standard in line with community practice while supporting diverse approaches to data description and dataset exchange.

Comments are welcome via the GitHub issues by 15 February 2024.

]]>
Data Catalog Vocabulary (DCAT) v3: 2nd Public Working Draft Tue, 18 May 2021 21:00:00 +0000 https://www.w3.org/blog/2021/dcat3-pwd2/ https://www.w3.org/blog/2021/dcat3-pwd2/ Peter Winstanley https://www.w3.org/blog/2021/dcat3-pwd2/#comments <![CDATA[data]]> <![CDATA[blogs]]> Peter Winstanley <![CDATA[

This message is to update you on the work of the W3C Dataset Exchange Working Group [1] and to ask for your help in reviewing progress on the third revision of DCAT, the RDF vocabulary for data catalogs. The Second Public Working Draft of the revision, published on 04 May 2021, is available at https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/

The revision of DCAT is part of a group of deliverables described in the Charter [2], but it can be read as a stand-alone recommendation on how catalogs of resources should be published on the Web.

This version focuses especially on the areas of versioning [3] and dataset series [4]. The list of changes since the First Public Working Draft of 17 December 2020 is available at [5].
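As a rough illustration of the dataset-series idea under review (URIs are invented, and the class and property names follow the draft, so they could still change), a yearly budget dataset might be modelled like this:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# The series groups the yearly releases
<https://example.org/budget> a dcat:DatasetSeries ;
    dct:title "Budget data"@en .

# One member of the series, linked via dcat:inSeries
<https://example.org/budget-2021> a dcat:Dataset ;
    dct:title "Budget data 2021"@en ;
    dcat:version "1.0" ;
    dcat:inSeries <https://example.org/budget> .
```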

In reviewing the draft, it might be helpful for you to keep in mind the initial “Use Cases and Requirements” document that we are working towards [6], the issues log associated with this milestone in the development of the recommendation [7] and the remaining issues [8].

We welcome feedback along the following lines:

  1. Do you agree with the direction of travel of this revision of DCAT?
  2. Are there any areas where we could improve what we have done? [please illustrate]
  3. Are there any areas where you think the proposal is wrong or could lead us into developing proposals that are erroneous? [please give examples and reasons]
  4. Are there other use cases for data catalogs and dataset descriptions that we have not considered? [please illustrate]

Please also feel free to make any other comments and suggestions regarding the draft.

Please send comments through GitHub issues (https://github.com/w3c/dxwg/issues) or through email at: [email protected]

Best wishes

[1] https://www.w3.org/2017/dxwg/wiki/Main_Page 

[2] https://www.w3.org/2020/02/dx-wg-charter.html 

[3] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#dataset-versions

[4] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#dataset-series 

[5] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#changes-since-20201217 

[6] https://www.w3.org/TR/dcat-ucr/

[7] https://github.com/w3c/dxwg/milestone/28  

[8] https://github.com/w3c/dxwg/issues

]]>
Data Catalog Vocabulary (DCAT) Version 2 Published Today Tue, 04 Feb 2020 13:28:34 +0000 https://www.w3.org/blog/2020/data-catalog-vocabulary-dcat-version-2-published-today/ https://www.w3.org/blog/2020/data-catalog-vocabulary-dcat-version-2-published-today/ Peter Winstanley https://www.w3.org/blog/2020/data-catalog-vocabulary-dcat-version-2-published-today/#comments <![CDATA[data]]> <![CDATA[blogs]]> Peter Winstanley <![CDATA[

Today the W3C Dataset Exchange Working Group (DXWG) published version 2 of the Data Catalog vocabulary (DCAT) as a W3C Recommendation. DCAT gives people and machines a specific, domain-independent way to create catalogs that express the core elements of a dataset description in a standardized form suitable for publication on the Web, and it enables cross-domain interoperability by being usable either on its own or as a complement to other data catalog standards. DCAT thus facilitates effective search and retrieval, and permits easy scaling-up of the query process, either through "frictionless" aggregation of dataset descriptions and catalog records from many different sources and domains, or by applying the same query across multiple catalogs and aggregating the results. These patterns can also be varied slightly to give communities tailored approaches to dataset catalogs that respect the nuances of a particular type of data.

Version 2 builds on the initial work published in 2014 by providing, among other things, classes of descriptors that can be used for data services, and a wider set of relationships characterizing datasets and their temporal and spatial aspects. It also removes the constraints that were inherent in the prescribed use of some vocabulary terms for relationships (properties) that were present in its original version, so making their usage pattern more flexible.

Although dataset publishers are expected to revise their existing catalogs, as part of their general curation and update activities, to make use of the additional features available in version 2, compatibility between the new version and the earlier version of the DCAT vocabulary has been preserved.

The WG has also put effort into (i) providing multilingual descriptions of the different terms and properties, facilitating their application across the world; and (ii) explaining the alignment with the Schema.org vocabulary, which is the metadata set most widely used by search engines to optimize the indexing of Web content, and is now increasingly being adopted in data catalogs as well.

Within just a few years of its first release in 2014, DCAT has become recognised as a key interoperability standard for data catalogs in many countries and organizations. Search engine providers are using it to identify data assets to catalog, and publishers are using it to make their materials more findable. Going forward, the WG expects the incorporation of classes to describe data services into the model will make DCAT an increasingly useful tool in data science and provide a well-trodden path for those implementing the FAIR Principles to follow.

The DXWG appreciates hearing about any implementations of catalogs using DCAT v2. We would also like to know about any errors that you find or problems that you experience so that these can be fed into the ongoing management of version 2, and potentially influence changes to be made in version 3, whose work has just started. You can provide feedback on errors or difficulties you experience with DCAT v2 to the WG either by email to [email protected] or through the dedicated errata page. For new use cases and other issues, please contact us via email or by submitting an issue in the dedicated GitHub repository.  We hope that you find this standard a useful addition to your data publications.

]]>
Dataset Exchange Working Group Is Making Progress Thu, 13 Jun 2019 13:46:00 +0000 https://www.w3.org/blog/2019/dataset-exchange-working-group-is-making-progress/ https://www.w3.org/blog/2019/dataset-exchange-working-group-is-making-progress/ Peter Winstanley https://www.w3.org/blog/2019/dataset-exchange-working-group-is-making-progress/#comments <![CDATA[data]]> <![CDATA[blogs]]> Peter Winstanley <![CDATA[

What are the issues?

The history of computing has been shaped by the realisation that aggregated information is often information with increased value. This led to conflicting positions between those wanting to merge data from diverse sources to distil more value, and those wanting to prevent such merging in order to retain privacy or other control over processing, or to prevent inappropriate use of data felt in some way unfit for general processing. The approach taken by parties wanting to exchange datasets bilaterally was to prepare a data interchange agreement (DIA) that explained how the data model of one party would fit with the data models of the other. Such an agreement would also cover licensing and other caveats about the use of the data. The DIA was often a large text document with tables, and often carried hand-written signatures to establish the authority of the agreement between the parties. This approach changed radically with the advent of the World Wide Web and the scope it provides for dataset exchange at global scale between millions of computers and their users. The open data movement was the natural progression of this, with both citizens and administrations keen to establish the conditions under which significant economic benefit could be obtained from the re-use of public sector information.

The research environment has followed a similar journey as teams and institutions have discovered not only the benefit of being able to aggregate information, but have also been encouraged to make their datasets available as part of the research reproducibility and research transparency agendas.  However, in a similar way to the usage agreement aspect of the DIA, Data Sharing Agreements (DSA) have been brought in, particularly in areas such as genomics and other health-related areas where funding bodies such as the US National Institutes of Health have a set of policies for researchers to comply with.

Where is the earlier work?

The provision of guidelines for administrations on how to publish 'open data' was pivotal to the W3C development of the 2017 recommendation on how to publish data on the Web, which built on the previously developed first version of the W3C standard vocabulary for publishing data catalogs on the Web (DCAT), published three years earlier. The European Commission and national governments adopted this standard for catalogs. In some cases, however, they felt certain elements were missing, and they often also wanted to specify which controlled vocabularies to use. This led to the creation of 'application profiles' through which a data publisher could supplement the DCAT vocabulary with elements taken from vocabularies developed in other standardisation efforts and, when necessary, also add further constraints. There are a large number of individual application profiles centred on DCAT for the data catalogs of individual national administrations or for specific dataset types, such as statistical (StatDCAT) or geospatial (GeoDCAT) data.

DCAT Version 2

In 2017 W3C realised that there would be benefit in re-examining the whole situation with dataset exchange on the Web, and chartered the Dataset Exchange Working Group (DXWG) to revise DCAT and to examine the role and scope of application profiles in requesting and serving data on the Web. The revision of DCAT is now in the late stages of the standards development process. The latest public Working Draft is available, and readers are encouraged to make themselves aware of this work and provide feedback to the public mailing list at [email protected] and/or as GitHub issues.

Anything else to think about?

In addition to DIAs and DSAs, another acronym associated with the process of dataset exchange is "ETL": the Extraction, Transformation and Loading effort that is often required when a party receives datasets to be merged that use different models or schemas. ETL is often a considerable effort, one that is only necessary because the parties are using different models; it takes effort but generally adds no value. The ideal situation would be to avoid this essentially nugatory work. There is already a mechanism on the Web for a server to be given an ordered set of choices of the serialisation type for returning a dataset to a client (e.g. preferably XML, failing that CSV, and failing that the default HTML). This "content negotiation" depends on providing this ordered list to the server, generally through the HTTP "Accept" header. Given that the "application profiles" mentioned earlier describe the model that a dataset such as a data catalog has to conform to in order to be valid in a certain context, there is a need for a mechanism by which a client can use a list of profiles to indicate to a web server which profile or profiles it would prefer the returned data to adhere to. Since this provides a contract between a data provider and a data consumer, the indication of profile preferences could, amongst other things, reduce the need for an ETL step in dataset exchange.
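The existing serialisation-negotiation mechanism can be sketched in a few lines of plain Python (a deliberately simplified illustration: real Accept headers have more syntax, and a production server would use a full parser):

```python
# Server-side content negotiation sketch: parse the q-values in an HTTP
# "Accept" header and pick the best serialisation the server supports.

def parse_accept(header):
    """Return media types from an Accept header, sorted by descending q-value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0  # q defaults to 1.0 when absent
        for p in parts[1:]:
            if p.startswith("q="):
                q = float(p[2:])
        prefs.append((media_type, q))
    return [m for m, q in sorted(prefs, key=lambda x: -x[1])]

def negotiate(accept_header, supported):
    """Pick the client's most-preferred media type that the server supports."""
    for media_type in parse_accept(accept_header):
        if media_type in supported:
            return media_type
    return supported[0]  # fall back to the server default

# The example from the text: preferably XML, failing that CSV, default HTML.
choice = negotiate("application/xml;q=0.9, text/csv;q=0.8, text/html;q=0.1",
                   ["text/html", "text/csv"])
print(choice)  # → text/csv
```

Profile negotiation would layer the same preference-list pattern on top, with profile URIs in place of media types.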

Content Negotiation By Profile

The DXWG is also making strong progress in developing a recommendation for "Content Negotiation by Profile"; the Second Public Working Draft was published for community review on 30 April 2019. Readers are encouraged to read this draft and provide their feedback. For both specifications, we welcome feedback, including positive support for the proposals being developed, to the public mailing list at [email protected] and/or as GitHub issues.

Conclusion

Through the combination of an improved DCAT for facilitating the discovery of datasets, guidance on profiles (still in the early stages of development), and a recommendation on mechanisms that could allow a client to provide an ordered set of choices of profile or model for the datasets it wants returned from servers, the DXWG is working to provide a framework of standards and recommended designs and strategies. These should help developers improve automation in discovering and merging datasets, delivering the increased value that people expect to gain from data aggregation, whilst at the same time providing a mechanism to automate the selection of models that might reduce the ETL requirement or deliver another preferred model.

Acknowledgements: Thanks to Alejandra Gonzalez-Beltran and Lars G Svensson for helpful comments.

]]>
Possible future directions for data on the Web Tue, 27 Jun 2017 06:53:00 +0000 https://www.w3.org/blog/2017/possible-future-directions-for-data-on-the-web/ https://www.w3.org/blog/2017/possible-future-directions-for-data-on-the-web/ https://www.w3.org/blog/2017/possible-future-directions-for-data-on-the-web/#comments <![CDATA[data]]> <![CDATA[blogs]]> <![CDATA[

As I enter my final days as a member of the W3C Team*, I’d like to record some brief notes for what I see as possible future directions in the areas in which I’ve been most closely involved, particularly since taking on the ‘data brief’ 4 years ago.

Foundations

The Data on the Web Best Practices, which became a Recommendation in January this year, forms the foundation. As I highlighted at the time, it sets out the steps anyone should take when sharing data on the Web, whether openly or not, encouraging the sharing of actual information, not just information about where a dataset can be downloaded. A domain-specific extension, the Spatial Data on the Web Best Practices, is now all but complete. There again, the emphasis is on making data available directly on the Web so that, for example, search engines can make use of it directly and not just point to a landing page from where a dataset can be downloaded – what I call using the Web as a glorified USB stick.

Spatial Data

That specialized best practice document is just one output from the Spatial Data on the Web WG in which we have collaborated with our sister standards body, the Open Geospatial Consortium, to create joint standards. Plans are being laid for a long term continuation of that relationship which has exciting possibilities in VR/AR, Web of Things, Building Information Models, Earth Observations, and a best practices document looking at statistical data.

Research Data

Another area in which I very much hope W3C will work closely with others is in research data: life sciences, astronomy, oceanography, geology, crystallography and many more ‘ologies.’ Supported by the VRE4EIC project, the Dataset Exchange WG was born largely from this area and is leading to exciting conversations with organizations including the Research Data Alliance, CODATA, and even the UN. This is in addition to, not a replacement for, the interests of governments in the sharing of data. Both communities are strongly represented in the DXWG that will, if it fulfills its charter, make big improvements in interoperability across different domains and communities.

Linked Data

 

A line graph showing an initial Peak of Inflated Expectations, followed by the Trough of Disillusionment, the Slope of Enlightenment and the Plateau of Productivity.
The Gartner Hype Cycle. CC BY-SA Jeremykemp at English Wikipedia.

 

The use of Linked Data continues to grow; if we accept the Gartner Hype Cycle as a model then I believe that, following the Trough of Disillusionment, we are well onto the Slope of Enlightenment. I see it used particularly in environmental and life sciences, government master data and cultural heritage. That is, it’s used extensively as a means of sharing and consuming data across departments and disciplines. However, it would be silly to suggest that the majority of Web Developers are building their applications on SPARQL endpoints. Furthermore, it is true that if you make a full SPARQL endpoint available openly, then it’s relatively easy to write a query that will be so computationally expensive as to bring the system down. That’s why the BBC, OpenPHACTS and others don’t make their SPARQL endpoints publicly available. Would you make your SQL interface openly available? Instead, they provide a simple API that runs straightforward queries in the background that a developer never sees. In the case of the BBC, even their API is not public, but it powers a lot of the content on their Web site.

The upside of this approach is that through those APIs it's easy to access high-value, integrated data as developer-friendly JSON objects that are readily dealt with. From a publisher's point of view, the API is more stable and reliable. The irritating downside is that people don't see, and therefore don't recognize, the Linked Data infrastructure behind the API, which allows the value of the technology to keep being questioned.

Semantic Web, AI and Machine Learning

The main Semantic Web specs were updated at the beginning of 2014 and there are no plans to review the core RDF and OWL specs any time soon. However, that doesn’t mean that there aren’t still things to do.

One spec that might get an update soon is JSON-LD. The relevant Community Group has continued to develop the spec since it was formally published as a Recommendation and would now like to put those new specs through the Recommendation Track. Meanwhile, the Shapes Constraint Language, SHACL, has been through something of a difficult journey but is now a Proposed Recommendation, attracting significant interest and implementation.

But, what I hear from the community is that the most pressing ‘next thing’ for the Semantic Web should be what I call ‘annotated triples.’ RDF is pretty bad at describing and reflecting change: someone changes job, a concert ticket is no longer valid, the global average temperature is now y not x and so on. Furthermore, not all ‘facts’ are asserted with equal confidence. Natural Language Processing, for example, might recognize a ‘fact’ within a text with only 75% certainty.

It’s perfectly possible to express these now using Named Graphs; however, in talks I’ve done recently where I’ve mentioned this, including to the team behind Amazon’s Alexa, there has been strong support for the idea of a syntax that would allow each tuple to be extended with ‘validFrom’, ‘validTo’ and ‘probability’. Other possible annotations might relate to privacy, provenance and more. Such annotations may be semantically equivalent to creating and annotating a named graph, and RDF 1.1 goes a long way in this direction, but I’ve received a good deal of anecdotal evidence that a simple syntax might be a lot easier to process. This is very relevant to areas like AI, deep learning and statistical analysis.
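The behaviour such annotations would enable can be sketched in plain Python (the annotation names validFrom/validTo/probability come from the text; the statements, prefixes and helper function are invented for illustration, standing in for what a dedicated RDF syntax might express):

```python
# Each statement is a triple plus a dict of optional annotations.
from datetime import date

statements = [
    # (subject, predicate, object, annotations)
    ("ex:alice", "ex:worksFor", "ex:acme",
     {"validFrom": date(2015, 1, 1), "validTo": date(2017, 6, 30)}),
    ("ex:alice", "ex:worksFor", "ex:globex",
     {"validFrom": date(2017, 7, 1), "probability": 0.75}),
]

def valid_on(statements, day):
    """Return the triples whose validity interval covers the given day."""
    result = []
    for s, p, o, ann in statements:
        start = ann.get("validFrom", date.min)  # unbounded if absent
        end = ann.get("validTo", date.max)
        if start <= day <= end:
            result.append((s, p, o))
    return result

print(valid_on(statements, date(2018, 1, 1)))
# → [('ex:alice', 'ex:worksFor', 'ex:globex')]
```

A query engine could filter on the probability annotation in the same way, e.g. keeping only facts asserted above some confidence threshold.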

These sorts of topics were discussed at ESWC recently and I very much hope that there will be a W3C workshop on it next year, perhaps leading to a new WG. A project proposal was submitted to the European Commission recently that would support this, and others interested in the topic should get in touch.

Other possible future work in the Semantic Web includes a common vocabulary for sharing the results of data analysis, natural language processing etc. The Natural Language Interchange Format, for example, could readily be put through Rec Track.

Vocabularies and schema.org

Common vocabularies, maintained by the communities they serve, are an essential part of interoperability. Whether it’s researchers, governments or businesses, better and easier maintenance of vocabularies and a more uniform approach to sharing mappings, crosswalks and linksets, must be a priority. Internally at least, we have recognized for years that W3C needs to be better at this. What’s not so widely known is that we can do a lot now. Community Groups are a great way to get a bunch of people together and work on your new schema and, if you want it, you can even have a www.w3.org/ns namespace (either directly or via a redirect). Again, subject to an EU project proposal being funded, there should be money available to improve our tooling in this regard.

W3C will continue to support the development of schema.org which is transforming the amount of structured data embedded within Web pages. If you want to develop an extension for schema.org, a Community Group and a discussion on [email protected] is the place to start.

Summary

To summarize, my personal priorities for W3C in relation to data are:

  1. Continue and deepen the relationship with OGC for better interoperability between the Web and geospatial information systems.
  2. Develop a similarly deep relationship with the research data community.
  3. Explore the notion of annotating RDF triples for context, such as temporal and probabilistic factors.
  4. Be better at supporting vocabulary development and their agile maintenance.
  5. Continue to promote the Linked Data/Semantic Web approach to data integration that can sit behind high value and robust JSON-returning APIs.

I'll be watching …

As of 1 July I’ll be at GS1, working on improving the retail world’s use of the Web. Keep in touch via my personal website and @philarcher1.

]]>
Why we’re launching the Dataset Exchange WG Tue, 16 May 2017 15:35:00 +0000 https://www.w3.org/blog/2017/why-were-launching-the-dataset-exchange-wg/ https://www.w3.org/blog/2017/why-were-launching-the-dataset-exchange-wg/ https://www.w3.org/blog/2017/why-were-launching-the-dataset-exchange-wg/#comments <![CDATA[data]]> <![CDATA[blogs]]> <![CDATA[

I’m delighted that the Dataset Exchange Working Group is beginning its work this week to pursue two distinct strands:

  • updating the Data Catalog vocabulary (DCAT);
  • providing a precise definition of an application profile and setting out how clients and servers can use them in content negotiation.

Within that environment, it’s likely that the WG will also look at how to create and share linksets (a.k.a. mappings) between vocabulary terms.

The DCAT vocabulary has been widely adopted but it’s clear from multiple instances and related work that DCAT lacks important features that need to be formally added. There’s also a question about exactly what its scope is. Metadata serves three primary functions:

  1. Discovery
  2. Assessment (am I allowed to use this data? Is it of sufficient quality for my purpose? What is its provenance?)
  3. Structure

These broadly map to the research data world's FAIR Principles (Findable, Accessible, Interoperable, Reusable). How far down that list should a general-purpose dataset description vocabulary go? What is the appropriate use of schema.org as compared with Dublin Core?

The concept of profiles is not new, nor is the idea that a client might use an HTTP Accept Header suggesting a preference for, say, Turtle over JSON – but what is not fully supported is a more fine grained request for, say, JSON according to schema X, or Turtle following profile Y. The new W3C WG will track the closely related work at the IETF on defining a new header and document how that new method should be used where possible and what the fall back options are.
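As a sketch of what such a fine-grained request might look like (the Accept-Profile header follows the IETF draft the WG set out to track, not settled standard syntax, and the profile URIs are invented):

```http
GET /catalog HTTP/1.1
Host: example.org
Accept: text/turtle;q=0.9, application/ld+json;q=0.5
Accept-Profile: <https://example.org/profiles/stats>;q=1.0, <http://www.w3.org/ns/dcat#>;q=0.5
```

The server would answer with Turtle conforming to the statistics profile if it can, falling back down each preference list otherwise.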

The WG has been formed in response to the SDSVoc workshop held at the end of last year. Supported by the EU-funded VRE4EIC project, both the workshop and the WG successfully bring together two distinct communities: government data and scientific research data. This is reflected in the WG’s chairs. Caroline Burle from Nic.br is a key figure in open government data throughout South America, and Karen Coyle of the Dublin Core Metadata Initiative has unparalleled experience in library metadata, including application profiles. Noting related work around general and spatial data on the Web best practices, ODRL, tabular metadata and more, after the WG has completed its work we should be nearer the time when data can be found, assessed and reused with minimal human intervention.

]]>