2016/02/10
Tap, Tap. Is this thing on?
So if you’ve read this blog at all in the past, you know that this topic pops up every so often. That would be because it’s difficult.
The context for this instance of the discussion is a Twitter thread started by Erik:
let's say your JSON-LD takes over the world. how does the context URI provider cope/scale? anybody thinking about non-#HTTP context URIs?
— Erik Wilde (@dret) February 9, 2016
I argue that what Erik called “by value” semantics violates REST’s stateless constraint (and therefore self-description, as stateless is a sub-constraint). This is because the constraint is defined as;
“[…] each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server”
And this leads us to look at a sample document, a JSON-LD document with an @context declaration as a message, and how we determine what that message means. Using the example from the front of the JSON-LD site;
{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
… sending this to someone is intended to communicate a set of RDF triples, including this one, where the “name” string is supposed to expand to the full FOAF name property URI;
<http://dbpedia.org/resource/John_Lennon> <http://xmlns.com/foaf/0.1/name> "John Lennon" .
So to even “understand” the request, we need to resolve the @context URI to receive additional information. Therefore, stateful, and also not self-descriptive.
Another way to look at this is from an archivist POV. If I store that JSON-LD document away for 10 years, restore it, and try to understand what it meant, can I? Obviously in this case you’d need that resolved document to have not changed in ways which change the meaning of our JSON-LD document. For example, it must not have re-bound “name” to rdfs:label.
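To make the dependency concrete, here is a minimal sketch in Python. It is not the JSON-LD expansion algorithm: it assumes a context is just a flat term-to-IRI map, and `name_triple`, `context_today` and `context_later` are hypothetical names standing in for whatever the context URI happens to return at two points in time. No network access is involved.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CONTEXT_URI = "http://json-ld.org/contexts/person.jsonld"

# The message as sent: the context is referenced, not included.
doc = {
    "@context": CONTEXT_URI,
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
}


def name_triple(document, fetch_context):
    """Return the (subject, predicate, object) triple for the "name" key.

    fetch_context is a caller-supplied function mapping a context URI to a
    flat term-to-IRI dict; it simulates dereferencing the @context URI.
    """
    context = document["@context"]
    if isinstance(context, str):
        # The message alone is not enough; a lookup outside it is needed.
        context = fetch_context(context)
    return (document["@id"], context[document and "name"], document["name"])


# Hypothetical contents behind the context URI, now and ten years on.
context_today = {CONTEXT_URI: {"name": FOAF_NAME}}
context_later = {CONTEXT_URI: {"name": RDFS_LABEL}}

triple_today = name_triple(doc, lambda uri: context_today[uri])
triple_later = name_triple(doc, lambda uri: context_later[uri])
```

The same bytes yield a triple with the FOAF name predicate in the first call and one with rdfs:label in the second, which is the archivist's problem in miniature.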
“Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information. Distributed hypermedia allows the presentation and control information to be stored at remote locations.”
https://www.ics.uci.edu/~fielding/pubs/dissertation/web_arch_domain.htm#sec_4_1_3
Storage at a remote location does not *require* the representation to suddenly lack self-descriptiveness.
I’m going to make a couple of picky comments separately here, as I don’t want them to derail the main (IMO more substantial) discussion. A more detailed response is in preparation.
The reference to REST talks about completeness of information in a *request*. But the JSON-LD example given is a *response*.
My second picky comment is the reference to “cannot take advantage of any stored context on the server”. That doesn’t (to my mind) preclude stored context accessible (using information in the request) on the web.
#g
For me, stateless/stateful relates to previous interactions of the client with the server: a protocol is stateful if the server needs to keep track of previous requests to understand the current request. This is not the case here…
Granted, a message with a @context URI is not entirely self-described, but one could argue that an HTML document with links to stylesheets and images is not either… How is it different?
Finally, nothing prevents a JSON-LD producer from including the expanded context in the message rather than its URI, so for archiving purposes (or in a context where it is important to have strict self-description with no indirection), this should indeed be advised.
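A minimal sketch of that advice in Python, assuming the producer already holds a copy of the context it refers to. The helper name `embed_context` and the flat term map in `resolved_contexts` are illustrative assumptions, not part of any JSON-LD library.

```python
import copy

CONTEXT_URI = "http://json-ld.org/contexts/person.jsonld"

# Assumed: the producer's own copy of the context, reduced to one term.
resolved_contexts = {
    CONTEXT_URI: {"name": "http://xmlns.com/foaf/0.1/name"},
}

doc = {
    "@context": CONTEXT_URI,
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
}


def embed_context(document, contexts):
    """Return a copy of document whose string @context is replaced by the
    context object it refers to; the input document is left unchanged."""
    result = copy.deepcopy(document)
    context = result.get("@context")
    if isinstance(context, str):
        result["@context"] = copy.deepcopy(contexts[context])
    return result


archived = embed_context(doc, resolved_contexts)
```

After this step `archived` carries its own term definitions, so a reader ten years on does not depend on what the context URI returns by then.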
I was in danger of writing a long essay in response, which would be tedious for me to write and for you to read, but on reflection I think it boils down to two thoughts:
1. JSON-LD functions as a bridge between two aspects of data on the web: (a) as a hypermedia format (in the sense of HATEOAS), and (b) as a surface syntax for RDF. Use of JSON-LD context is, IME, primarily for the latter, and not needed for the former. YMMV, and I’d be interested to hear how.
2. I think there is over-broad interpretation of Fielding’s statement about REST constraints, which is very precise: it does _not_ say that every resource representation exchanged must be a complete description of what it describes. I’d argue that’s not possible.
#g
I like how you put understand in quotes there. To some extent to understand any RDF description you need to know how a particular vocabulary is defined, right? At least it is needed for inference, which is (supposedly) a core use case for RDF.
I think it’s kind of ironic, and beautiful in a way, that documenting context is what archives spend a lot of time doing. For Web archives in particular I think your analysis applies to the <img> tag as well. To understand a text/html representation at a particular time it’s important, perhaps essential, to go fetch the src URLs in those images too. At least with <img> and @context these links out onto the Web are explicit, relatively easy to process, and not wrapped up in a bunch of minified JavaScript that needs to be blindly executed :-)
Anyway, thanks for dropping this observation of yours into a blog post. I think the performance implications for the Web server are potentially the biggest issue. But perhaps effective use of HTTP pipelining will help?
Ed, the IMG example is a very good one as understanding the meaning of what somebody is publishing could indeed depend on the content of an image. I recall Roy pointing this out in the past, in a legal context. And yes, that would be an example akin to @context.
I put “understand” in quotes and emphasized it to try to point out the need to think of it as a message or part thereof. We often forget that we are trying to *communicate* when sending and receiving these documents, and it is in this context that I, and I believe Roy, use “understand”.
I don’t agree with the comparison to vocabulary definition, because that’s context set via standardization. This would be akin to standardizing @context URIs (an idea better than the status quo, IMO).
Performance is indeed the main value in separating out this information, but is indicative of a trade-off; as performance improves, other properties suffer, in this case evolvability and scalability. I’m not saying this trade-off is always a bad idea, but I do believe that in the vast majority of cases where JSON-LD is used, it is.
I think you misunderstand the first quote. Where Roy Fielding writes “each request […] must contain all of the information necessary”, the term “all information” also includes the media type. When talking HTTP this would mean the Content-Type header. A client which can handle application/ld+json will understand how to deal with remote @context. Thus the message is still self-descriptive.
Graham, I’m sorry I don’t understand your point #1.
Re #2, you’re right that it’s not a goal of REST to send “complete descriptions”. REST’s focus is on self-descriptive messages, and that means that its messages need to describe only what the sender is trying to communicate. The John Lennon example is an attempt to highlight this; the sender is trying to communicate a specific triple, but in order for the recipient to be able to extract that triple, it needs additional information.
This old post expands on this topic; http://www.markbaker.ca/blog/2007/11/users-and-self-description/
I agree with Graham that we can never be complete. The statement “must contain all of the information necessary to understand the request” is necessarily open to interpretation, because what does “all information” mean? It depends on the capabilities of the client, and what we can reasonably expect a client to understand. Really including everything would include (and not be limited to): a description of the hypermedia format, a description of the protocol, etc…
The more interesting question is: what do we necessarily take as a base? What is a good minimum set of requirements for a client? (Personally, I’m very happy with the discussions of the Hydra group in this regard.)
In the concrete case of JSON-LD, the context lookup mechanism seems like an obvious part of the minimum requirements, already on the side of the media type.
Looks like we crossed paths there. Ruben, Tomasz, I encourage you both to read the post linked from my response to Graham as I believe it addresses your concerns.
Concerning statefulness, and the shopping cart example, I think that state can appear in many places. REST itself derives its name from “Representational State Transfer”. The individual HTTP transactions may be stateless, but that doesn’t mean the server is not maintaining some aspect of state.
I sometimes think it’s more helpful to think of REST as requiring that any referenced state is *explicit* in a request – e.g. through the use of requests containing URLs pointing to specific stateful resources.
@mark,
My point 1 was just intended to say that contexts are not necessary for the use of JSON-LD as a hypermedia format for REST interfaces, and so to question @dret’s original suggestion that using JSON-LD for such interfaces would lead to performance problems.
Graham, perhaps the important point here is that the type of state is not intrinsic to the state itself, but instead how that state is referenced. While a JSON-LD context document lives behind a URI and so represents the state of a resource, its referencing through a (back to @dret lingo here) by-value semantic means it’s also being used as application/session state.
Mark, I’m not sure what you mean here by a “by-value” semantic. In programming language terms, it seems to me more like a “by-reference” semantic, but that may well not be what you mean here.
Consider the difference between linking semantics in XLink vs. XInclude. I believe the origin is with Ted Nelson’s notion of transclusion.