Copyright © 2003-2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
Web architecture depends on applications having a shared understanding of the messages exchanged between agents (for example, clients, servers, and intermediaries) and a shared expectation of how the payload of a message -- a representation -- will be interpreted by the recipient. The Web architecture uses representation metadata, when supported by the communication protocol, to indicate the sender's intentions to the recipient. In particular, dispatching and security-related decisions regarding the processing of a message are often based on values provided in representation metadata fields, such as the "Content-Type" field of HTTP and MIME. In this finding, we review the architectural design choice that metadata provided by a sender be authoritative. We also examine why recipient behavior that misrepresents information provided by the sender can be harmful if it is done without consent from the user. Finally, we consider how specification authors should incorporate these points into their work.
This document has been developed for discussion by the W3C Technical Architecture Group. This finding addresses issue contentTypeOverride-24 and partly addresses issue errorHandling-20. The TAG finding "Internet Media Type registration, consistency of use" also includes material related to this issue.
At their 23 February 2004 teleconference, the TAG approved this finding. Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC2119].
Please send comments on this finding to the publicly archived TAG mailing list [email protected] (archive).
1 Summary of Key Points
2 Scenarios
    2.1 Scenario 1: Silent recovery from error
    2.2 Scenario 2: Server misconfiguration
    2.3 Scenario 3: Metadata hints in specifications
3 Why the sender is the authoritative source of representation metadata
    3.1 The Role of Metadata
    3.2 Sources of Metadata
    3.3 Risks Associated with Overriding Authoritative Metadata
4 Inconsistency between representation data and metadata
    4.1 Recipient Handling of Inconsistency
    4.2 Self-describing data and Risk of Inconsistency
    4.3 Reducing the Risk of Inconsistency
5 Metadata Hints in Specifications
6 Future Work
7 References
8 Acknowledgments
The following are the key architectural points of this finding:
This finding addresses the following issues that are raised by these architectural points:
The scenarios in this section illustrate some issues that arise when the architectural points described in this finding are ignored. The remainder of the finding examines these issues in more detail.
Stuart runs his own Web server at "http://www.example.org/". He creates an HTML page and means to serve it as "text/html", but misconfigures the Web server so that the content is served via HTTP/1.1 [RFC2616] as "text/plain". Tim's browser looks inside the page, detects some markup that suggests that this is an HTML document (e.g., a <!DOCTYPE declaration or <title> element), and, without informing Tim, proceeds as though the content were "text/html", rendering it according to the HTML and CSS specifications. Janet's browser displays the content as plain text.
Which party has neglected a principle of Web architecture: Stuart for the server misconfiguration, Tim's browser for silently overriding the HTTP headers from the server, or Janet's browser for not detecting that the content looked like HTML?
Answer: By silently overriding metadata from the representation provider in the HTTP headers, Tim's browser did not respect Web architecture principles that promote shared understanding and security.
Norm publishes an XHTML document that includes this link:
<link href="cool-style" type="text/css" rel="stylesheet"/>
Although the link refers to an XSLT style sheet, Norm has set the type attribute to "text/css". Stuart has configured the Web server so that the style sheet is served via HTTP/1.1 as "application/xslt+xml". With a user agent that understands XSLT but not CSS, Janet requests the content that includes this link. As it interprets the representation data, Janet's user agent reads the type hint and does not fetch the style sheet.
Which party is responsible for the fact that Janet did not receive content she should have: Stuart for the server configuration, Norm for stating that the style sheet is served as "text/css" when in fact it's served with a different media type, or Janet's user agent for not double-checking the media type with the server?
Answer: Though not a violation of principles of Web architecture, Norm's mislabeling of content deprived Janet of content she should have received.
The MyFormat specification defines a type attribute and requires that, when type is present, a receiving agent use its value and ignore conflicting metadata provided by the sender. The MyFormat specification designers explain that such a definition of the type attribute allows content authors to work around misconfigured servers. They contend that this is necessary because in many environments content authors do not have sufficient access to server managers to affect server configuration.
Should the MyFormat specification designers ignore a principle of Web architecture and define type this way in order to remedy this social problem?
Answer: The TAG does not believe that author-specified overrides in representation data offer the proper solution to social problems such as interactions with server managers. An agent that silently overrides server-provided metadata can create security risks and prevent errors from being detected and corrected.
Successful communication between two parties using a piece of information relies on shared understanding. In the Web architecture, agents identify resources with URIs and they communicate state information for these resources. Below we examine the role of metadata in that exchange and make the case for the sender as the authoritative source of metadata.
The sequence of numbers "324033" might be a license plate number in the state of Arkansas or an old-style telephone number in Italy. In general, one cannot determine the nature of data in the absence of context. One way to provide a context for interpretation is through metadata. On the Web, examples of important metadata include the Internet media type, which explains what data format specification can be used to interpret the data, and the character encoding, which explains how octets map to characters.
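The point about character encoding can be made concrete with a minimal sketch. The two octets below are an arbitrary choice for illustration; they decode to different text depending on which encoding the metadata declares.

```python
# The same octets yield different characters under different declared encodings.
octets = bytes([0xC3, 0xA9])  # arbitrary example octets, not taken from the finding

as_utf8 = octets.decode("utf-8")         # one character: U+00E9 (e with acute accent)
as_latin1 = octets.decode("iso-8859-1")  # two characters: U+00C3 followed by U+00A9

print(len(as_utf8), len(as_latin1))
```

Without metadata naming the encoding, a recipient has no reliable way to choose between these two readings.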
In the Web architecture, data and metadata are distinguished during an exchange between agents (in the protocol used to carry out the exchange). Agents exchange resource state information through a "representation," which consists of two parts:
Not only does this separation promote shared understanding, it enables more efficient processing. It is far easier to dispatch behavior on the basis of inspecting metadata (typically short strings) than it is to invoke a generic document parser and try to divine the purpose of data by inspecting the data itself (with no guarantee of success). Separating data from metadata also increases the Web's flexibility as data formats rise, evolve, and fall.
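As a rough illustration of dispatching on metadata rather than on inspected data, the sketch below selects a handler from the media type string alone. The handler names, the table, and the fallback behavior are all hypothetical and stand in for whatever a real agent would do.

```python
from typing import Callable, Dict


def render_html(data: bytes) -> str:
    return "render as HTML"


def show_text(data: bytes) -> str:
    return "show as plain text"


# Hypothetical handler table keyed by Internet media type.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "text/html": render_html,
    "text/plain": show_text,
}


def media_type_of(content_type: str) -> str:
    """Drop parameters such as charset and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def dispatch(content_type: str, data: bytes) -> str:
    """Choose behavior from the metadata string; the data itself is not inspected."""
    handler = HANDLERS.get(media_type_of(content_type))
    if handler is None:
        return "offer to save to disk"  # assumed fallback for unknown types
    return handler(data)
```

Note that `dispatch("text/plain; charset=utf-8", b"<!DOCTYPE html>")` selects the plain-text handler even though the data resembles HTML, because only the short metadata string is consulted.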
One can imagine different (competing) sources of metadata:
To enable the greatest number of independent agents to interpret representation data in a consistent manner (i.e., according to a common set of specifications), the Web architecture adopts the first choice: representation metadata, when provided by the sender of a representation, is authoritative in defining the nature of the representation being sent.
Thus, if the sender asserts through a protocol that "the following representation data has the Internet media type text/html", that assertion is authoritative. The IANA media type registry maps these short strings such as "text/html" or "image/png" to data format specifications (e.g., XHTML, CSS, PNG, XLink, RDF/XML, etc.) via intermediate media type registrations. For instance, in the IANA registry, the content type "text/html" is associated with [RFC2854], which in turn states that:
The Internet media type asserts "this is X", not "process this as follows." Representation metadata does not constrain the receiving agent to process the representation data in one particular way. It does allow the designer of the receiving agent to create applications that correctly interpret the sender's intentions, while taking into account the desires of the party employing the agent (e.g., expressed through configuration and user choices).
A user agent represents the user for protocol-level interactions with representation providers. A user agent that does not respect protocol specifications can violate user privacy, produce security holes, and otherwise create confusion. For example, a user agent can create a security problem by ignoring a "Content-Type" header with value "text/plain", guessing that representation data is a shell script, and executing the script on the user's machine without the user's knowledge. Because of such risks, it is an error for an agent to ignore or override authoritative metadata without the consent of the party employing the agent. For this reason, the HTTP/1.1 specification states, "If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource."
In scenario 1, in terms of Web architecture, Stuart is innocent; misconfiguration of the server is not an architectural error, it's just a human error. Instead, Tim's browser is the culprit since it misrepresents the resource provider by ignoring the authoritative metadata, without Tim's consent. Janet's browser respected the "Content-Type" header field, and by doing so, helps Janet and Stuart detect a server misconfiguration.
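The rule quoted from HTTP/1.1 can be sketched as follows. This is a minimal illustration, not a description of any real browser: the `sniff` function is a hypothetical and deliberately crude guesser, and the fallback type is an assumption.

```python
from typing import Optional


def sniff(data: bytes) -> str:
    """Hypothetical content sniffer, consulted only when no type is declared."""
    head = data[:256].lstrip().lower()
    if head.startswith(b"<!doctype html") or b"<title>" in head:
        return "text/html"
    return "application/octet-stream"  # assumed fallback


def effective_media_type(content_type: Optional[str], data: bytes) -> str:
    """Treat sender metadata as authoritative; guess only when it is absent."""
    if content_type:
        return content_type.split(";", 1)[0].strip().lower()
    return sniff(data)
```

Under this sketch, Stuart's page served as "text/plain" stays "text/plain" (the behavior of Janet's browser), while the same octets served with no "Content-Type" field at all may be guessed to be "text/html".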
Note the difference between an agent taking authoritative metadata into account and an agent ignoring the metadata without the consent of the user. The first scenario below is an error; the second is not:
Although there are benefits to separating representation metadata from data, there are risks as well. In particular, the representation provider may create inconsistencies by misassigning metadata. Inconsistency between representation data and metadata is an error. Examples of inconsistencies between headers and representation data that have been observed on the Web include:
Recipients SHOULD detect inconsistencies between representation data and metadata but MUST NOT resolve them without the consent of the user.
Consent does not necessarily imply that the receiving agent must interrupt the user and require selection of one option or another. User consent may be achieved in the form of pre-selected configuration options, modes, or selectable user interface toggles, with appropriate reporting to the user when the agent detects an error. For example, a small "bug" icon in a graphical browser's user interface can indicate that the user agent has overridden sender metadata and can also act as a button through which a curious user might inspect the error or reverse the agent's choice. Other agent behavior when faced with inconsistencies includes prompting the user for interactive guidance and advancing according to an ordered list of user preferences. Naturally, appropriate behavior and interfaces are unique to each type of receiving agent and application context. It is beyond the scope of this finding to anticipate the range of possible errors and ways in which interface designers might obtain user feedback to address them.
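One way to picture consent through a pre-selected configuration option is sketched below. The `Decision` type, the `looks_like_html` check, and the `override_allowed` flag are hypothetical names introduced for illustration; the `notice` field stands in for whatever indicator (such as the "bug" icon) an interface would show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    media_type: str        # the type the agent will act on
    notice: Optional[str]  # text for a status indicator, or None if nothing to report


def looks_like_html(data: bytes) -> bool:
    """Hypothetical, deliberately crude check for HTML-like content."""
    head = data[:256].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def decide(declared: str, data: bytes, override_allowed: bool) -> Decision:
    """Detect a text/plain vs. HTML mismatch; override only if the user opted in."""
    if declared == "text/plain" and looks_like_html(data):
        if override_allowed:
            return Decision("text/html", "Declared text/plain overridden per user preference")
        return Decision(declared, "Content looks like HTML but is declared text/plain")
    return Decision(declared, None)
```

In both mismatch branches the inconsistency is reported rather than hidden; only the user's stored preference determines whether the declared type is overridden.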
In Scenario 2, Norm is responsible for Janet not having access to representation data she was meant to receive. The HTML 4.01 Recommendation states that "Authors who use [the type] attribute take responsibility to manage the risk that it may become inconsistent with the content available at the link target address." Janet's client could have done more than merely read the type hint and decide to skip the style sheet. Users benefit from clients that allow different configurations for handling hints, including:
Data is "self-describing" if it includes enough information to allow two parties to establish a consistent interpretation without additional clues. If the representation provider intends for the data to be interpreted in a manner other than what is self-described (e.g., "treat this XML content as plain text"), then clarifying metadata is required (e.g., in protocol headers). As illustrated above, providing redundant metadata for data that is self-describing can lead to inconsistencies.
Representation providers SHOULD NOT in general specify the character encoding for XML data in protocol headers since the data is self-describing.
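To show how redundant metadata can conflict with self-describing XML, the sketch below compares a charset parameter in a "Content-Type" value with the encoding named in the XML declaration. Both helper functions are hypothetical, and the declaration parser assumes an ASCII-compatible encoding (it would not recognize UTF-16 data, for example).

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def charset_param(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def declared_xml_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def encoding_conflict(content_type: str, data: bytes) -> bool:
    """True when header and XML declaration both name an encoding and they differ."""
    header, inline = charset_param(content_type), declared_xml_encoding(data)
    return header is not None and inline is not None and header != inline
```

When the header carries no charset parameter, `encoding_conflict` cannot return True, which is the practical benefit of leaving the encoding to the self-describing data.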
Representation providers can help reduce the risk of inconsistency through careful assignment of representation metadata (especially that which applies across representations). In particular:
Some format specifications allow content authors to provide metadata hints for servers and clients. For instance, the http-equiv attribute of the HTML meta element was intended for servers (not clients). In HTML 2.0 [RFC1866], section 5.2.5, the attribute is specified as follows:
The HTML 4.01 attribute type for the link element (used in Scenario 2) gives clients a hint about what the media type of a representation of the linked resource is likely to be.
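A client that treats such a hint as advisory might behave roughly as sketched here. The function names and the `trust_hints` preference are hypothetical, and falling back to the hint when the server supplies no media type is an assumption of this sketch, not something the finding prescribes.

```python
from typing import Optional, Set


def should_fetch(hint: Optional[str], supported: Set[str], trust_hints: bool) -> bool:
    """Use the author's hint only to decide whether dereferencing is worthwhile."""
    if hint is None or not trust_hints:
        return True  # no hint, or user prefers to ask the server regardless
    return hint.strip().lower() in supported


def type_to_process(hint: Optional[str], server_content_type: Optional[str]) -> Optional[str]:
    """Once a representation arrives, server metadata takes precedence over the hint."""
    if server_content_type:
        return server_content_type.split(";", 1)[0].strip().lower()
    return hint.strip().lower() if hint else None  # assumed fallback
```

With Norm's "text/css" hint and a client that supports only "application/xslt+xml", `should_fetch` returns False when hints are trusted (Janet's experience in Scenario 2) and True when the user has configured the client to ignore hints; in the latter case `type_to_process` yields the server's "application/xslt+xml".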
A format specification that includes metadata hints for clients should make clear that when these hints interact with server metadata, they are advisory only. Format specifications SHOULD NOT include requirements for clients to override server metadata without user consent; this is the error of Scenario 3. An architecturally sound description of an advisory attribute might read:
Section 2.2.2 of the W3C Proposed Recommendation Speech Recognition Grammar Specification Version 1.0 [SRGS10] describes agent behavior that is consistent with this finding.
The W3C Recommendation SMIL 2.0 [SMIL20] is inconsistent with the current finding in this regard since the definition of the type attribute (section 7.3.1) specifies circumstances in which type is supposed to take precedence over server metadata.
Roy Fielding and Stuart Williams scrutinized this finding and provided substantial amounts of text. Dan Connolly generously provided valuable input as well. Many thanks to Martin Dürst, Philipp Hoschka, Rob Lanphier, and Norm Walsh for their reviews. Thanks to all reviewers for their contributions to this finding.