Internet-Draft | compression-dictionary | August 2023 |
Meenan & Weiss | Expires 10 February 2024 | [Page] |
- Workgroup: HTTP
- Internet-Draft: draft-meenan-httpbis-compression-dictionary-05
- Published: August 2023
- Intended Status: Informational
- Expires: 10 February 2024
Compression Dictionary Transport
Abstract
This specification defines a mechanism for using designated [HTTP] responses as an external dictionary for future HTTP responses for compression schemes that support using external dictionaries (e.g. Brotli [RFC7932] and Zstandard [RFC8878]).¶
About This Document
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://pmeenan.github.io/i-d-compression-dictionary/draft-meenan-httpbis-compression-dictionary.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-meenan-httpbis-compression-dictionary/.¶
Discussion of this document takes place on the HTTP Working Group mailing list (mailto:[email protected]), which is archived at https://lists.w3.org/Archives/Public/ietf-http-wg/.¶
Source for this draft and an issue tracker can be found at https://github.com/pmeenan/i-d-compression-dictionary.¶
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 10 February 2024.¶
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
1. Introduction
This specification defines a mechanism for using designated [HTTP] responses as an external dictionary for future HTTP responses for compression schemes that support using external dictionaries (e.g. Brotli [RFC7932] and Zstandard [RFC8878]).¶
This document describes the HTTP headers used for negotiating dictionary usage and registers content codings for Brotli and Zstandard compression using a negotiated dictionary.
This document uses the line folding strategies described in [FOLDING].¶
2. Dictionary Negotiation
2.1. Use-As-Dictionary
When responding to an HTTP request, a server can advertise that the response can be used as a dictionary for future requests for URLs that match the pattern specified in the Use-As-Dictionary response header.
The Use-As-Dictionary response header is a Structured Field [RFC8941] sf-dictionary with values for "match", "ttl", "type" and "hashes".¶
2.1.1. match
The "match" value of the Use-As-Dictionary header is an sf-string value that provides a URL-matching pattern for requests where the dictionary can be used.
The sf-string is parsed as a URL [RFC3986], and supports absolute URLs as well as relative URLs. When stored, any relative URLs MUST be expanded so that only absolute URL patterns are used for matching against requests.¶
The match URL supports using * as a wildcard within the match string for pattern-matching multiple URLs. URLs that contain a literal * are not directly supported unless they can rely on the behavior of * matching an arbitrary string.
The [Origin] of the URL in the "match" pattern MUST be the same as the origin of the request whose response carries the "Use-As-Dictionary" header, and it MUST NOT include a * wildcard.
The "match" value is required and MUST be included in the Use-As-Dictionary sf-dictionary for the dictionary to be considered valid.¶
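The storage-time handling of "match" can be sketched as follows. This is a minimal illustration, not a normative procedure: the helper names `origin_of` and `resolve_match`, the example URLs, and the use of Python's `urllib.parse` for URL resolution are all assumptions made for the sketch.

```python
from urllib.parse import urljoin, urlsplit

# Default ports so that "https://example.com" and "https://example.com:443"
# compare as the same origin (assumption: only http and https are relevant).
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) tuple used here as the origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def resolve_match(match, response_url):
    """Expand a "match" value to an absolute pattern, or return None if invalid.

    The pattern is rejected when its origin contains a * wildcard or differs
    from the origin of the response that carried Use-As-Dictionary.
    """
    absolute = urljoin(response_url, match)
    parts = urlsplit(absolute)
    if "*" in parts.scheme or "*" in parts.netloc:
        return None
    if origin_of(absolute) != origin_of(response_url):
        return None
    return absolute


print(resolve_match("/product/*", "https://example.com/shop/item.html"))
```

Only the absolute form returned by `resolve_match` would then be stored and used for matching against later requests.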
2.1.2. ttl
The "ttl" value of the Use-As-Dictionary header is an sf-integer value that provides the time in seconds that the dictionary is valid for (time to live).
The "ttl" is independent of the cache lifetime of the resource being used for the dictionary. If the underlying resource is evicted from cache, the dictionary is removed as well; otherwise, "ttl" sets an explicit time to live for use as a dictionary that does not depend on the freshness of the cached resource. Expired resources can still be useful as dictionaries while they remain in cache, for example when fetching an update of the expired resource. It can also be useful to artificially limit the life of a dictionary that is updated frequently, which helps limit the number of dictionary variations that clients may advertise.
The "ttl" value is optional and defaults to 31536000 (1 year).¶
2.1.3. type
The "type" value of the Use-As-Dictionary header is an sf-string value that describes the file format of the supplied dictionary.
"raw" is the only defined dictionary format. It represents an unformatted blob of bytes suitable for any compression scheme to use.
If a client receives a dictionary with a type that it does not understand, it MUST NOT use the dictionary.¶
The "type" value is optional and defaults to "raw".¶
2.1.4. hashes
The "hashes" value of the Use-As-Dictionary header is an inner-list value that provides a list of supported hash algorithms in order of server preference.
The dictionaries are identified by the hash of their contents and this value allows for negotiation of the algorithm to use.¶
The "hashes" value is optional and defaults to (sha-256).¶
2.1.5. Examples
2.1.5.1. Path Prefix
A response that contained a response header:¶
Would specify matching any URL with a path prefix of /product/ on the same [Origin] as the original request, expiring as a dictionary in 7 days independent of the cache lifetime of the resource, and advertise support for both sha-256 and sha-512 hash algorithms.¶
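The header for this example is not reproduced in the text above. The sketch below serializes a Use-As-Dictionary value that is consistent with the description (path prefix /product/, 7 days, sha-256 and sha-512). The function name `build_use_as_dictionary` is hypothetical, and the exact field value is an assumption reconstructed from the prose rather than a copy of the original example.

```python
def build_use_as_dictionary(match, ttl=None, dict_type=None, hashes=None):
    """Serialize a Use-As-Dictionary value as a Structured Field dictionary.

    Only "match" is required; the optional members are omitted when unset so
    that the receiver applies the defaults defined in Section 2.1.
    """
    # Escape backslashes and double quotes inside the sf-string.
    escaped = match.replace("\\", "\\\\").replace('"', '\\"')
    members = ['match="%s"' % escaped]
    if ttl is not None:
        members.append("ttl=%d" % ttl)
    if dict_type is not None:
        members.append('type="%s"' % dict_type)
    if hashes:
        # Hash algorithm names are written as tokens inside an inner list.
        members.append("hashes=(%s)" % " ".join(hashes))
    return ", ".join(members)


seven_days = 7 * 24 * 60 * 60
value = build_use_as_dictionary("/product/*", ttl=seven_days,
                                hashes=["sha-256", "sha-512"])
print("Use-As-Dictionary: " + value)
```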
2.1.5.2. Versioned Directories
A response that contained a response header:¶
Would match main.js in any directory under /app/, expiring as a dictionary in one year and support using the sha-256 hash algorithm.¶
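A header carrying only a "match" value relies on the defaults for "ttl", "type" and "hashes". The following sketch shows how a client might apply those defaults when reading the header. It is a deliberately simplified reader, not a full Structured Fields parser: the name `parse_use_as_dictionary`, the regular expression, and the example value are assumptions for illustration.

```python
import re

# One dictionary member: key=value, where value is a quoted string,
# a parenthesised inner list, or a bare item such as an integer.
_MEMBER = re.compile(
    r'\s*([a-z*][a-z0-9_.*-]*)=("(?:[^"\\]|\\.)*"|\([^)]*\)|[^,\s]+)\s*(?:,|$)'
)


def parse_use_as_dictionary(value):
    """Return the parsed members with defaults applied, or None if unusable."""
    # Defaults from Sections 2.1.2 to 2.1.4.
    result = {"ttl": 31536000, "type": "raw", "hashes": ["sha-256"]}
    pos = 0
    while pos < len(value):
        m = _MEMBER.match(value, pos)
        if m is None:
            return None
        key, raw = m.group(1), m.group(2)
        if raw.startswith('"'):
            result[key] = re.sub(r"\\(.)", r"\1", raw[1:-1])
        elif raw.startswith("("):
            result[key] = raw[1:-1].split()
        else:
            try:
                result[key] = int(raw)
            except ValueError:
                return None
        pos = m.end()
    # "match" is required, and unknown dictionary types must not be used.
    if "match" not in result or result["type"] != "raw":
        return None
    return result


print(parse_use_as_dictionary('match="/app/*/main.js"'))
```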
2.2. Sec-Available-Dictionary
When an HTTP client makes a request for a resource for which it has an appropriate dictionary, it can add a "Sec-Available-Dictionary" request header to the request to indicate to the server that it has a dictionary available to use for compression.
The "Sec-Available-Dictionary" request header is a lowercase Base16-encoded [RFC4648] hash of the contents of a single available dictionary calculated using one of the algorithms advertised as being supported by the server.¶
Its syntax is defined by the following [ABNF]:¶
Sec-Available-Dictionary = hvalue
hvalue                   = 1*hchar
hchar                    = DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
The client MUST send only a single "Sec-Available-Dictionary" request header, containing a single hash value for the best matching dictionary that it has available.
For example:¶
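A minimal sketch of how a client might compute the header value, assuming the dictionary bytes are already in memory. The function name `available_dictionary_value` and the algorithm-name mapping are assumptions; only Python's standard `hashlib` is used.

```python
import hashlib
import re

# Mirrors the ABNF above: one or more lowercase hexadecimal characters.
HVALUE = re.compile(r"[0-9a-f]+")

# Maps the names used in "hashes" to hashlib algorithm names (assumption:
# only these two algorithms are of interest for the sketch).
HASHLIB_NAMES = {"sha-256": "sha256", "sha-512": "sha512"}


def available_dictionary_value(dictionary_bytes, algorithm="sha-256"):
    """Return the lowercase Base16 hash of the dictionary contents."""
    digest = hashlib.new(HASHLIB_NAMES[algorithm], dictionary_bytes)
    return digest.hexdigest()


value = available_dictionary_value(b"example dictionary contents")
print("Sec-Available-Dictionary: " + value)
```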
2.2.1. Dictionary freshness requirement
To be considered as a match, the dictionary must not yet be expired as a dictionary. When iterating through dictionaries looking for a match, the expiration time of the dictionary is calculated by taking the last time the dictionary was written and adding the "ttl" seconds from the "Use-As-Dictionary" response. If the current time is beyond the expiration time of the dictionary, it MUST be ignored.¶
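The freshness rule can be sketched as a simple comparison. The function name `is_fresh` and the use of seconds since the epoch are assumptions for illustration.

```python
import time


def is_fresh(written_at, ttl, now=None):
    """Return True while the dictionary has not yet expired.

    written_at: time the dictionary was last written, in seconds.
    ttl:        the "ttl" value from Use-As-Dictionary, in seconds.
    now:        current time in seconds (defaults to the system clock).
    """
    if now is None:
        now = time.time()
    expires_at = written_at + ttl
    # A dictionary is ignored once the current time is beyond its expiry.
    return now <= expires_at


print(is_fresh(written_at=1_000, ttl=604800, now=2_000))
```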
2.2.2. Dictionary URL matching
When a dictionary is stored as a result of a "Use-As-Dictionary" directive, it includes a "match" string with the URL pattern of request URLs that the dictionary can be used for.¶
When comparing request URLs to the available dictionary match patterns, the comparison should account for the * wildcard when matching against request URLs. This can be accomplished with the following algorithm which returns TRUE for a successful match and FALSE for no-match:¶
- Let MATCH represent the absolute URL pattern from the "match" value for the given dictionary.
- Let URL represent the request URL being checked.
- If there are no * characters in MATCH:
- If there is a single * character in MATCH and it is at the end of the string:
- Split the MATCH string by the * character into an array of MATCHES (excluding the * delimiter from the individual entries).
- If there is not a * character at the end of MATCH:
- Pop the first entry in MATCHES from the front of the array into PATTERN.
  - If PATTERN is not identical to the start of the URL string, return FALSE.
- Pop each entry off of the front of the MATCHES array into PATTERN. For each PATTERN, in order:
- Return TRUE.
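The steps above can be sketched in code as follows. Several of the conditional steps appear above without their sub-steps, so the branches marked as assumptions in the comments are one plausible reading (exact comparison, prefix comparison, suffix comparison, and an ordered search) rather than the specification's wording. The function name `dictionary_matches` is hypothetical.

```python
def dictionary_matches(match, url):
    """Return True if the absolute pattern `match` matches the request `url`."""
    # No wildcard. Assumption: the URL must equal the pattern exactly.
    if "*" not in match:
        return url == match

    # Single trailing wildcard. Assumption: plain prefix comparison.
    if match.count("*") == 1 and match.endswith("*"):
        return url.startswith(match[:-1])

    # Split on the wildcard; the * itself is not kept in the entries.
    matches = match.split("*")

    # Pattern does not end in a wildcard. Assumption: the last entry must be
    # the end of the URL, and that suffix is removed before continuing.
    if not match.endswith("*"):
        suffix = matches.pop()
        if not url.endswith(suffix):
            return False
        url = url[: len(url) - len(suffix)]

    # The first entry must be identical to the start of the URL.
    prefix = matches.pop(0)
    if not url.startswith(prefix):
        return False

    # Remaining entries. Assumption: each must be found in order, searching
    # from the end of the previous match.
    position = len(prefix)
    for pattern in matches:
        found = url.find(pattern, position)
        if found == -1:
            return False
        position = found + len(pattern)
    return True


print(dictionary_matches("https://example.com/app/*/main.js",
                         "https://example.com/app/v2/main.js"))
```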
2.2.3. Multiple matching dictionaries
When there are multiple dictionaries that match a given request URL, the client MUST pick the dictionary with the longest match pattern string length.¶
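A minimal sketch of this selection, assuming the candidates have already passed the freshness and URL-matching checks. The name `pick_best_dictionary` and the (pattern, hash) tuple layout are assumptions; the text does not say how ties between patterns of equal length are broken, so this sketch simply keeps the first one.

```python
def pick_best_dictionary(candidates):
    """Pick the matching dictionary with the longest match pattern.

    candidates: list of (match_pattern, content_hash) tuples that already
    match the request URL and have not expired.
    Returns the chosen tuple, or None when there are no candidates.
    """
    if not candidates:
        return None
    # max() returns the first of several equally long patterns.
    return max(candidates, key=lambda candidate: len(candidate[0]))


candidates = [
    ("https://example.com/*", "hash-of-site-wide-dictionary"),
    ("https://example.com/product/*", "hash-of-product-dictionary"),
]
print(pick_best_dictionary(candidates))
```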
3. Negotiating the compression algorithm
When a compression dictionary is available for use for a given request, the algorithm to be used is negotiated through the regular mechanism for negotiating content encoding in HTTP.¶
This document introduces two new content encoding algorithms:¶
Content-Encoding | Description |
---|---|
br-d | Brotli using an external compression dictionary |
zstd-d | Zstandard using an external compression dictionary |
The dictionary to use is negotiated separately and advertised in the "Sec-Available-Dictionary" request header.¶
3.1. Accept-Encoding
The client adds the algorithms that it supports to the "Accept-Encoding" request header. e.g.:¶
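A minimal sketch of how a client might assemble these request headers. The function name `build_request_headers`, the baseline list of codings, and the choice to advertise the dictionary codings only when a dictionary is available are all assumptions for illustration.

```python
def build_request_headers(dictionary_hash=None,
                          baseline=("gzip", "deflate", "br", "zstd")):
    """Build Accept-Encoding (and Sec-Available-Dictionary when applicable).

    dictionary_hash: lowercase hex hash of the best matching dictionary,
    or None when the client has no dictionary for this request.
    """
    codings = list(baseline)
    headers = {}
    if dictionary_hash is not None:
        # The two content codings introduced by this document.
        codings += ["br-d", "zstd-d"]
        headers["Sec-Available-Dictionary"] = dictionary_hash
    headers["Accept-Encoding"] = ", ".join(codings)
    return headers


print(build_request_headers("a" * 64))
```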
3.2. Content-Encoding
If a server supports one of the dictionary algorithms advertised by the client and chooses to compress the content of the response using the dictionary that the client has advertised then it sets the "Content-Encoding" response header to the appropriate value for the algorithm selected. e.g.:¶
If the response is cacheable, it MUST include a "Vary" header to prevent caches serving dictionary-compressed resources to clients that don't support them or serving the response compressed with the wrong dictionary:¶
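The server-side choice can be sketched as below. This is an illustration under several assumptions: the function name `choose_dictionary_encoding` is hypothetical, q-values in Accept-Encoding are ignored for brevity, and the Vary value lists "Accept-Encoding" and "Sec-Available-Dictionary" because those are the two request headers that the stated caching concern depends on. No compression is performed here; only the response headers are selected.

```python
def choose_dictionary_encoding(request_headers, known_hashes,
                               preferred=("br-d", "zstd-d")):
    """Return response headers for a dictionary-compressed reply, or None.

    request_headers: mapping of request header names to values.
    known_hashes:    set of dictionary hashes the server can compress with.
    preferred:       dictionary codings the server supports, in preference order.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}

    # The server can only use a dictionary that the client advertised.
    offered = headers.get("sec-available-dictionary", "").strip()
    if offered not in known_hashes:
        return None

    # Collect the codings the client accepts (parameters such as q are dropped).
    accepted = {
        item.split(";")[0].strip().lower()
        for item in headers.get("accept-encoding", "").split(",")
    }
    for coding in preferred:
        if coding in accepted:
            return {
                "Content-Encoding": coding,
                "Vary": "Accept-Encoding, Sec-Available-Dictionary",
            }
    return None


request = {
    "Accept-Encoding": "gzip, br, zstd, br-d, zstd-d",
    "Sec-Available-Dictionary": "ab" * 32,
}
print(choose_dictionary_encoding(request, {"ab" * 32}))
```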
4. IANA Considerations
4.1. Content Encoding
IANA is asked to update the "HTTP Content Coding Registry" registry ([RFC9110]) according to the table below:¶
Name | Description | Reference |
---|---|---|
br-d | A stream of bytes compressed using the Brotli protocol with an external dictionary | [RFC7932] |
zstd-d | A stream of bytes compressed using the Zstandard protocol with an external dictionary | [RFC8878] |
4.2. Header Field Registration
IANA is asked to update the "Hypertext Transfer Protocol (HTTP) Field Name Registry" registry ([RFC9110]) according to the table below:¶
Field Name | Status | Reference |
---|---|---|
Use-As-Dictionary | permanent | Section 2.1 of this document |
Sec-Available-Dictionary | permanent | Section 2.2 of this document |
5. Compatibility Considerations
To minimize the risk of middle-boxes incorrectly processing dictionary-compressed responses, compression dictionary transport MUST only be used in secure contexts (HTTPS).¶
6. Security Considerations
The security considerations for Brotli [RFC7932] and Zstandard [RFC8878] apply to the dictionary-based versions of the respective algorithms.¶
6.1. Changing content
The dictionary must be treated with the same security precautions as the content, because a change to the dictionary can result in a change to the decompressed content.¶
6.2. Reading content
The CRIME attack shows that it's a bad idea to compress data from mixed (e.g. public and private) sources -- the data sources include not only the compressed data but also the dictionaries. For example, if you compress secret cookies using a public-data-only dictionary, you still leak information about the cookies.¶
Not only can the dictionary reveal information about the compressed data, but vice versa, data compressed with the dictionary can reveal the contents of the dictionary when an adversary can control parts of data to compress and see the compressed size. On the other hand, if the adversary can control the dictionary, the adversary can learn information about the compressed data.¶
6.3. Security Mitigations
If any of the mitigations do not pass, the client MUST drop the response and return an error.¶
6.3.1. Cross-origin protection
To make sure that a dictionary can only impact content from the same origin where the dictionary was served, the "match" pattern used for matching a dictionary to requests MUST be for the same origin that the dictionary is served from.¶
6.3.2. Response readability
For clients, like web browsers, that provide additional protection against the readability of the payload of a response and against user tracking, additional protections MUST be taken to make sure that the use of dictionary-based compression does not reveal information that would not otherwise be available.¶
In these cases, dictionary compression MUST only be used when both the dictionary and the compressed response are fully readable by the client.¶
In browser terms, that means that both are either same-origin to the context they are being fetched from or that the response is cross-origin and passes the CORS check (https://fetch.spec.whatwg.org/#cors-check).¶
6.3.2.1. Same-Origin
On the client side, same-origin determination is defined in the HTML spec (https://html.spec.whatwg.org/multipage/browsers.html#origin).
On the server-side, a request with a "Sec-Fetch-Site:" request header with a value of "same-origin" is to be considered a same-origin request.¶
6.3.2.2. Cross-Origin
For requests that are not same-origin (Section 6.3.2.1), the "mode" of the request can be used to determine the readability of the response.¶
For clients that conform to the fetch spec, the mode of the request is stored in the RequestMode attribute of the request (https://fetch.spec.whatwg.org/#requestmode).¶
For servers responding to clients that expose the request mode information, the value of the mode is sent in the "Sec-Fetch-Mode" request header.¶
If a "Sec-Fetch-Mode" request header is not present, the server SHOULD allow for the dictionary compression to be used.¶
- If the mode is "navigate" or "same-origin":
- If the mode is "cors":
  - For clients, apply the CORS check from the fetch spec (https://fetch.spec.whatwg.org/#cors-check) which includes credentials checking restrictions that may not be possible to check on the server.
  - For servers:
    - If the response does not include an "Access-Control-Allow-Origin" response header:
    - If the request does not include an "Origin" request header:
    - If the value of the "Access-Control-Allow-Origin" response header is "*":
    - If the value of the "Access-Control-Allow-Origin" response header matches the value of the "Origin" request header:
- If the mode is any other value (including "no-cors"):
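The server-side branches above can be sketched as a single decision function. The outcome of each branch is not stated in the list, so the allow and deny results below are assumptions chosen to be consistent with the readability requirement in Section 6.3.2: dictionary compression is allowed only where the response would be readable by the requesting context. The function name `server_may_use_dictionary` is hypothetical.

```python
def server_may_use_dictionary(request_headers, response_headers):
    """Decide on the server whether dictionary compression may be used."""
    req = {name.lower(): value for name, value in request_headers.items()}
    resp = {name.lower(): value for name, value in response_headers.items()}

    # Section 6.3.2.1: a same-origin request is readable by the client.
    if req.get("sec-fetch-site") == "same-origin":
        return True

    mode = req.get("sec-fetch-mode")
    # Stated in the text: allow when Sec-Fetch-Mode is not present.
    if mode is None:
        return True

    # Assumption: these modes produce responses readable by the client.
    if mode in ("navigate", "same-origin"):
        return True

    if mode == "cors":
        allow_origin = resp.get("access-control-allow-origin")
        origin = req.get("origin")
        # Assumption: without either header the CORS check cannot pass.
        if allow_origin is None or origin is None:
            return False
        # Assumption: a wildcard or an exact origin match passes.
        return allow_origin == "*" or allow_origin == origin

    # Assumption: any other mode (including "no-cors") is not readable.
    return False


print(server_may_use_dictionary(
    {"Sec-Fetch-Mode": "cors", "Origin": "https://app.example"},
    {"Access-Control-Allow-Origin": "https://app.example"},
))
```

As the list notes, the client-side CORS check also covers credentials restrictions that a server may not be able to evaluate, so a check like this one does not replace the client's own check.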
7. Privacy Considerations
Since dictionaries are advertised in future requests using the hash of the content of the dictionary, it is possible to abuse the dictionary to turn it into a tracking cookie.¶
To mitigate any additional tracking concerns, clients MUST treat dictionaries in the same way that they treat cookies. This includes partitioning the storage as cookies are partitioned as well as clearing the dictionaries whenever cookies are cleared.¶
8. References
8.1. Normative References
- [FOLDING]
  Watsen, K., Auerswald, E., Farrel, A., and Q. Wu, "Handling Long Lines in Content of Internet-Drafts and RFCs", RFC 8792, DOI 10.17487/RFC8792, <https://www.rfc-editor.org/rfc/rfc8792>.
- [RFC9110]
  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, Ed., "HTTP Semantics", STD 97, RFC 9110, DOI 10.17487/RFC9110, <https://www.rfc-editor.org/rfc/rfc9110>.
8.2. Informative References
- [ABNF]
  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, <https://www.rfc-editor.org/rfc/rfc5234>.
- [HTTP]
  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing", RFC 7230, DOI 10.17487/RFC7230, <https://www.rfc-editor.org/rfc/rfc7230>.
- [Origin]
  Barth, A., "The Web Origin Concept", RFC 6454, DOI 10.17487/RFC6454, <https://www.rfc-editor.org/rfc/rfc6454>.
- [RFC3986]
  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, <https://www.rfc-editor.org/rfc/rfc3986>.
- [RFC4648]
  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, <https://www.rfc-editor.org/rfc/rfc4648>.
- [RFC7932]
  Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data Format", RFC 7932, DOI 10.17487/RFC7932, <https://www.rfc-editor.org/rfc/rfc7932>.
- [RFC8878]
  Collet, Y. and M. Kucherawy, Ed., "Zstandard Compression and the 'application/zstd' Media Type", RFC 8878, DOI 10.17487/RFC8878, <https://www.rfc-editor.org/rfc/rfc8878>.
- [RFC8941]
  Nottingham, M. and P. Kamp, "Structured Field Values for HTTP", RFC 8941, DOI 10.17487/RFC8941, <https://www.rfc-editor.org/rfc/rfc8941>.