Compression Dictionary Transport
draft-meenan-httpbis-compression-dictionary-00

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors Patrick Meenan , Yoav Weiss
Last updated 2023-06-30
Replaced by draft-ietf-httpbis-compression-dictionary
HTTP                                                           P. Meenan
Internet-Draft                                                  Y. Weiss
Intended status: Informational                                Google LLC
Expires: 1 January 2024                                     30 June 2023

                    Compression Dictionary Transport
             draft-meenan-httpbis-compression-dictionary-00

Abstract

   This specification defines a mechanism for using designated [HTTP]
   responses as an external dictionary for future HTTP responses for
   compression schemes that support using external dictionaries (e.g.
   [Brotli] and [Zstandard]).

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://pmeenan.github.io/i-d-compression-dictionary/draft-meenan-
   httpbis-compression-dictionary.html.  Status information for this
   document may be found at https://datatracker.ietf.org/doc/draft-
   meenan-httpbis-compression-dictionary/.

   Discussion of this document takes place on the HTTP Working Group
   mailing list (mailto:ietf-http-wg@w3.org), which is archived at
   https://lists.w3.org/Archives/Public/ietf-http-wg/.

   Source for this draft and an issue tracker can be found at
   https://github.com/pmeenan/i-d-compression-dictionary.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Meenan & Weiss           Expires 1 January 2024                 [Page 1]
Internet-Draft           compression-dictionary                June 2023

   This Internet-Draft will expire on 1 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Table of Contents

   1.  Introduction
   2.  Dictionary Negotiation
     2.1.  Use-As-Dictionary
       2.1.1.  match
       2.1.2.  ttl
       2.1.3.  hashes
       2.1.4.  Examples
     2.2.  Sec-Available-Dictionary
       2.2.1.  Dictionary freshness requirement
       2.2.2.  Dictionary URL matching
       2.2.3.  Multiple matching dictionaries
   3.  Negotiating the compression algorithm
     3.1.  Accept-Encoding
     3.2.  Content-Encoding
   4.  IANA Considerations
     4.1.  Content Encoding
     4.2.  Header Field Registration
   5.  Compatibility Considerations
   6.  Security Considerations
     6.1.  Changing content
     6.2.  Reading content
     6.3.  Security Mitigations
       6.3.1.  Cross-origin protection
       6.3.2.  Response readability
   7.  Privacy Considerations
   8.  Informative References
   Authors' Addresses

1.  Introduction

   This specification defines a mechanism for using designated [HTTP]
   responses as an external dictionary for future HTTP responses for
   compression schemes that support using external dictionaries (e.g.
   [Brotli] and [Zstandard]).

   This document describes the HTTP headers used for negotiating
   dictionary usage and registers content codings for Brotli and
   Zstandard compression using a negotiated dictionary.

2.  Dictionary Negotiation

2.1.  Use-As-Dictionary

   When responding to an HTTP request, a server can advertise that the
   response can be used as a dictionary for future requests for URLs
   that match the pattern specified in the Use-As-Dictionary response
   header.

   The Use-As-Dictionary response header is a Structured Field [RFC8941]
   sf-dictionary with values for "match", "ttl" and "hashes".

2.1.1.  match

   The "match" value of the Use-As-Dictionary header is an sf-string
   value that provides a URL-matching pattern for requests where the
   dictionary can be used.

   The sf-string is parsed as a URL [RFC3986], and supports absolute
   URLs as well as relative URLs.  When stored, any relative URLs MUST
   be expanded so that only absolute URL patterns are used for matching
   against requests.

   The match URL supports using * as a wildcard within the match string
   for pattern-matching multiple URLs.  There is no way to escape a
   literal * in a URL, so URLs that contain one are only supported
   where it is acceptable for the * in the pattern to match an
   arbitrary string.

   The "match" value is required and MUST be included in the Use-As-
   Dictionary sf-dictionary for the dictionary to be considered valid.

2.1.2.  ttl

   The "ttl" value of the Use-As-Dictionary header is an sf-integer value
   that provides the time in seconds that the dictionary is valid for
   (time to live).

   This is independent of the cache lifetime of the resource being used
   for the dictionary.  If the underlying resource is evicted from the
   cache, the dictionary is removed with it.  Otherwise, the "ttl" sets
   an explicit lifetime for use as a dictionary that does not depend on
   the freshness of the cached resource.  Expired resources can still
   be useful as dictionaries while they remain in cache, for example
   when fetching updates of the expired resource.  It can also be
   useful to artificially limit the life of a dictionary that is
   updated frequently, to limit the number of possible incoming
   dictionary values.

   The "ttl" value is optional and defaults to 31536000 (1 year).

2.1.3.  hashes

   The "hashes" value of the Use-As-Dictionary header is an inner-list
   value that provides a list of supported hash algorithms in order of
   server preference.

   The dictionaries are identified by the hash of their contents and
   this value allows for negotiation of the algorithm to use.

   The "hashes" value is optional and defaults to (sha-256).

2.1.4.  Examples

2.1.4.1.  Path Prefix

   Consider a response that contains the response header:

   Use-As-Dictionary: match="/product/*", ttl=604800, hashes=(sha-256
   sha-512)

   This header matches any URL with a path prefix of /product/ on the
   same [Origin] as the original request, expires the response as a
   dictionary in 7 days independent of the cache lifetime of the
   resource, and advertises support for both the sha-256 and sha-512
   hash algorithms.

2.1.4.2.  Versioned Directories

   Consider a response that contains the response header:

   Use-As-Dictionary: match="/app/*/main.js"

   This header matches main.js in any directory under /app/, expires
   the response as a dictionary in one year (the default), and supports
   the default sha-256 hash algorithm.
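
   As an illustration only, the two example headers can be read with a
   small Python sketch.  It handles just the subset of Structured
   Fields syntax used in this section (a quoted string, an integer and
   an inner list of tokens) and is not a full [RFC8941] parser.  The
   names DictionaryOptions and parse_use_as_dictionary are hypothetical
   and do not come from this specification.

```python
import re
from dataclasses import dataclass

# One `key=value` member: value is a quoted string, an inner list of
# tokens, or an integer. A deliberately small subset of RFC 8941.
_MEMBER = re.compile(
    r'\s*([a-z*][a-z0-9_\-.*]*)='
    r'(?:"((?:[^"\\]|\\.)*)"|\(([^)]*)\)|(-?\d+))\s*(?:,|$)'
)


@dataclass
class DictionaryOptions:  # hypothetical container, not defined by the draft
    match: str
    ttl: int = 31536000           # default: 1 year (Section 2.1.2)
    hashes: tuple = ("sha-256",)  # default (Section 2.1.3)


def parse_use_as_dictionary(value):
    """Return DictionaryOptions, or None if the header is unusable."""
    members = {}
    pos = 0
    while pos < len(value):
        m = _MEMBER.match(value, pos)
        if m is None:
            return None  # syntax outside the supported subset
        key, string, inner, integer = m.groups()
        if string is not None:
            members[key] = re.sub(r"\\(.)", r"\1", string)  # unescape
        elif inner is not None:
            members[key] = tuple(inner.split())
        else:
            members[key] = int(integer)
        pos = m.end()
    if not isinstance(members.get("match"), str):
        return None  # "match" is required for the dictionary to be valid
    options = DictionaryOptions(match=members["match"])
    if isinstance(members.get("ttl"), int):
        options.ttl = members["ttl"]
    if isinstance(members.get("hashes"), tuple) and members["hashes"]:
        options.hashes = members["hashes"]
    return options
```

   A header without "match" yields None, and omitted "ttl" or "hashes"
   members fall back to the defaults stated in Sections 2.1.2 and
   2.1.3.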

2.2.  Sec-Available-Dictionary

   When an HTTP client makes a request for a resource for which it has an
   appropriate dictionary, it can add a "Sec-Available-Dictionary"
   request header to the request to indicate to the server that it has a
   dictionary available to use for compression.

   The "Sec-Available-Dictionary" request header is a Structured Field
   [RFC8941] sf-string value that contains the hash of the contents of a
   single available dictionary calculated using one of the algorithms
   advertised as being supported by the server.

   The client MUST send only a single "Sec-Available-Dictionary" request
   header, containing a single hash value for the best matching
   dictionary that it has available.
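
   A rough Python sketch of building the header value is shown below.
   This section does not state how the hash is encoded inside the
   sf-string, so lowercase hexadecimal is used here purely as an
   assumption.  The function name and the mapping from algorithm names
   to hashlib names are also hypothetical.

```python
import hashlib

# Assumed mapping from the names used in "hashes" to hashlib names.
_HASHLIB_NAMES = {"sha-256": "sha256", "sha-512": "sha512"}


def available_dictionary_header(dictionary_bytes, server_hashes=("sha-256",)):
    """Return an sf-string for Sec-Available-Dictionary, or None.

    The first algorithm in the server's preference order that this
    client supports is used. Hex encoding is an assumption.
    """
    for name in server_hashes:
        algorithm = _HASHLIB_NAMES.get(name)
        if algorithm is None:
            continue  # algorithm not supported by this client
        digest = hashlib.new(algorithm, dictionary_bytes).hexdigest()
        return '"' + digest + '"'
    return None
```

   Returning None when no advertised algorithm is supported means the
   client simply omits the header and the request proceeds without a
   dictionary.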

2.2.1.  Dictionary freshness requirement

   To be considered as a match, the dictionary must not yet be expired
   as a dictionary.  When iterating through dictionaries looking for a
   match, the expiration time of the dictionary is calculated by taking
   the last time the dictionary was written and adding the "ttl" seconds
   from the "Use-As-Dictionary" response.  If the current time is beyond
   the expiration time of the dictionary, it MUST be ignored.
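
   The freshness rule amounts to a single comparison.  A minimal Python
   sketch follows, assuming the client records the time at which the
   dictionary was last written as a POSIX timestamp.  The function and
   parameter names are illustrative.

```python
import time


def is_fresh(written_at, ttl, now=None):
    """True while the dictionary has not yet expired as a dictionary.

    written_at: POSIX timestamp of the last write of the dictionary.
    ttl: the "ttl" value in seconds from Use-As-Dictionary.
    """
    if now is None:
        now = time.time()
    expires_at = written_at + ttl
    # A dictionary is ignored only once the current time is beyond
    # its expiration time.
    return now <= expires_at
```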

2.2.2.  Dictionary URL matching

   When a dictionary is stored as a result of a "Use-As-Dictionary"
   directive, it includes a "match" string with the URL pattern of
   request URLs that the dictionary can be used for.

   When comparing request URLs to the available dictionary match
   patterns, the comparison should account for the * wildcard when
   matching against request URLs.  This can be accomplished with the
   following algorithm which returns TRUE for a successful match and
   FALSE for no-match:

   1.  Let MATCH represent the absolute URL pattern from the "match"
       value for the given dictionary.

   2.  Let URL represent the request URL being checked.

   3.  If there are no * characters in MATCH:

       a.  If the MATCH and URL strings are identical, return TRUE.

       b.  Else, return FALSE.

   4.  If there is a single * character in MATCH and it is at the end of
       the string:

       a.  If the MATCH string, excluding the trailing *, is identical
           to the start of the URL string, return TRUE.

       b.  Else, return FALSE.

   5.  Split the MATCH string by the * character into an array of
       MATCHES (excluding the * delimiter from the individual entries).

   6.  Pop the first entry in MATCHES from the front of the array into
       PATTERN.

       a.  If PATTERN is identical to the start of the URL string,
           remove the beginning of the URL string until the end of the
           match to PATTERN.

       b.  Else, return FALSE.

   7.  If there is not a * character at the end of MATCH:

       a.  Pop the last entry in MATCHES from the end of the array into
           PATTERN.

       b.  If PATTERN is identical to the end of the URL string, remove
           the end of the URL string to the beginning of the match to
           PATTERN.

       c.  Else, return FALSE.

   8.  Pop each entry off of the front of the MATCHES array into
       PATTERN.  For each PATTERN, in order:

       a.  Search for PATTERN in URL from the beginning of URL and stop
           at the first match.

       b.  If no match is found, return FALSE.

       c.  Remove the beginning of the URL string until the end of the
           match to the first occurrence of PATTERN.

   9.  Return TRUE.
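
   The steps above can be sketched in Python roughly as follows.  The
   function name is illustrative, and both arguments are assumed to be
   absolute URL strings.  For step 4 the sketch compares against MATCH
   with its trailing * removed, on the assumption that this is the
   intended prefix comparison.

```python
def dictionary_matches(match, url):
    """Return True if the request URL matches the dictionary pattern."""
    # Step 3: no wildcard, so the strings must be identical.
    if "*" not in match:
        return match == url
    # Step 4: a single trailing wildcard is a prefix comparison.
    if match.count("*") == 1 and match.endswith("*"):
        return url.startswith(match[:-1])
    # Step 5: split into literal pieces, dropping the * delimiters.
    pieces = match.split("*")
    # Step 6: the first piece must match the start of the URL.
    first = pieces.pop(0)
    if not url.startswith(first):
        return False
    url = url[len(first):]
    # Step 7: without a trailing *, the last piece must match the end.
    if not match.endswith("*"):
        last = pieces.pop()
        if not url.endswith(last):
            return False
        url = url[:len(url) - len(last)]
    # Step 8: remaining pieces must occur in order, first match wins.
    for piece in pieces:
        index = url.find(piece)
        if index < 0:
            return False
        url = url[index + len(piece):]
    # Step 9
    return True
```

   Because the matched prefix and suffix are removed before the middle
   pieces are searched, a pattern such as "/a*a" does not match the URL
   "/a": the single "a" cannot serve as both the start and the end.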

2.2.3.  Multiple matching dictionaries

   When there are multiple dictionaries that match a given request URL,
   the client MUST pick the dictionary with the longest match pattern
   string length.
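
   One way to express this selection in Python is sketched below.  To
   keep the example self-contained, the * wildcard is translated into a
   regular expression instead of following the step-by-step algorithm
   of Section 2.2.2; the two are intended to accept the same URLs.  The
   function names are hypothetical, and the behavior for patterns of
   equal length (the first one listed wins) is an assumption, since
   ties are not addressed here.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a match pattern where * matches an arbitrary string."""
    literal_pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(literal_pieces), re.DOTALL)


def pick_dictionary(patterns, url):
    """Return the longest pattern that matches url, or None."""
    candidates = [p for p in patterns if wildcard_to_regex(p).fullmatch(url)]
    # max() keeps the first of several equally long patterns.
    return max(candidates, key=len, default=None)
```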

3.  Negotiating the compression algorithm

   When a compression dictionary is available for use for a given
   request, the algorithm to be used is negotiated through the regular
   mechanism for negotiating content encoding in HTTP.

   This document introduces two new content encoding algorithms:

   "br-d" -  Brotli using an external compression dictionary.

   "zstd-d" -  Zstandard using an external compression dictionary.

   The dictionary to use is negotiated separately and advertised in the
   "Sec-Available-Dictionary" request header.

3.1.  Accept-Encoding

   The client adds the algorithms that it supports to the "Accept-
   Encoding" request header, e.g.:

   Accept-Encoding: gzip, deflate, br, zstd, br-d, zstd-d

3.2.  Content-Encoding

   If a server supports one of the dictionary algorithms advertised by
   the client and chooses to compress the content of the response using
   the dictionary that the client has advertised, then it sets the
   "Content-Encoding" response header to the appropriate value for the
   algorithm selected, e.g.:

   Content-Encoding: br-d
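
   A server-side decision along these lines might look like the
   following Python sketch.  It only selects the content coding and
   does not perform the compression itself.  The function names, the
   set of known dictionary hashes, and the server preference order are
   assumptions made for illustration.

```python
def offered_codings(accept_encoding):
    """Return the codings listed in Accept-Encoding that are not q=0."""
    offered = set()
    for item in accept_encoding.split(","):
        coding, _, params = item.partition(";")
        coding = coding.strip().lower()
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    continue  # explicitly refused by the client
            except ValueError:
                continue  # malformed weight, ignore this entry
        if coding:
            offered.add(coding)
    return offered


def choose_dictionary_coding(accept_encoding, available_hash, known_hashes,
                             preference=("br-d", "zstd-d")):
    """Return "br-d" or "zstd-d", or None to use a regular coding.

    available_hash: hash from Sec-Available-Dictionary, or None.
    known_hashes: hashes of dictionaries the server still has.
    """
    if available_hash is None or available_hash not in known_hashes:
        return None  # the server cannot use the advertised dictionary
    offered = offered_codings(accept_encoding)
    for coding in preference:
        if coding in offered:
            return coding
    return None
```

   When the function returns None, the server falls back to whatever
   coding it would have chosen without a dictionary.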

4.  IANA Considerations

4.1.  Content Encoding

   IANA will add the following entries to the "HTTP Content Coding
   Registry" within the "Hypertext Transfer Protocol (HTTP) Parameters"
   registry:

   Name: br-d
   Description: A stream of bytes compressed using the Brotli protocol
   with an external dictionary

   Name: zstd-d
   Description: A stream of bytes compressed using the Zstandard
   protocol with an external dictionary

4.2.  Header Field Registration

5.  Compatibility Considerations

   To minimize the risk of middle-boxes incorrectly processing
   dictionary-compressed responses, compression dictionary transport
   MUST only be used in secure contexts (HTTPS).

6.  Security Considerations

   The security considerations for [Brotli] and [Zstandard] apply to the
   dictionary-based versions of the respective algorithms.

6.1.  Changing content

   The dictionary must be treated with the same security precautions as
   the content, because a change to the dictionary can result in a
   change to the decompressed content.

6.2.  Reading content

   The CRIME attack shows that it's a bad idea to compress data from
   mixed (e.g. public and private) sources -- the data sources include
   not only the compressed data but also the dictionaries.  For example,
   if you compress secret cookies using a public-data-only dictionary,
   you still leak information about the cookies.

   Not only can the dictionary reveal information about the compressed
   data, but vice versa, data compressed with the dictionary can reveal
   the contents of the dictionary when an adversary can control parts of
   data to compress and see the compressed size.  On the other hand, if
   the adversary can control the dictionary, the adversary can learn
   information about the compressed data.

6.3.  Security Mitigations

   If any of the mitigations do not pass, the client MUST drop the
   response and return an error.

6.3.1.  Cross-origin protection

   To make sure that a dictionary can only impact content from the same
   origin where the dictionary was served, the "match" pattern used for
   matching a dictionary to requests MUST be for the same origin that
   the dictionary is served from.
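
   A client could combine this check with the expansion of relative
   patterns required by Section 2.1.1, as in the Python sketch below.
   The function names are hypothetical, and the origin comparison is a
   simplified scheme, host and port tuple that does not cover every
   case in [Origin].

```python
from urllib.parse import urljoin, urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return a (scheme, host, port) tuple for a simplified origin check."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def resolve_match(dictionary_url, match):
    """Expand match against the dictionary URL; None if cross-origin."""
    absolute = urljoin(dictionary_url, match)
    if origin_of(absolute) != origin_of(dictionary_url):
        return None  # pattern targets another origin, reject it
    return absolute
```

   Rejecting the pattern at storage time means a cross-origin
   dictionary is never offered in a later request.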

6.3.2.  Response readability

   For clients, like web browsers, that provide additional protection
   against the readability of the payload of a response and against user
   tracking, additional protections MUST be taken to make sure that the
   use of dictionary-based compression does not reveal information that
   would not otherwise be available.

   In these cases, dictionary compression MUST only be used when both
   the dictionary and the compressed response are fully readable by the
   client.

   In browser terms, that means that both are either same-origin to the
   context they are being fetched from or that both include an "Access-
   Control-Allow-Origin" response header that matches the "Origin"
   header of the request that fetched them.

7.  Privacy Considerations

   Since dictionaries are advertised in future requests using the hash
   of the content of the dictionary, it is possible to abuse the
   dictionary to turn it into a tracking cookie.

   To mitigate any additional tracking concerns, clients MUST treat
   dictionaries in the same way that they treat cookies.  This includes
   partitioning the storage in the same way that cookies are
   partitioned, as well as clearing the dictionaries whenever cookies
   are cleared.
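
   As a rough illustration, a client could keep its dictionary metadata
   in a store keyed by the same partition key that it uses for cookies.
   In the Python sketch below, the class, its methods and the form of
   the partition key are all hypothetical; how cookies are actually
   partitioned is left to the client.

```python
from collections import defaultdict


class DictionaryStore:
    """Dictionary metadata partitioned by a cookie-style partition key."""

    def __init__(self):
        # partition key -> {absolute match pattern: dictionary hash}
        self._partitions = defaultdict(dict)

    def put(self, partition_key, pattern, dictionary_hash):
        self._partitions[partition_key][pattern] = dictionary_hash

    def patterns(self, partition_key):
        # Only dictionaries stored under the same partition are visible.
        return dict(self._partitions.get(partition_key, {}))

    def clear(self, partition_key=None):
        """Call from the same code path that clears cookies."""
        if partition_key is None:
            self._partitions.clear()
        else:
            self._partitions.pop(partition_key, None)
```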

8.  Informative References

   [Brotli]   Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data
              Format", RFC 7932, DOI 10.17487/RFC7932, July 2016,
              <https://www.rfc-editor.org/rfc/rfc7932>.

   [HTTP]     Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014,
              <https://www.rfc-editor.org/rfc/rfc7230>.

   [Origin]   Barth, A., "The Web Origin Concept", RFC 6454,
              DOI 10.17487/RFC6454, December 2011,
              <https://www.rfc-editor.org/rfc/rfc6454>.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/rfc/rfc3986>.

   [RFC8941]  Nottingham, M. and P. Kamp, "Structured Field Values for
              HTTP", RFC 8941, DOI 10.17487/RFC8941, February 2021,
              <https://www.rfc-editor.org/rfc/rfc8941>.

   [Zstandard]
              Collet, Y. and M. Kucherawy, Ed., "Zstandard Compression
              and the 'application/zstd' Media Type", RFC 8878,
              DOI 10.17487/RFC8878, February 2021,
              <https://www.rfc-editor.org/rfc/rfc8878>.

Authors' Addresses

   Patrick Meenan
   Google LLC
   Email: [email protected]

   Yoav Weiss
   Google LLC
   Email: [email protected]
