Cache Digests for HTTP/2DeNA Co, Ltd.kazuhooku@gmail.commnot@mnot.nethttps://www.mnot.net/
General
Internet-DraftThis specification defines a HTTP/2 frame type to allow clients to inform the server of their
cache’s contents. Servers can then use this to inform their choices of what to push to clients.Discussion of this draft takes place on the HTTP working group mailing list
(ietf-http-wg@w3.org), which is archived at https://lists.w3.org/Archives/Public/ietf-http-wg/.Working Group information can be found at http://httpwg.github.io/; source
code and issues list for this draft can be found at https://github.com/httpwg/http-extensions/labels/cache-digest.HTTP/2 allows a server to “push” synthetic request/response pairs into a client’s cache
optimistically. While there is strong interest in using this facility to improve perceived Web
browsing performance, it is sometimes counterproductive because the client might already have
cached the “pushed” response.When this is the case, the bandwidth used to “push” the response is effectively wasted, and
represents opportunity cost, because it could be used by other, more relevant responses. HTTP/2
allows a stream to be cancelled by a client using a RST_STREAM frame in this situation, but there
is still at least one round trip of potentially wasted capacity even then.This specification defines a HTTP/2 frame type to allow clients to inform the server of their
cache’s contents using a Golumb-Rice Coded Set . Servers can then use this to inform their
choices of what to push to clients.The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
“RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in
.The CACHE_DIGEST frame type is 0xf1. NOTE: This is an experimental value; if standardised, a
permanent value will be assigned.The CACHE_DIGEST frame payload has the following fields:
An unsigned, 16-bit integer indicating the length, in octets, of the Origin field.
A sequence of characters containing the ASCII serialization of an origin (, Section 6.2) that the Digest-Value applies to.
A sequence of octets containing the digest as computed in .The CACHE_DIGEST frame defines the following flags:RESET (0x1): When set, indicates that any and all cache digests for the applicable origin held by the recipient MUST be considered invalid.COMPLETE (0x2): When set, indicates that the currently valid set of cache digests held by the server constitutes a complete representation of the cache’s state regarding that origin, for the type of cached response indicated by the STALE flag.VALIDATORS (0x4): When set, indicates that the validators boolean in is true.STALE (0x8): When set, indicates that all cached responses represented in the digest-value are stale at the point in them that the digest was generated; otherwise, all are fresh.A CACHE_DIGEST frame MUST be sent from a client to a server on stream 0, and conveys a digest of
the contents of the client’s cache for the indicated origin.In typical use, a client will send one or more CACHE_DIGESTs immediately after the first request on
a connection for a given origin, on the same stream, because there is usually a short period of
inactivity then, and servers can benefit most when they understand the state of the cache before
they begin pushing associated assets (e.g., CSS, JavaScript and images). Clients MAY send
CACHE_DIGEST at other times.If the cache’s state is cleared, lost, or the client otherwise wishes the server to stop using
previously sent CACHE_DIGESTs, it can send a CACHE_DIGEST with the RESET flag set.When generating CACHE_DIGEST, a client MUST NOT include cached responses whose URLs do not share
origins with the indicated origin. Clients MUST NOT send CACHE_DIGEST frames on
connections that are not authoritative (as defined in , 10.1) for the indicated origin.CACHE_DIGEST allows the client to indicate whether the set of URLs used to compute the digest
represent fresh or stale stored responses, using the STALE flag. Clients MAY decide whether to only
sent CACHE_DIGEST frames representing their fresh stored responses, their stale stored responses,
or both.Clients can choose to only send a subset of the suitable stored responses of each type (fresh or
stale). However, when the CACHE_DIGEST frames sent represent the complete set of stored responses
of a given type, the last such frame SHOULD have a COMPLETE flag set, to indicate to the server
that it has all relevant state of that type. Note that for the purposes of COMPLETE, responses
cached since the beginning of the connection or the last RESET flag on a CACHE_DIGEST frame need
not be included.CACHE_DIGEST can be computed to include cached responses’ ETags, as indicated by the VALIDATORS
flag. This information can be used by servers to decide what kinds of responses to push to clients;
for example, a stale response that hasn’t changed could be refreshed with a 304 (Not Modified)
response; one that has changed can be replaced with a 200 (OK) response, whether the cached
response was fresh or stale.CACHE_DIGEST has no defined meaning when sent from servers, and SHOULD be ignored by clients.Given the following inputs:validators, a boolean indicating whether validators () are to be included in the digest;URLs', an array of (string URL, string ETag) tuples, each corresponding to the Effective Request URI (, Section 5.5) of a cached response and its entity-tag (if validators is true and if the ETag is available; otherwise, null);P, an integer that MUST be a power of 2 smaller than 2**32, that indicates the probability of a false positive that is acceptable, expressed as 1/P.digest-value can be computed using the following algorithm:Let N be the count of URLs’ members, rounded to the nearest power of 2 smaller than 2**32.Let hash-values be an empty array of integers.For each (URL, ETag) in URLs, compute a hash value () and append the result to hash-values.Sort hash-values in ascending order.Let digest-value be an empty array of bits.Write log base 2 of N to digest-value using 5 bits.Write log base 2 of P to digest-value using 5 bits.Let C be -1.For each V in hash-values:
If V is equal to C, continue to the next V.Let D be the result of V - C - 1.Let Q be the integer result of D / P.Let R be the result of D modulo P.Write Q ‘0’ bits to digest-value.Write 1 ‘1’ bit to digest-value.Write R to digest-value as binary, using log2(P) bits.Let C be VIf the length of digest-value is not a multiple of 8, pad it with 0s until it is.Given:URL, an array of charactersETag, an array of charactersvalidators, a booleanN, an integerP, an integerhash-value can be computed using the following algorithm:Let key be URL converted to an ASCII string by percent-encoding as appropriate .If validators is true and ETag is not null:
Append ETag to key as an ASCII string, including both the weak indicator (if present) and double quotes, as per Section 2.3.Let hash-value be the SHA-256 message digest of key, expressed as an integer.Truncate hash-value to log2( N * P ) bits.In typical use, a server will query (as per ) the CACHE_DIGESTs received on a given
connection to inform what it pushes to that client;If a given URL has a match in a current CACHE_DIGEST with the STALE flag unset, it need not be pushed, because it is fresh in cache;If a given URL and ETag combination has a match in a current CACHE_DIGEST with the STALE flag set, the client has a stale copy in cache, and a validating response can be pushed;If a given URL has no match in any current CACHE_DIGEST, the client does not have a cached copy, and a complete response can be pushed.Servers MAY use all CACHE_DIGESTs received for a given origin as current, as long as they do not
have the RESET flag set; a CACHE_DIGEST frame with the RESET flag set MUST clear any
previously stored CACHE_DIGESTs for its origin. Servers MUST treat an empty Digest-Value with a
RESET flag set as effectively clearing all stored digests for that origin.Clients are not likely to send updates to CACHE_DIGEST over the lifetime of a connection; it is
expected that servers will separately track what cacheable responses have been sent previously on
the same connection, using that knowledge in conjunction with that provided by CACHE_DIGEST.Servers MUST ignore CACHE_DIGEST frames sent on a stream other than 0.Given:digest-value, an array of bitsURL, an array of charactersETag, an array of charactersvalidators, a booleanwe can determine whether there is a match in the digest using the following algorithm:Read the first 5 bits of digest-value as an integer; let N be two raised to the power of that value.Read the next 5 bits of digest-value as an integer; let P be two raised to the power of that value.Let hash-value be the result of computing a hash value ().Let C be -1.Read ‘0’ bits from digest-value until a ‘1’ bit is found; let Q bit the number of ‘0’ bits. Discard the ‘1’.Read log2(P) bits from digest-value after the ‘1’ as an integer; let R be its value.Let D be Q * P + R.Increment C by D + 1.If C is equal to hash-value, return ‘true’.Otherwise, return to step 5 and continue processing; if no match is found before digest-value is exhausted, return ‘false’.This draft currently has no requirements for IANA. If the CACHE_DIGEST frame is standardised, it
will need to be assigned a frame type.The contents of a User Agent’s cache can be used to re-identify or “fingerprint” the user over
time, even when other identifiers (e.g., Cookies ) are cleared.CACHE_DIGEST allows such cache-based fingerprinting to become passive, since it allows the server
to discover the state of the client’s cache without any visible change in server behaviour.As a result, clients MUST mitigate for this threat when the user attempts to remove identifiers
(e.g., “clearing cookies”). This could be achieved in a number of ways; for example: by clearing
the cache, by changing one or both of N and P, or by adding new, synthetic entries to the digest to
change its contents.TODO: discuss how effective the suggested mitigations actually would be.Additionally, User Agents SHOULD NOT send CACHE_DIGEST when in “privacy mode.”Key words for use in RFCs to Indicate Requirement LevelsIn many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.Uniform Resource Identifier (URI): Generic SyntaxA Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. This specification defines the generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet. The URI syntax defines a grammar that is a superset of all valid URIs, allowing an implementation to parse the common components of a URI reference without knowing the scheme-specific requirements of every possible identifier. This specification does not define a generative grammar for URIs; that task is performed by the individual specifications of each URI scheme. [STANDARDS-TRACK]US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)Federal Information Processing Standard, FIPSThe Web Origin ConceptThis document defines the concept of an "origin", which is often used as the scope of authority or privilege by user agents. Typically, user agents isolate content retrieved from different origins to prevent malicious web site operators from interfering with the operation of benign web sites. In addition to outlining the principles that underlie the concept of origin, this document details how to determine the origin of a URI and how to serialize an origin into a string. It also defines an HTTP header field, named "Origin", that indicates which origins are associated with an HTTP request. [STANDARDS-TRACK]Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and RoutingThe Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document provides an overview of HTTP architecture and its associated terminology, defines the "http" and "https" Uniform Resource Identifier (URI) schemes, defines the HTTP/1.1 message syntax and parsing requirements, and describes related security concerns for implementations.Hypertext Transfer Protocol (HTTP/1.1): Conditional RequestsThe Hypertext Transfer Protocol (HTTP) is a stateless application- level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP/1.1 conditional requests, including metadata header fields for indicating state changes, request header fields for making preconditions on such state, and rules for constructing the responses to a conditional request when one or more preconditions evaluate to false.Hypertext Transfer Protocol (HTTP/1.1): CachingThe Hypertext Transfer Protocol (HTTP) is a stateless \%application- level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP caches and the associated header fields that control cache behavior or indicate cacheable response messages.Hypertext Transfer Protocol Version 2 (HTTP/2)This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2). HTTP/2 enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.HTTP State Management MechanismThis document defines the HTTP Cookie and Set-Cookie header fields. These header fields can be used by HTTP servers to store state (called cookies) at HTTP user agents, letting the servers maintain a stateful session over the mostly stateless HTTP protocol. Although cookies have many historical infelicities that degrade their security and privacy, the Cookie and Set-Cookie header fields are widely used on the Internet. This document obsoletes RFC 2965. [STANDARDS-TRACK]Adaptive variable-length coding for efficient compression of spacecraft television dataThanks to Adam Langley and Giovanni Bajo for their explorations of Golumb-coded sets. In
particular, see
http://giovanni.bajo.it/post/47119962313/golomb-coded-sets-smaller-than-bloom-filters, which
refers to sample code.Thanks to Stefan Eissing for his suggestions.