HPACK - Header Compression for HTTP/2.0

3.1. Encoding Concepts

The encoding and decoding of headers rely on several components and concepts. The set of components used forms an encoding context.

Header Table:: The header table (see Section 3.1.2) is a component used to associate headers with index values.
Reference Set:: The reference set (see Section 3.1.3) is a component containing a group of headers used as a reference for the differential encoding of a new set of headers.
Header Set:: A header set (see Section 3.1.4) is a group of headers that are encoded jointly. A complete set of key-value pairs as encoded in an HTTP request or response is a header set.
Header Representation:: A header can be represented in encoded form either as a literal or as an index (see Section 3.1.5). The indexed representation is based on the header table.
Header Emission:: When decoding a set of headers, some operations emit a header (see Section 3.1.6). An emitted header is added to the set of headers that form the HTTP request or response. Once emitted, a header cannot be removed from the set of headers.

3.1.1. Encoding Context

The set of components used to encode or decode a header set forms an encoding context: an encoding context contains a header table and a reference set.

Using HTTP, messages are exchanged between a client and a server in both directions. To keep the encoding of headers in each direction independent of the other direction, there is one encoding context for each direction.

The headers contained in a PUSH_PROMISE frame sent by a server to a client are encoded within the same context as the headers contained in the HEADERS frame corresponding to a response sent from the server to the client.

3.1.2. Header Table

A header table consists of an ordered list of (name, value) pairs. The first entry of a header table is assigned the index 0.

A header can be represented by an entry from the header table. Rather than encoding a literal value for the header field name and value, the encoder can select an entry from the header table.

Literal header names MUST be translated to lowercase before encoding and transmission. This enables an encoder to perform direct bitwise comparisons on names and values when determining if an entry already exists in the header table.

There is no need for the header table to contain duplicate entries. However, duplicate entries MUST NOT be treated as an error by a decoder.

Initially, a header table contains a list of common headers. Two initial lists of headers are provided in Appendix B: one for headers transmitted from a client to a server, the other for the reverse direction.

A header table is modified either by adding a new entry at the end of the table or by replacing an existing entry.

The encoder decides how to update the header table and can therefore control how much memory the header table uses. To limit the memory requirements on the decoder side, the header table size is bounded (see SETTINGS_HEADER_TABLE_SIZE in Section 5).

The size of an entry is the sum of its name's length in bytes (as defined in Section 4.1.2), its value's length in bytes (Section 4.1.3), and 32 bytes. The 32 bytes account for the overhead of the entry structure. For example, an entry structure using two 64-bit pointers to reference the name and the value of the entry, and two 64-bit integers to count the number of references to the name and value, would use 32 bytes.

The size of a header table is the sum of the sizes of its entries.
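
The size accounting above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: `entry_size` and `HeaderTable` are hypothetical names, names and values are Python strings whose UTF-8 encoding gives their octet length, and the sketch does not enforce the SETTINGS_HEADER_TABLE_SIZE bound of Section 5 (a caller would compare `size()` against that limit).

```python
from dataclasses import dataclass, field

ENTRY_OVERHEAD = 32  # per-entry accounting overhead in bytes, as stated above


def entry_size(name: str, value: str) -> int:
    """Size of one entry: name octets + value octets + 32."""
    # Names are ASCII (Section 4.1.2) and values are UTF-8 (Section 4.1.3),
    # so the length of the UTF-8 encoding is the length in octets.
    return len(name.encode("utf-8")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


@dataclass
class HeaderTable:
    """Hypothetical header table: ordered (name, value) pairs, first index 0."""
    entries: list = field(default_factory=list)

    def append(self, name: str, value: str) -> int:
        """Add a new entry at the end of the table and return its index."""
        self.entries.append((name, value))
        return len(self.entries) - 1

    def replace(self, index: int, name: str, value: str) -> None:
        """Replace an existing entry in place."""
        if not 0 <= index < len(self.entries):
            raise IndexError(f"no entry at index {index}")
        self.entries[index] = (name, value)

    def size(self) -> int:
        """Table size: the sum of the sizes of its entries."""
        return sum(entry_size(n, v) for n, v in self.entries)
```

For instance, the pair (":scheme", "http") accounts for 7 + 4 + 32 = 43 bytes.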

3.1.3. Reference Set

A reference set is defined as an unordered set of references to entries of the header table.

The initial reference set is the empty set.

The reference set is updated during the processing of a set of headers.

With differential encoding, a header that is not present in the reference set is encoded either with an indexed representation (if the header is present in the header table) or with a literal representation (if the header is not present in the header table).

A header that is to be removed from the reference set is encoded with an indexed representation.
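
The two rules above can be turned into an encoding plan. The following is a minimal sketch assuming header sets are Python sets of (name, value) tuples and the reference set is a set of table indexes; `plan_differential_encoding` is a hypothetical name. It also assumes that a header present in both the reference set and the new header set needs no encoding, as the user-agent header illustrates in Appendix C. It does not choose among the three literal variants or exploit name-only matches.

```python
def plan_differential_encoding(table, reference_set, header_set):
    """Sketch of differential encoding against a reference set.

    table:         list of (name, value) pairs; an entry's index is its position
    reference_set: set of indexes into table
    header_set:    set of (name, value) pairs to transmit
    Returns a list of ("indexed", index) and ("literal", (name, value)) steps.
    """
    steps = []
    referenced = {table[i] for i in reference_set}

    # Referenced headers missing from the new header set are removed from
    # the reference set with an indexed representation.
    for i in sorted(reference_set):
        if table[i] not in header_set:
            steps.append(("indexed", i))

    # Headers already referenced are left unencoded (assumption, matching
    # Appendix C); the others are indexed when the exact (name, value) pair
    # is in the table, and literal otherwise.
    for header in sorted(header_set):
        if header in referenced:
            continue
        if header in table:
            steps.append(("indexed", table.index(header)))
        else:
            steps.append(("literal", header))
    return steps
```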

3.1.5. Header Representation

A header can be represented either as a literal or as an index.

Literal Representation:

A literal representation defines a new header. The header name is represented either literally or as a reference to an entry of the header table. The header value is represented literally.

Three different literal representations are provided:

A literal representation that does not add the header to the header table (see Section 4.3.1).
A literal representation that adds the header at the end of the header table (see Section 4.3.2).
A literal representation that uses the header to replace an existing entry of the header table (see Section 4.3.3).

Indexed Representation:

The indexed representation defines a header as a reference in the header table (see Section 4.2).

3.1.6. Header Emission

Header emission is the process of adding a header to the current set of headers. Once a header is emitted, it cannot be removed from the current set of headers.

The concept of header emission allows a decoder to know when it can safely pass a header to a higher level on the receiver side. This allows a decoder to be implemented in a streaming fashion, keeping only the header table and the reference set in memory. With such an implementation, the amount of memory used by the decoder is bounded, even in the presence of a very large set of headers. The management of memory for handling very large sets of headers can therefore be deferred to the application, which may be able to emit the header to the wire and thus free up memory quickly.

4. Detailed Format

4.1. Low-level representations

4.1.1. Integer representation

Integers are used to represent name indexes, pair indexes, or string lengths. To allow for optimized processing, an integer representation always finishes at the end of a byte.

An integer is represented in two parts: a prefix that fills the current byte and an optional list of bytes that are used if the integer value does not fit in the prefix. The number of bits of the prefix (called N) is a parameter of the integer representation.

The N-bit prefix allows filling the current byte. If the value is small enough (strictly less than 2^N-1), it is encoded within the N-bit prefix. Otherwise, all the bits of the prefix are set to 1, and the remainder (the value minus 2^N-1) is encoded using an unsigned variable-length integer representation.

The algorithm to represent an integer I is as follows:

If I < 2^N - 1, encode I on N bits
Else
    encode 2^N - 1 on N bits
    I = I - (2^N - 1)
    While I >= 128
         Encode (I % 128 + 128) on 8 bits
         I = I / 128
    encode (I) on 8 bits
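
The algorithm above translates directly into Python. In this sketch, `encode_integer` and `decode_integer` are hypothetical names, and the `first_byte_bits` parameter (an assumption for convenience) carries the representation's pattern bits, which sit above the prefix. The decoder is not specified in this section; it is simply the inverse of the encoding steps.

```python
def encode_integer(value: int, n: int, first_byte_bits: int = 0) -> bytes:
    """Encode value with an n-bit prefix (1 <= n <= 8).

    first_byte_bits holds the bits above the prefix in the first byte
    (for example a representation's pattern); it must not overlap the prefix.
    """
    if value < 0 or not 1 <= n <= 8:
        raise ValueError("value must be >= 0 and 1 <= n <= 8")
    max_prefix = (1 << n) - 1  # 2^N - 1
    if value < max_prefix:
        return bytes([first_byte_bits | value])  # fits in the prefix
    out = [first_byte_bits | max_prefix]  # all prefix bits set to 1
    value -= max_prefix
    while value >= 128:
        out.append(value % 128 + 128)  # low 7 bits, continuation bit set
        value //= 128
    out.append(value)  # last byte, continuation bit clear
    return bytes(out)


def decode_integer(data: bytes, pos: int, n: int) -> tuple[int, int]:
    """Decode an n-bit-prefix integer starting at data[pos].

    Returns (value, position just after the integer).
    """
    max_prefix = (1 << n) - 1
    value = data[pos] & max_prefix  # masks off any pattern bits
    pos += 1
    if value < max_prefix:
        return value, pos
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if byte < 128:  # continuation bit clear: this was the last byte
            return value, pos
```

With the X bits left at zero, `encode_integer(10, 5)` yields the single byte 0x0a, and `encode_integer(1337, 5)` yields 0x1f 0x9a 0x0a, the three bytes of Example 2.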

4.1.1.1. Example 1: Encoding 10 using a 5-bit prefix

The value 10 is to be encoded with a 5-bit prefix.

10 is less than 31 (= 2^5 - 1) and is represented using the 5-bit prefix.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 |   10 stored on 5 bits
+---+---+---+---+---+---+---+---+

4.1.1.2. Example 2: Encoding 1337 using a 5-bit prefix

The value I=1337 is to be encoded with a 5-bit prefix.

1337 is greater than 31 (= 2^5 - 1).
- The 5-bit prefix is filled with its max value (31).
- I = 1337 - (2^5 - 1) = 1306.
- I (1306) is greater than or equal to 128, so the while loop body executes:
  - I % 128 == 26
  - 26 + 128 == 154
  - 154 is encoded on 8 bits as: 10011010
  - I is set to 10 (1306 / 128 == 10, using integer division)
  - I is no longer greater than or equal to 128, so the while loop terminates.
- I, now 10, is encoded on 8 bits as: 00001010
The process ends.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |   Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |   1306>=128, encode(154), I = 1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |   10<128, encode(10), done
+---+---+---+---+---+---+---+---+

4.1.2. Header Name Representation

Header names are sequences of ASCII characters that MUST conform to the following header-name ABNF construction:

  LOWERALPHA = %x61-7A
  header-char = "!" / "#" / "$" / "%" / "&" / "'" /
                "*" / "+" / "-" / "." / "^" / "_" /
                "`" / "|" / "~" / DIGIT / LOWERALPHA
  header-name = [":"] 1*header-char

They are encoded in two parts:

The length of the text, defined as the number of octets of storage required to store the text, represented as a variable-length-quantity (Section 4.1.1).
The specific sequence of ASCII octets.
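
A name can be validated against the ABNF and then encoded as length plus octets. This is a sketch under assumptions: `HEADER_NAME_RE`, `encode_length`, and `encode_header_name` are hypothetical names, and the length uses the 8-bit prefix drawn as "Name Length (8+)" in the figures of Section 4.3.

```python
import re

# header-name = [":"] 1*header-char, where header-char is one of the listed
# punctuation characters, a DIGIT, or a lowercase ASCII letter.
HEADER_NAME_RE = re.compile(r":?[!#$%&'*+\-.^_`|~0-9a-z]+")


def encode_length(length: int) -> bytes:
    """String length as an integer with an 8-bit prefix (Section 4.1.1)."""
    if length < 255:
        return bytes([length])
    out = [255]
    length -= 255
    while length >= 128:
        out.append(length % 128 + 128)
        length //= 128
    out.append(length)
    return bytes(out)


def encode_header_name(name: str) -> bytes:
    """Reject names outside the ABNF, then emit the length and the octets."""
    if HEADER_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid header name: {name!r}")
    octets = name.encode("ascii")
    return encode_length(len(octets)) + octets
```

Uppercase names such as "User-Agent" fail the pattern, consistent with the lowercase requirement of Section 3.1.2.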

4.1.3. Header Value Representation

Header values are encoded as sequences of UTF-8 encoded text. They are encoded in two parts:

The length of the text, defined as the number of octets of storage required to store the text, represented as a variable-length-quantity (Section 4.1.1).
The specific sequence of octets representing the UTF-8 text.

Invalid UTF-8 octet sequences, "over-long" UTF-8 encodings, and UTF-8 octets that represent invalid Unicode code points MUST NOT be used.
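
Value encoding and the checks on received octets can be sketched as below. The sketch assumes Python's strict UTF-8 codec, which rejects malformed sequences, over-long forms, and surrogate code points; whether that covers every case this section means by "invalid Unicode code points" is an assumption. `encode_header_value` and `check_value_octets` are hypothetical names, and the length again uses an 8-bit prefix.

```python
def encode_length(length: int) -> bytes:
    """String length as an integer with an 8-bit prefix (Section 4.1.1)."""
    if length < 255:
        return bytes([length])
    out = [255]
    length -= 255
    while length >= 128:
        out.append(length % 128 + 128)
        length //= 128
    out.append(length)
    return bytes(out)


def encode_header_value(value: str) -> bytes:
    """Length (in octets, not characters) followed by the UTF-8 octets."""
    # str.encode raises UnicodeEncodeError for lone surrogates, which
    # have no well-formed UTF-8 encoding.
    octets = value.encode("utf-8")
    return encode_length(len(octets)) + octets


def check_value_octets(octets: bytes) -> str:
    """Decode received value octets strictly; raises UnicodeDecodeError."""
    return octets.decode("utf-8")
```

Note that the length counts octets: "é" is one character but two octets.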

4.2. Indexed Header Representation

An indexed header representation identifies an entry in the header table. The entry is emitted and added to the reference set if it is not currently in the reference set. The entry is removed from the reference set if it is present in the reference set.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 1 |        Index (7+)         |
+---+---------------------------+

Figure 1: Indexed Header

This representation starts with the '1' 1-bit pattern, followed by the index of the matching pair, represented as an integer with a 7-bit prefix.
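
Both sides of the indexed representation fit in a few lines. In this sketch, `encode_indexed` and `apply_indexed` are hypothetical names; the decoder side assumes the table is a list of (name, value) pairs, the reference set a Python set of indexes, and `emit` a caller-supplied callback that receives each emitted header.

```python
def encode_indexed(index: int) -> bytes:
    """'1' pattern bit followed by the index as a 7-bit-prefix integer."""
    if index < 127:
        return bytes([0x80 | index])  # one byte when the index fits
    out = [0x80 | 127]  # pattern bit plus all seven prefix bits set
    index -= 127
    while index >= 128:
        out.append(index % 128 + 128)
        index //= 128
    out.append(index)
    return bytes(out)


def apply_indexed(index, table, reference_set, emit):
    """Decoder side: toggle the entry's membership in the reference set."""
    if index in reference_set:
        reference_set.remove(index)  # present: removed, nothing emitted
    else:
        reference_set.add(index)  # absent: added and emitted
        emit(table[index])
```

Index 38 encodes as 0xa6 and index 40 as 0xa8, the removal bytes used in Appendix C.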

4.3. Literal Header Representation

Literal header representations contain a literal header field value. Header field names are either provided as a literal or by reference to an existing header table entry.

Literal representations all result in the emission of a header when decoded.

4.3.1. Literal Header without Indexing

A literal header without indexing causes the emission of a header without altering the header table.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 2: Literal Header without Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 3: Literal Header without Indexing - New Name

This representation starts with the '011' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix. Note that one byte is used if index + 1 is strictly below 31, that is, if the index is strictly below 30.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name (Section 4.1.2).

The header name representation is followed by the header value represented as a literal string as described in Section 4.1.3.
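
An encoder for this representation can be sketched as follows. `_prefixed`, `_string`, and `encode_literal_without_indexing` are hypothetical helpers; the name is passed either as a table index or as a literal string, and string lengths use an 8-bit prefix as in the figures.

```python
def _prefixed(value: int, n: int, pattern: int) -> bytes:
    """Integer with an n-bit prefix, pattern bits OR-ed into the first byte."""
    max_prefix = (1 << n) - 1
    if value < max_prefix:
        return bytes([pattern | value])
    out = [pattern | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append(value % 128 + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def _string(text: str, encoding: str) -> bytes:
    """Length (8-bit prefix) followed by the encoded octets."""
    octets = text.encode(encoding)
    return _prefixed(len(octets), 8, 0) + octets


def encode_literal_without_indexing(value, name_index=None, name=None):
    """'011' pattern; the header table is left unchanged."""
    if (name_index is None) == (name is None):
        raise ValueError("give exactly one of name_index or name")
    if name_index is not None:
        out = _prefixed(name_index + 1, 5, 0x60)  # index + 1, 5-bit prefix
    else:
        out = _prefixed(0, 5, 0x60) + _string(name, "ascii")  # new name
    return out + _string(value, "utf-8")
```

Name index 29 still fits in one byte (0x7e), while name index 30 needs a second byte (0x7f 0x00).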

4.3.2. Literal Header with Incremental Indexing

A literal header with incremental indexing adds a new entry to the header table.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 4: Literal Header with Incremental Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 5: Literal Header with Incremental Indexing - New Name

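This representation starts with the '010' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name (Section 4.1.2).

The header name representation is followed by the header value represented as a literal string as described in Section 4.1.3.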
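
On the decoder side, this representation appends an entry and emits the header. The sketch below also adds the new entry to the reference set, which is an assumption drawn from the example in Appendix C rather than from this section; it does not enforce any table size bound. `decode_incremental_indexing` and its helpers are hypothetical names, and `emit` is a caller-supplied callback.

```python
def _decode_int(data: bytes, pos: int, n: int) -> tuple[int, int]:
    """n-bit-prefix integer at data[pos]; returns (value, next position)."""
    max_prefix = (1 << n) - 1
    value = data[pos] & max_prefix
    pos += 1
    if value < max_prefix:
        return value, pos
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if byte < 128:
            return value, pos


def _decode_string(data: bytes, pos: int, encoding: str) -> tuple[str, int]:
    """Length (8-bit prefix) then octets; returns (text, next position)."""
    length, pos = _decode_int(data, pos, 8)
    end = pos + length
    if end > len(data):
        raise ValueError("truncated string")
    return data[pos:end].decode(encoding), end


def decode_incremental_indexing(data, pos, table, reference_set, emit):
    """Decode one '010' representation; returns the position after it."""
    if data[pos] & 0xE0 != 0x40:
        raise ValueError("not a literal header with incremental indexing")
    name_ref, pos = _decode_int(data, pos, 5)
    if name_ref == 0:
        name, pos = _decode_string(data, pos, "ascii")  # new name
    else:
        name = table[name_ref - 1][0]  # name taken from entry index name_ref - 1
    value, pos = _decode_string(data, pos, "utf-8")
    table.append((name, value))  # new entry at the end of the table
    reference_set.add(len(table) - 1)  # assumption: see lead-in
    emit((name, value))
    return pos
```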

4.3.3. Literal Header with Substitution Indexing

A literal header with substitution indexing replaces an existing header table entry.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |      Index (6+)       |
+---+---+-----------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 6: Literal Header with Substitution Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |           0           |
+---+---+-----------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 7: Literal Header with Substitution Indexing - New Name

This representation starts with the '00' 2-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 6-bit prefix. Note that one byte is used if index + 1 is strictly below 63, that is, if the index is strictly below 62.

If the header name does not match a header name entry, the value 0 is represented on 6 bits followed by the header name (Section 4.1.2).

The index of the substituted (name, value) pair is inserted after the header name representation as a 0-bit prefix integer.

The index of the substituted pair MUST correspond to a position in the header table containing a non-void entry. An index for the substituted pair that corresponds to an empty position in the header table MUST be treated as an error.

This index is followed by the header value represented as a literal string as described in Section 4.1.3.
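
An encoder for substitution indexing can be sketched as below, with hypothetical names throughout. One assumption needs flagging: this sketch encodes the substituted index with the 8-bit prefix drawn as "Substituted Index (8+)" in Figures 6 and 7, whereas the prose calls it a 0-bit prefix integer. Both readings produce the same single byte for indexes below 128 but diverge above that.

```python
def _prefixed(value: int, n: int, pattern: int) -> bytes:
    """Integer with an n-bit prefix, pattern bits OR-ed into the first byte."""
    max_prefix = (1 << n) - 1
    if value < max_prefix:
        return bytes([pattern | value])
    out = [pattern | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append(value % 128 + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def _string(text: str, encoding: str) -> bytes:
    """Length (8-bit prefix) followed by the encoded octets."""
    octets = text.encode(encoding)
    return _prefixed(len(octets), 8, 0) + octets


def encode_literal_with_substitution(value, substituted_index,
                                     name_index=None, name=None):
    """'00' pattern; the entry at substituted_index is replaced."""
    if (name_index is None) == (name is None):
        raise ValueError("give exactly one of name_index or name")
    if name_index is not None:
        out = _prefixed(name_index + 1, 6, 0x00)  # index + 1, 6-bit prefix
    else:
        out = _prefixed(0, 6, 0x00) + _string(name, "ascii")  # new name
    # Substituted index: 8-bit prefix per the figures (assumption, see above).
    out += _prefixed(substituted_index, 8, 0x00)
    return out + _string(value, "utf-8")
```

With name index 3, substituted index 38, and the value "/my-example/resources/script.js", the sketch produces 0x04 0x26 0x1f followed by the value, the same bytes as the second header set in Appendix C.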

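Table 1: Initial Header Table for Requests

+-------+---------------------+--------------+
| Index | Header Name         | Header Value |
+-------+---------------------+--------------+
|   0   | :scheme             | http         |
|   1   | :scheme             | https        |
|   2   | :host               |              |
|   3   | :path               | /            |
|   4   | :method             | GET          |
|   5   | accept              |              |
|   6   | accept-charset      |              |
|   7   | accept-encoding     |              |
|   8   | accept-language     |              |
|   9   | cookie              |              |
|  10   | if-modified-since   |              |
|  11   | user-agent          |              |
|  12   | referer             |              |
|  13   | authorization       |              |
|  14   | allow               |              |
|  15   | cache-control       |              |
|  16   | connection          |              |
|  17   | content-length      |              |
|  18   | content-type        |              |
|  19   | date                |              |
|  20   | expect              |              |
|  21   | from                |              |
|  22   | if-match            |              |
|  23   | if-none-match       |              |
|  24   | if-range            |              |
|  25   | if-unmodified-since |              |
|  26   | max-forwards        |              |
|  27   | proxy-authorization |              |
|  28   | range               |              |
|  29   | via                 |              |
+-------+---------------------+--------------+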

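Table 2: Initial Header Table for Responses

+-------+-----------------------------+--------------+
| Index | Header Name                 | Header Value |
+-------+-----------------------------+--------------+
|   0   | :status                     | 200          |
|   1   | age                         |              |
|   2   | cache-control               |              |
|   3   | content-length              |              |
|   4   | content-type                |              |
|   5   | date                        |              |
|   6   | etag                        |              |
|   7   | expires                     |              |
|   8   | last-modified               |              |
|   9   | server                      |              |
|  10   | set-cookie                  |              |
|  11   | vary                        |              |
|  12   | via                         |              |
|  13   | access-control-allow-origin |              |
|  14   | accept-ranges               |              |
|  15   | allow                       |              |
|  16   | connection                  |              |
|  17   | content-disposition         |              |
|  18   | content-encoding            |              |
|  19   | content-language            |              |
|  20   | content-location            |              |
|  21   | content-range               |              |
|  22   | link                        |              |
|  23   | location                    |              |
|  24   | proxy-authenticate          |              |
|  25   | refresh                     |              |
|  26   | retry-after                 |              |
|  27   | strict-transport-security   |              |
|  28   | transfer-encoding           |              |
|  29   | www-authenticate            |              |
+-------+-----------------------------+--------------+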

Appendix C. Example

Here is an example that illustrates different representations and how tables are updated. [rfc.comment.6: This section needs to be updated to better reflect the new processing of header fields, and include more examples.]

C.1. First header set

The first header set to represent is the following:

:path: /my-example/index.html
user-agent: my-user-agent
mynewheader: first

The reference set is empty, so all headers are represented as literal headers with incremental indexing. The 'mynewheader' header name is not in the header table and is encoded literally. This gives the following representation:

0x44      (literal header with incremental indexing, name index = 3)
0x16      (header value string length = 22)
/my-example/index.html
0x4D      (literal header with incremental indexing, name index = 12)
0x0D      (header value string length = 13)
my-user-agent
0x40      (literal header with incremental indexing, new name)
0x0B      (header name string length = 11)
mynewheader
0x05      (header value string length = 5)
first

The header table is as follows after the processing of these headers:

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/index.html    | added header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             | added header
+---------+----------------+---------------------------+
|   40    | mynewheader    | first                     | added header
+---------+----------------+---------------------------+

As all the headers in the first header set are indexed in the header table, all are kept in the reference set of headers, which is:

Reference Set:
:path, /my-example/index.html
user-agent, my-user-agent
mynewheader, first

C.2. Second header set

The second header set to represent is the following:

:path: /my-example/resources/script.js
user-agent: my-user-agent
mynewheader: second

Comparing this second header set to the reference set, the first and third headers from the reference set are not present in this second header set and must be removed. In addition, the first and third headers of the new set have to be encoded; the user-agent header is already in the reference set and needs no encoding. The :path header is represented as a literal header with substitution indexing, and the mynewheader header as a literal header with incremental indexing.

0xa6       (indexed header, index = 38: removal from reference set)
0xa8       (indexed header, index = 40: removal from reference set)
0x04       (literal header, substitution indexing, name index = 3)
0x26       (replaced entry index = 38)
0x1f       (header value string length = 31)
/my-example/resources/script.js
0x5f 0x0a  (literal header, incremental indexing, name index = 40)
0x06       (header value string length = 6)
second
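
The annotations on these bytes can be checked mechanically. The sketch below (hypothetical names `REPRESENTATIONS` and `describe`) classifies a representation by its leading bit pattern from Sections 4.2 and 4.3 and decodes only its leading integer; it does not parse the strings that follow.

```python
def _decode_int(data: bytes, pos: int, n: int) -> tuple[int, int]:
    """n-bit-prefix integer at data[pos]; returns (value, next position)."""
    max_prefix = (1 << n) - 1
    value = data[pos] & max_prefix
    pos += 1
    if value < max_prefix:
        return value, pos
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if byte < 128:
            return value, pos


# (mask, pattern, prefix bits, kind), tried in order on the first byte.
REPRESENTATIONS = [
    (0x80, 0x80, 7, "indexed"),
    (0xE0, 0x60, 5, "literal without indexing"),
    (0xE0, 0x40, 5, "literal with incremental indexing"),
    (0xC0, 0x00, 6, "literal with substitution indexing"),
]


def describe(data: bytes, pos: int = 0):
    """Return (kind, index, end) for the representation at data[pos].

    For literals, index is the name's table index, or None for a new name;
    end is the position just after the leading integer.
    """
    first = data[pos]
    for mask, pattern, n, kind in REPRESENTATIONS:
        if first & mask == pattern:
            number, end = _decode_int(data, pos, n)
            if kind == "indexed":
                return kind, number, end
            return kind, (number - 1 if number else None), end
    raise AssertionError("unreachable: the four patterns cover all byte values")
```

For example, 0x5f 0x0a decodes as a literal with incremental indexing whose name index is 31 + 10 - 1 = 40.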

The header table is updated as follows:

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/resources/    | replaced
|         |                |     script.js             | header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             |
+---------+----------------+---------------------------+
|   40    | mynewheader    | first                     |
+---------+----------------+---------------------------+
|   41    | mynewheader    | second                    | added header
+---------+----------------+---------------------------+

All the headers in this second header set are indexed in the header table; therefore, all are kept in the reference set, which becomes:

Reference Set:
:path, /my-example/resources/script.js
user-agent, my-user-agent
mynewheader, second
