draft-ietf-quic-recovery-27.txt   draft-ietf-quic-recovery-latest.txt 
QUIC Working Group J. Iyengar, Ed. QUIC Working Group J. Iyengar, Ed.
Internet-Draft Fastly Internet-Draft Fastly
Intended status: Standards Track I. Swett, Ed. Intended status: Standards Track I. Swett, Ed.
Expires: August 24, 2020 Google Expires: October 4, 2020 Google
February 21, 2020 April 2, 2020
QUIC Loss Detection and Congestion Control QUIC Loss Detection and Congestion Control
draft-ietf-quic-recovery-27 draft-ietf-quic-recovery-latest
Abstract Abstract
This document describes loss detection and congestion control This document describes loss detection and congestion control
mechanisms for QUIC. mechanisms for QUIC.
Note to Readers Note to Readers
Discussion of this draft takes place on the QUIC working group Discussion of this draft takes place on the QUIC working group
mailing list (quic@ietf.org), which is archived at mailing list (quic@ietf.org), which is archived at
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 24, 2020. This Internet-Draft will expire on October 4, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 27 skipping to change at page 2, line 27
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4
3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5
3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5
3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6
3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6
3.1.3. Clearer Loss Epoch . . . . . . . . . . . . . . . . . 6 3.1.3. Clearer Loss Epoch . . . . . . . . . . . . . . . . . 6
3.1.4. No Reneging . . . . . . . . . . . . . . . . . . . . . 7 3.1.4. No Reneging . . . . . . . . . . . . . . . . . . . . . 7
3.1.5. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7 3.1.5. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7
3.1.6. Explicit Correction For Delayed Acknowledgements . . 7 3.1.6. Explicit Correction For Delayed Acknowledgements . . 7
4. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 7 3.1.7. Probe Timeout Replaces RTO and TLP . . . . . . . . . 7
4.1. Generating RTT samples . . . . . . . . . . . . . . . . . 7 4. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 8
4.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 8 4.1. Generating RTT samples . . . . . . . . . . . . . . . . . 8
4.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 9
4.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 9 4.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 9
5. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 10 5. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 10
5.1. Acknowledgement-based Detection . . . . . . . . . . . . . 10 5.1. Acknowledgement-based Detection . . . . . . . . . . . . . 10
5.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 11 5.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 11
5.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 11 5.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 11
5.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 12 5.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 12
5.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 12 5.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 12
5.3. Handshakes and New Paths . . . . . . . . . . . . . . . . 13 5.2.2. Handshakes and New Paths . . . . . . . . . . . . . . 13
5.3.1. Sending Probe Packets . . . . . . . . . . . . . . . . 14 5.2.3. Speeding Up Handshake Completion . . . . . . . . . . 14
5.3.2. Loss Detection . . . . . . . . . . . . . . . . . . . 15 5.2.4. Sending Probe Packets . . . . . . . . . . . . . . . . 15
5.4. Handling Retry Packets . . . . . . . . . . . . . . . . . 15 5.2.5. Loss Detection . . . . . . . . . . . . . . . . . . . 16
5.5. Discarding Keys and Packet State . . . . . . . . . . . . 15 5.3. Handling Retry Packets . . . . . . . . . . . . . . . . . 16
6. Congestion Control . . . . . . . . . . . . . . . . . . . . . 16 5.4. Discarding Keys and Packet State . . . . . . . . . . . . 16
6.1. Explicit Congestion Notification . . . . . . . . . . . . 16 6. Congestion Control . . . . . . . . . . . . . . . . . . . . . 17
6.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 17 6.1. Explicit Congestion Notification . . . . . . . . . . . . 18
6.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 17 6.2. Initial and Minimum Congestion Window . . . . . . . . . . 18
6.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 17 6.3. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 18
6.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 17 6.4. Congestion Avoidance . . . . . . . . . . . . . . . . . . 18
6.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 17 6.5. Recovery Period . . . . . . . . . . . . . . . . . . . . . 19
6.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 18 6.6. Ignoring Loss of Undecryptable Packets . . . . . . . . . 19
6.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.7. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 19
6.9. Under-utilizing the Congestion Window . . . . . . . . . . 19 6.8. Persistent Congestion . . . . . . . . . . . . . . . . . . 20
7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 6.9. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 20 6.10. Under-utilizing the Congestion Window . . . . . . . . . . 21
7.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 20 7. Security Considerations . . . . . . . . . . . . . . . . . . . 22
7.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 20 7.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 22
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 7.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 22
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 22
9.1. Normative References . . . . . . . . . . . . . . . . . . 21 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
9.2. Informative References . . . . . . . . . . . . . . . . . 21 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 23
9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 23 9.1. Normative References . . . . . . . . . . . . . . . . . . 23
Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 23 9.2. Informative References . . . . . . . . . . . . . . . . . 23
A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 23 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 25
A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 23 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 25
A.2. Constants of interest . . . . . . . . . . . . . . . . . . 24 A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 25
A.3. Variables of interest . . . . . . . . . . . . . . . . . . 24 A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 26
A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 25 A.2. Constants of interest . . . . . . . . . . . . . . . . . . 26
A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 26 A.3. Variables of interest . . . . . . . . . . . . . . . . . . 27
A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 26 A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 27
A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 27 A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 28
A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 28 A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 28
A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 30 A.7. Setting the Loss Detection Timer . . . . . . . . . . . . 30
A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 30 A.8. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 31
Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 31 A.9. Detecting Lost Packets . . . . . . . . . . . . . . . . . 32
B.1. Constants of interest . . . . . . . . . . . . . . . . . . 31 Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 33
B.2. Variables of interest . . . . . . . . . . . . . . . . . . 32 B.1. Constants of interest . . . . . . . . . . . . . . . . . . 33
B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 33 B.2. Variables of interest . . . . . . . . . . . . . . . . . . 34
B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 33 B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 35
B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 33 B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 35
B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 34 B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 35
B.7. Process ECN Information . . . . . . . . . . . . . . . . . 34 B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 36
B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 35 B.7. Process ECN Information . . . . . . . . . . . . . . . . . 36
Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 35 B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 36
C.1. Since draft-ietf-quic-recovery-26 . . . . . . . . . . . . 35 B.9. Upon dropping Initial or Handshake keys . . . . . . . . . 37
C.2. Since draft-ietf-quic-recovery-25 . . . . . . . . . . . . 35 Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 37
C.3. Since draft-ietf-quic-recovery-24 . . . . . . . . . . . . 35 C.1. Since draft-ietf-quic-recovery-26 . . . . . . . . . . . . 38
C.4. Since draft-ietf-quic-recovery-23 . . . . . . . . . . . . 36 C.2. Since draft-ietf-quic-recovery-25 . . . . . . . . . . . . 38
C.5. Since draft-ietf-quic-recovery-22 . . . . . . . . . . . . 36 C.3. Since draft-ietf-quic-recovery-24 . . . . . . . . . . . . 38
C.6. Since draft-ietf-quic-recovery-21 . . . . . . . . . . . . 36 C.4. Since draft-ietf-quic-recovery-23 . . . . . . . . . . . . 38
C.7. Since draft-ietf-quic-recovery-20 . . . . . . . . . . . . 36 C.5. Since draft-ietf-quic-recovery-22 . . . . . . . . . . . . 38
C.8. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 36 C.6. Since draft-ietf-quic-recovery-21 . . . . . . . . . . . . 38
C.9. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 37 C.7. Since draft-ietf-quic-recovery-20 . . . . . . . . . . . . 38
C.10. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 37 C.8. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 39
C.11. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 38 C.9. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 39
C.12. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 38 C.10. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 40
C.13. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 38 C.11. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 40
C.14. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 39 C.12. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 41
C.15. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 39 C.13. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 41
C.16. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 39 C.14. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 41
C.17. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 39 C.15. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 41
C.18. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 39 C.16. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 41
C.19. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 39 C.17. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 42
C.20. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 40 C.18. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 42
C.21. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 40 C.19. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 42
C.22. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 40 C.20. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 42
C.23. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 40 C.21. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 42
C.24. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 40 C.22. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 42
C.25. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 40 C.23. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 42
C.26. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 40 C.24. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 42
C.27. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 41 C.25. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 43
Appendix D. Contributors . . . . . . . . . . . . . . . . . . . . 41 C.26. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 43
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 41 C.27. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 43
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 Appendix D. Contributors . . . . . . . . . . . . . . . . . . . . 43
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 43
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 43
1. Introduction 1. Introduction
QUIC is a new multiplexed and secure transport atop UDP. QUIC builds QUIC is a new multiplexed and secure transport protocol atop UDP,
on decades of transport and security experience, and implements specified in [QUIC-TRANSPORT]. This document describes congestion
mechanisms that make it attractive as a modern general-purpose control and loss recovery for QUIC. Mechanisms described in this
transport. The QUIC protocol is described in [QUIC-TRANSPORT]. document follow the spirit of existing TCP congestion control and
QUIC implements the spirit of existing TCP congestion control and
loss recovery mechanisms, described in RFCs, various Internet-drafts, loss recovery mechanisms, described in RFCs, various Internet-drafts,
and also those prevalent in the Linux TCP implementation. This or academic papers, and also those prevalent in TCP implementations.
document describes QUIC congestion control and loss recovery, and
where applicable, attributes the TCP equivalent in RFCs, Internet-
drafts, academic papers, and/or TCP implementations.
2. Conventions and Definitions 2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
Definitions of terms that are used in this document: Definitions of terms that are used in this document:
skipping to change at page 7, line 26 skipping to change at page 7, line 26
retransmits, and ensures forward progress without relying on retransmits, and ensures forward progress without relying on
timeouts. timeouts.
3.1.6. Explicit Correction For Delayed Acknowledgements 3.1.6. Explicit Correction For Delayed Acknowledgements
QUIC endpoints measure the delay incurred between when a packet is QUIC endpoints measure the delay incurred between when a packet is
received and when the corresponding acknowledgment is sent, allowing received and when the corresponding acknowledgment is sent, allowing
a peer to maintain a more accurate round-trip time estimate (see a peer to maintain a more accurate round-trip time estimate (see
Section 13.2 of [QUIC-TRANSPORT]). Section 13.2 of [QUIC-TRANSPORT]).
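As an illustration of how a reported acknowledgement delay feeds into the RTT estimate, the following is a minimal sketch, assuming the plausibility rule used when updating smoothed_rtt (subtract the peer's ack_delay only if the corrected sample would not fall below min_rtt). The function name and millisecond units are hypothetical, and the cap of ack_delay at max_ack_delay is not modelled.

```python
def adjusted_rtt_ms(latest_rtt_ms: float, ack_delay_ms: float, min_rtt_ms: float) -> float:
    """Correct an RTT sample for the peer-reported acknowledgement delay.

    The delay is subtracted only when the result would not fall below
    min_rtt; otherwise the reported delay is treated as implausible and
    the raw sample is kept.
    """
    if latest_rtt_ms >= min_rtt_ms + ack_delay_ms:
        return latest_rtt_ms - ack_delay_ms
    return latest_rtt_ms


# A 120 ms sample whose acknowledgement was held for 20 ms by the peer.
print(adjusted_rtt_ms(120.0, 20.0, 80.0))  # 100.0
```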
3.1.7. Probe Timeout Replaces RTO and TLP
QUIC uses a probe timeout (see Section 5.2), with a timer based on
TCP's RTO computation. QUIC's PTO includes the peer's maximum
expected acknowledgement delay instead of using a fixed minimum
timeout. QUIC does not collapse the congestion window until
persistent congestion (Section 6.8) is declared, unlike TCP, which
collapses the congestion window upon expiry of an RTO. Instead of
collapsing the congestion window and declaring everything in-flight
lost, QUIC allows probe packets to temporarily exceed the congestion
window whenever the timer expires.
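The difference from TCP's RTO can be sketched as a send-permission check in which probe packets bypass the congestion window while the window itself is left untouched. This is a simplified illustration; the function and parameter names are hypothetical, and pacing and anti-amplification limits are ignored.

```python
def can_send(bytes_in_flight: int, congestion_window: int,
             packet_size: int, is_probe: bool) -> bool:
    """Decide whether a packet may be sent now.

    Ordinary packets must fit within the congestion window. Probe packets
    sent because the PTO timer expired are allowed to exceed it, and the
    window itself is left unchanged.
    """
    if is_probe:
        return True
    return bytes_in_flight + packet_size <= congestion_window


# With a full 12000-byte window, only a probe may go out.
print(can_send(12000, 12000, 1200, is_probe=False))  # False
print(can_send(12000, 12000, 1200, is_probe=True))   # True
```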
In doing this, QUIC avoids unnecessary congestion window reductions,
obviating the need for correcting mechanisms such as F-RTO [RFC5682].
Since QUIC does not collapse the congestion window on a PTO
expiration, a QUIC sender is not prevented from sending more in-flight
packets after a PTO expiration if it still has available congestion
window. This occurs when a sender is application-limited and the PTO
timer expires. This is more aggressive than TCP's RTO mechanism when
application-limited, but identical when not application-limited.
A single packet loss at the tail does not indicate persistent
congestion, so QUIC specifies a time-based definition to ensure one
or more packets are sent prior to a dramatic decrease in congestion
window; see Section 6.8.
4. Estimating the Round-Trip Time 4. Estimating the Round-Trip Time
At a high level, an endpoint measures the time from when a packet was At a high level, an endpoint measures the time from when a packet was
sent to when it is acknowledged as a round-trip time (RTT) sample. sent to when it is acknowledged as a round-trip time (RTT) sample.
The endpoint uses RTT samples and peer-reported host delays (see The endpoint uses RTT samples and peer-reported host delays (see
Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical
description of the network path's RTT. An endpoint computes the description of the network path's RTT. An endpoint computes the
following three values for each path: the minimum value observed over following three values for each path: the minimum value observed over
the lifetime of the path (min_rtt), an exponentially-weighted moving the lifetime of the path (min_rtt), an exponentially-weighted moving
average (smoothed_rtt), and the mean deviation (referred to as average (smoothed_rtt), and the mean deviation (referred to as
skipping to change at page 10, line 47 skipping to change at page 11, line 19
acknowledged packet (Section 5.1.1), or it was sent long enough in acknowledged packet (Section 5.1.1), or it was sent long enough in
the past (Section 5.1.2). the past (Section 5.1.2).
The acknowledgement indicates that a packet sent later was delivered, The acknowledgement indicates that a packet sent later was delivered,
and the packet and time thresholds provide some tolerance for packet and the packet and time thresholds provide some tolerance for packet
reordering. reordering.
Spuriously declaring packets as lost leads to unnecessary Spuriously declaring packets as lost leads to unnecessary
retransmissions and may result in degraded performance due to the retransmissions and may result in degraded performance due to the
actions of the congestion controller upon detecting loss. actions of the congestion controller upon detecting loss.
Implementations that detect spurious retransmissions and increase the Implementations can detect spurious retransmissions and increase the
reordering threshold in packets or time MAY choose to start with reordering threshold in packets or time to reduce future spurious
smaller initial reordering thresholds to minimize recovery latency. retransmissions and loss events. Implementations with adaptive time
thresholds MAY choose to start with smaller initial reordering
thresholds to minimize recovery latency.
5.1.1. Packet Threshold 5.1.1. Packet Threshold
The RECOMMENDED initial value for the packet reordering threshold The RECOMMENDED initial value for the packet reordering threshold
(kPacketThreshold) is 3, based on best practices for TCP loss (kPacketThreshold) is 3, based on best practices for TCP loss
detection [RFC5681] [RFC6675]. Implementations SHOULD NOT use a detection [RFC5681] [RFC6675]. Implementations SHOULD NOT use a
packet threshold less than 3, to keep in line with TCP [RFC5681]. packet threshold less than 3, to keep in line with TCP [RFC5681].
Some networks may exhibit higher degrees of reordering, causing a Some networks may exhibit higher degrees of reordering, causing a
sender to detect spurious losses. Implementers MAY use algorithms sender to detect spurious losses. Implementers MAY use algorithms
developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
reordering resilience. reordering resilience.
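The packet threshold rule can be sketched as follows: an unacknowledged packet is declared lost once a packet at least kPacketThreshold numbers later in the same packet number space has been acknowledged. The function name is hypothetical and a single packet number space is assumed.

```python
from typing import Iterable, List

K_PACKET_THRESHOLD = 3  # RECOMMENDED initial value; SHOULD NOT be below 3


def lost_by_packet_threshold(unacked: Iterable[int], largest_acked: int,
                             threshold: int = K_PACKET_THRESHOLD) -> List[int]:
    """Return unacknowledged packet numbers at least `threshold` below largest_acked."""
    return sorted(pn for pn in unacked if largest_acked >= pn + threshold)


# Packets 1, 2, 3 and 5 are outstanding when packet 6 is acknowledged.
print(lost_by_packet_threshold([1, 2, 3, 5], largest_acked=6))  # [1, 2, 3]
```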
5.1.2. Time Threshold 5.1.2. Time Threshold
Once a later packet within the same packet number space has been Once a later packet within the same packet number space has been
acknowledged, an endpoint SHOULD declare an earlier packet lost if it acknowledged, an endpoint SHOULD declare an earlier packet lost if it
was sent a threshold amount of time in the past. To avoid declaring was sent a threshold amount of time in the past. To avoid declaring
packets as lost too early, this time threshold MUST be set to at packets as lost too early, this time threshold MUST be set to at
least kGranularity. The time threshold is: least the local timer granularity, as indicated by the kGranularity
constant. The time threshold is:
max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity) max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
If packets sent prior to the largest acknowledged packet cannot yet If packets sent prior to the largest acknowledged packet cannot yet
be declared lost, then a timer SHOULD be set for the remaining time. be declared lost, then a timer SHOULD be set for the remaining time.
Using max(smoothed_rtt, latest_rtt) protects against the following Using max(smoothed_rtt, latest_rtt) protects against the following
two cases: two cases:
o the latest RTT sample is lower than the smoothed RTT, perhaps due o the latest RTT sample is lower than the smoothed RTT, perhaps due
to reordering where the acknowledgement encountered a shorter to reordering where the acknowledgement encountered a shorter
path; path;
o the latest RTT sample is higher than the smoothed RTT, perhaps due o the latest RTT sample is higher than the smoothed RTT, perhaps due
to a sustained increase in the actual RTT, but the smoothed RTT to a sustained increase in the actual RTT, but the smoothed RTT
has not yet caught up. has not yet caught up.
The RECOMMENDED time threshold (kTimeThreshold), expressed as a The RECOMMENDED time threshold (kTimeThreshold), expressed as a
round-trip time multiplier, is 9/8. round-trip time multiplier, is 9/8. The RECOMMENDED value of the
timer granularity (kGranularity) is 1ms.
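The time threshold and the "remaining time" timer described above can be sketched with the RECOMMENDED constants. Times are in milliseconds and the helper names are hypothetical; the check applies only to packets sent before the largest acknowledged packet.

```python
from typing import Tuple

K_TIME_THRESHOLD = 9 / 8   # RECOMMENDED RTT multiplier
K_GRANULARITY_MS = 1.0     # RECOMMENDED timer granularity


def loss_delay_ms(smoothed_rtt_ms: float, latest_rtt_ms: float) -> float:
    """max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)"""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt_ms, latest_rtt_ms),
               K_GRANULARITY_MS)


def check_time_threshold(sent_time_ms: float, now_ms: float,
                         smoothed_rtt_ms: float,
                         latest_rtt_ms: float) -> Tuple[bool, float]:
    """Return (lost, remaining_ms) for a packet sent before the largest acked.

    If the packet is not yet lost, remaining_ms is how long to set the
    timer for before it can be declared lost.
    """
    lost_send_time = now_ms - loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms)
    if sent_time_ms <= lost_send_time:
        return True, 0.0
    return False, sent_time_ms - lost_send_time


print(loss_delay_ms(80.0, 100.0))                       # 112.5
print(check_time_threshold(150.0, 200.0, 80.0, 100.0))  # (False, 62.5)
```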
Implementations MAY experiment with absolute thresholds, thresholds Implementations MAY experiment with absolute thresholds, thresholds
from previous connections, adaptive thresholds, or including RTT from previous connections, adaptive thresholds, or including RTT
variation. Smaller thresholds reduce reordering resilience and variation. Smaller thresholds reduce reordering resilience and
increase spurious retransmissions, and larger thresholds increase increase spurious retransmissions, and larger thresholds increase
loss detection delay. loss detection delay.
5.2. Probe Timeout 5.2. Probe Timeout
A Probe Timeout (PTO) triggers sending one or two probe datagrams A Probe Timeout (PTO) triggers sending one or two probe datagrams
skipping to change at page 12, line 26 skipping to change at page 12, line 44
TCP [RFC5682]. The timeout computation is based on TCP's TCP [RFC5682]. The timeout computation is based on TCP's
retransmission timeout period [RFC6298]. retransmission timeout period [RFC6298].
5.2.1. Computing PTO 5.2.1. Computing PTO
When an ack-eliciting packet is transmitted, the sender schedules a When an ack-eliciting packet is transmitted, the sender schedules a
timer for the PTO period as follows: timer for the PTO period as follows:
PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in
Appendix A.2 and Appendix A.3.
The PTO period is the amount of time that a sender ought to wait for The PTO period is the amount of time that a sender ought to wait for
an acknowledgement of a sent packet. This time period includes the an acknowledgement of a sent packet. This time period includes the
estimated network roundtrip-time (smoothed_rtt), the variation in the estimated network roundtrip-time (smoothed_rtt), the variation in the
estimate (4*rttvar), and max_ack_delay, to account for the maximum estimate (4*rttvar), and max_ack_delay, to account for the maximum
time by which a receiver might delay sending an acknowledgement. time by which a receiver might delay sending an acknowledgement.
When the PTO is armed for Initial or Handshake packet number spaces, When the PTO is armed for Initial or Handshake packet number spaces,
the max_ack_delay is 0, as specified in Section 13.2.1 of [QUIC-TRANSPORT]. the max_ack_delay is 0, as specified in Section 13.2.1 of [QUIC-TRANSPORT].
The PTO value MUST be set to at least kGranularity, to avoid the The PTO value MUST be set to at least kGranularity, to avoid the
timer expiring immediately. timer expiring immediately.
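A minimal sketch of the PTO formula, including the rule that max_ack_delay is 0 for the Initial and Handshake packet number spaces. The string space labels and millisecond units are assumptions made for illustration.

```python
K_GRANULARITY_MS = 1.0  # RECOMMENDED timer granularity


def pto_ms(smoothed_rtt_ms: float, rttvar_ms: float,
           max_ack_delay_ms: float, space: str) -> float:
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.

    max_ack_delay is taken as 0 for the Initial and Handshake spaces.
    Because the middle term is at least kGranularity, the result is always
    at least kGranularity, so the timer cannot expire immediately.
    """
    ack_delay = 0.0 if space in ("Initial", "Handshake") else max_ack_delay_ms
    return smoothed_rtt_ms + max(4 * rttvar_ms, K_GRANULARITY_MS) + ack_delay


print(pto_ms(100.0, 10.0, 25.0, "ApplicationData"))  # 165.0
print(pto_ms(100.0, 10.0, 25.0, "Initial"))          # 140.0
```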
A sender computes its PTO timer every time an ack-eliciting packet is A sender recomputes and may need to reset its PTO timer every time an
sent. When ack-eliciting packets are in-flight in multiple packet ack-eliciting packet is sent. When ack-eliciting packets are in-
number spaces, the timer MUST be set for the packet number space with flight in multiple packet number spaces, the timer MUST be set for
the earliest timeout, except for ApplicationData, which MUST be the packet number space with the earliest timeout, except for
ignored until the handshake completes; see Section 4.1.1 of ApplicationData, which MUST be ignored until the handshake completes;
[QUIC-TLS]. Not arming the PTO for ApplicationData prioritizes see Section 4.1.1 of [QUIC-TLS]. Not arming the PTO for
completing the handshake and prevents the server from sending a 1-RTT ApplicationData prevents a client from retransmitting a 0-RTT packet
packet on a PTO before it has the keys to process a 1-RTT on a PTO expiration before confirming that the server is able to
packet. decrypt 0-RTT packets, and prevents a server from sending a 1-RTT
packet on a PTO expiration before it has the keys to process an
acknowledgement.
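Selecting which packet number space arms the timer can be sketched as picking the earliest deadline, where each deadline is assumed to be the send time of the last ack-eliciting packet in that space plus that space's PTO period. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple


def earliest_pto(last_ack_eliciting_sent_ms: Dict[str, Optional[float]],
                 pto_by_space_ms: Dict[str, float],
                 handshake_complete: bool) -> Optional[Tuple[str, float]]:
    """Pick the packet number space whose PTO deadline comes first.

    A space with no ack-eliciting packet in flight (None) is skipped, and
    ApplicationData is ignored until the handshake completes.
    """
    best: Optional[Tuple[str, float]] = None
    for space, sent_ms in last_ack_eliciting_sent_ms.items():
        if sent_ms is None:
            continue
        if space == "ApplicationData" and not handshake_complete:
            continue
        deadline = sent_ms + pto_by_space_ms[space]
        if best is None or deadline < best[1]:
            best = (space, deadline)
    return best


sent = {"Initial": 0.0, "Handshake": 10.0, "ApplicationData": 5.0}
ptos = {"Initial": 300.0, "Handshake": 200.0, "ApplicationData": 100.0}
print(earliest_pto(sent, ptos, handshake_complete=False))  # ('Handshake', 210.0)
print(earliest_pto(sent, ptos, handshake_complete=True))   # ('ApplicationData', 105.0)
```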
When a PTO timer expires, the PTO period MUST be set to twice its When a PTO timer expires, the PTO period MUST be set to twice its
current value. This exponential reduction in the sender's rate is current value. This exponential reduction in the sender's rate is
important because consecutive PTOs might be caused by loss of packets important because consecutive PTOs might be caused by loss of packets
or acknowledgements due to severe congestion. Even when there are or acknowledgements due to severe congestion. Even when there are
ack-eliciting packets in-flight in multiple packet number spaces, the ack-eliciting packets in-flight in multiple packet number spaces, the
exponential increase in probe timeout occurs across all spaces to exponential increase in probe timeout occurs across all spaces to
prevent excess load on the network. For example, a timeout in the prevent excess load on the network. For example, a timeout in the
Initial packet number space doubles the length of the timeout in the Initial packet number space doubles the length of the timeout in the
Handshake packet number space. Handshake packet number space.
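The backoff can be sketched with a single connection-wide counter, so that an expiry in one packet number space lengthens the period in every space. As a simplification, this sketch assumes the count resets whenever an acknowledgement is received; the class and method names are hypothetical.

```python
class PtoBackoff:
    """Connection-wide PTO backoff shared by all packet number spaces."""

    def __init__(self) -> None:
        self.pto_count = 0  # consecutive PTO expirations without an ack

    def on_pto_expired(self) -> None:
        self.pto_count += 1

    def on_ack_received(self) -> None:
        self.pto_count = 0

    def period_ms(self, base_pto_ms: float) -> float:
        # Each expiry doubles the period: base * 2^pto_count.
        return base_pto_ms * (2 ** self.pto_count)


backoff = PtoBackoff()
backoff.on_pto_expired()             # a timeout in the Initial space...
print(backoff.period_ms(300.0))      # 600.0 (Handshake base PTO, doubled)
```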
The life of a connection that is experiencing consecutive PTOs is The life of a connection that is experiencing consecutive PTOs is
limited by the endpoint's idle timeout. limited by the endpoint's idle timeout.
The probe timer MUST NOT be set if the time threshold (Section 5.1.2) The probe timer MUST NOT be set if the time threshold (Section 5.1.2)
loss detection timer is set. The time threshold loss detection timer loss detection timer is set. The time threshold loss detection timer
is expected to both expire earlier than the PTO and be less likely to is expected to both expire earlier than the PTO and be less likely to
spuriously retransmit data. spuriously retransmit data.
5.3. Handshakes and New Paths 5.2.2. Handshakes and New Paths
The initial probe timeout for a new connection or new path SHOULD be The initial probe timeout for a new connection or new path SHOULD be
set to twice the initial RTT. Resumed connections over the same set to twice the initial RTT. Resumed connections over the same
network SHOULD use the previous connection's final smoothed RTT value network MAY use the previous connection's final smoothed RTT value as
as the resumed connection's initial RTT. If no previous RTT is the resumed connection's initial RTT. If no previous RTT is
available, the initial RTT SHOULD be set to 500ms, resulting in a 1 available, the initial RTT SHOULD be set to 500ms, resulting in a 1
second initial timeout as recommended in [RFC6298]. second initial timeout as recommended in [RFC6298].
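The initial timeout rule can be sketched as doubling an initial RTT that comes either from a previous connection's final smoothed RTT or from the 500ms default. The function name and the optional-argument convention are assumptions for illustration.

```python
from typing import Optional

DEFAULT_INITIAL_RTT_MS = 500.0  # used when no previous RTT is available


def initial_pto_ms(previous_smoothed_rtt_ms: Optional[float] = None) -> float:
    """Initial probe timeout for a new connection or path: twice the initial RTT."""
    initial_rtt = (previous_smoothed_rtt_ms
                   if previous_smoothed_rtt_ms is not None
                   else DEFAULT_INITIAL_RTT_MS)
    return 2 * initial_rtt


print(initial_pto_ms())      # 1000.0 (the 1 second timeout of RFC 6298)
print(initial_pto_ms(40.0))  # 80.0 (resumed connection, prior smoothed RTT)
```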
A connection MAY use the delay between sending a PATH_CHALLENGE and A connection MAY use the delay between sending a PATH_CHALLENGE and
receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
Appendix A.2) for a new path, but the delay SHOULD NOT be considered Section 5.2.2) for a new path, but the delay SHOULD NOT be considered
an RTT sample. an RTT sample.
Prior to handshake completion, when few or no RTT samples have been
generated, a probe timer expiration might be due to an incorrect RTT
estimate at the client. To allow the client to improve its RTT
estimate, the new packet that it sends MUST be ack-eliciting. If
Handshake keys are available to the client, it MUST send a Handshake
packet, and otherwise it MUST send an Initial packet in a UDP datagram
of at least 1200 bytes.
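The client's choice of probe packet before handshake completion reduces to a small decision. In this hypothetical sketch, packets are described by a plain data class rather than by any real QUIC library type.

```python
from dataclasses import dataclass

MIN_CLIENT_INITIAL_DATAGRAM = 1200  # bytes


@dataclass
class HandshakeProbe:
    packet_type: str         # "Handshake" or "Initial"
    ack_eliciting: bool      # always True: the probe must elicit an ACK
    min_datagram_size: int   # 0 means no padding requirement in this sketch


def client_handshake_probe(have_handshake_keys: bool) -> HandshakeProbe:
    """Describe the packet a client sends when its PTO fires pre-handshake."""
    if have_handshake_keys:
        return HandshakeProbe("Handshake", True, 0)
    # No Handshake keys yet: send an Initial padded to at least 1200 bytes.
    return HandshakeProbe("Initial", True, MIN_CLIENT_INITIAL_DATAGRAM)
```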
Initial and Handshake packets might never be acknowledged, but they
are removed from bytes in flight when the Initial and Handshake keys
are discarded, as described in Section 5.4. When Initial or Handshake
keys are discarded, the PTO and loss detection timers MUST be reset,
because discarding keys indicates forward progress and the loss
detection timer might have been set for a now-discarded packet number
space.
5.2.2.1. Before Address Validation
Until the server has validated the client's address on the path, the Until the server has validated the client's address on the path, the
amount of data it can send is limited to three times the amount of amount of data it can send is limited to three times the amount of
data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If
no data can be sent, then the PTO alarm MUST NOT be armed until no additional data can be sent, the server's PTO alarm MUST NOT be
datagrams have been received from the client. armed until datagrams have been received from the client, because
packets sent on PTO count against the anti-amplification limit. Note
that the server could fail to validate the client's address even if
0-RTT is accepted.
Since the server could be blocked until more packets are received Since the server could be blocked until more packets are received
from the client, it is the client's responsibility to send packets to from the client, it is the client's responsibility to send packets to
unblock the server until it is certain that the server has finished unblock the server until it is certain that the server has finished
its address validation (see Section 8 of [QUIC-TRANSPORT]). That is, its address validation (see Section 8 of [QUIC-TRANSPORT]). That is,
the client MUST set the probe timer if the client has not received an the client MUST set the probe timer if the client has not received an
acknowledgement for one of its Handshake or 1-RTT packets. acknowledgement for one of its Handshake or 1-RTT packets, and has
not received a HANDSHAKE_DONE frame.
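Both sides of the rule above can be written as predicates. This sketch assumes the server tracks raw byte counts on the unvalidated path; the function names are hypothetical, and the factor of three is the anti-amplification limit from Section 8.1 of [QUIC-TRANSPORT] cited above.

```python
AMPLIFICATION_FACTOR = 3  # unvalidated server: at most 3x the bytes received


def server_can_arm_pto(bytes_sent: int, bytes_received: int,
                       address_validated: bool) -> bool:
    """Server side: packets sent on PTO count against the anti-amplification
    limit, so the PTO is not armed while no additional data can be sent."""
    if address_validated:
        return True
    return bytes_sent < AMPLIFICATION_FACTOR * bytes_received


def client_must_arm_pto(handshake_or_1rtt_acked: bool,
                        handshake_done_received: bool) -> bool:
    """Client side: keep probing to unblock the server until one of its
    Handshake or 1-RTT packets is acknowledged or HANDSHAKE_DONE arrives."""
    return not (handshake_or_1rtt_acked or handshake_done_received)
```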
Prior to handshake completion, when few to none RTT samples have been 5.2.3. Speeding Up Handshake Completion
generated, it is possible that the probe timer expiration is due to
an incorrect RTT estimate at the client. To allow the client to
improve its RTT estimate, the new packet that it sends MUST be ack-
eliciting. If Handshake keys are available to the client, it MUST
send a Handshake packet, and otherwise it MUST send an Initial packet
in a UDP datagram of at least 1200 bytes.
Initial packets and Handshake packets could be never acknowledged, When a server receives an Initial packet containing duplicate CRYPTO
but they are removed from bytes in flight when the Initial and data, it can assume the client did not receive all of the server's
Handshake keys are discarded. CRYPTO data sent in Initial packets, or the client's estimated RTT is
too small. When a client receives Handshake or 1-RTT packets prior
to obtaining Handshake keys, it may assume some or all of the
server's Initial packets were lost.
5.3.1. Sending Probe Packets To speed up handshake completion under these conditions, an endpoint
MAY send a packet containing unacknowledged CRYPTO data earlier than
the PTO expiry, subject to address validation limits; see Section 8.1
of [QUIC-TRANSPORT].
Peers can also use coalesced packets to ensure that each datagram
elicits at least one acknowledgement. For example, clients can
coalesce an Initial packet containing PING and PADDING frames with a
0-RTT data packet and a server can coalesce an Initial packet
containing a PING frame with one or more packets in its first flight.
5.2.4. Sending Probe Packets
When a PTO timer expires, a sender MUST send at least one ack- When a PTO timer expires, a sender MUST send at least one ack-
eliciting packet in the packet number space as a probe, unless there eliciting packet in the packet number space as a probe, unless there
is no data available to send. An endpoint MAY send up to two full- is no data available to send. An endpoint MAY send up to two full-
sized datagrams containing ack-eliciting packets, to avoid an sized datagrams containing ack-eliciting packets, to avoid an
expensive consecutive PTO expiration due to a single lost datagram or expensive consecutive PTO expiration due to a single lost datagram or
transmit data from multiple packet number spaces. transmit data from multiple packet number spaces. All probe packets
sent on a PTO MUST be ack-eliciting.
In addition to sending data in the packet number space for which the In addition to sending data in the packet number space for which the
timer expired, the sender SHOULD send ack-eliciting packets from timer expired, the sender SHOULD send ack-eliciting packets from
other packet number spaces with in-flight data, coalescing packets if other packet number spaces with in-flight data, coalescing packets if
possible. possible.
If the sender wants to elicit a faster acknowledgement on PTO, it can
skip a packet number to eliminate the ack delay, since receivers are
expected to acknowledge out-of-order packets without delay.
When the PTO timer expires, and there is new or previously sent When the PTO timer expires, and there is new or previously sent
unacknowledged data, it MUST be sent. unacknowledged data, it MUST be sent. A probe packet SHOULD carry
new data when possible. A probe packet MAY carry retransmitted
unacknowledged data when new data is unavailable, when flow control
does not permit new data to be sent, or to opportunistically reduce
loss recovery delay. Implementations MAY use alternative strategies
for determining the content of probe packets, including sending new
or retransmitted data based on the application's priorities.
It is possible the sender has no new or previously-sent data to send. It is possible the sender has no new or previously-sent data to send.
As an example, consider the following sequence of events: new As an example, consider the following sequence of events: new
application data is sent in a STREAM frame, deemed lost, then application data is sent in a STREAM frame, deemed lost, then
retransmitted in a new packet, and then the original transmission is retransmitted in a new packet, and then the original transmission is
acknowledged. When there is no data to send, the sender SHOULD send acknowledged. When there is no data to send, the sender SHOULD send
a PING or other ack-eliciting frame in a single packet, re-arming the a PING or other ack-eliciting frame in a single packet, re-arming the
PTO timer. PTO timer.
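A minimal sketch of one possible content policy for a single probe packet, following the preferences above: new data first, then retransmitted unacknowledged data, then a PING. Frames are modelled as plain strings and the send queues as deques; both are assumptions for illustration, and implementations are free to use other strategies.

```python
from collections import deque


def build_probe_frames(new_data: deque, unacked_data: deque,
                       flow_control_blocked: bool) -> list:
    """Return the frames for one ack-eliciting probe packet."""
    if new_data and not flow_control_blocked:
        # Prefer new data when it exists and flow control permits it.
        return ["STREAM:" + new_data.popleft()]
    if unacked_data:
        # Otherwise retransmit previously sent, still unacknowledged data.
        return ["STREAM:" + unacked_data.popleft()]
    # Nothing to send: a PING keeps the probe ack-eliciting.
    return ["PING"]
```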
Alternatively, instead of sending an ack-eliciting packet, the sender Alternatively, instead of sending an ack-eliciting packet, the sender
skipping to change at page 14, line 48 skipping to change at page 16, line 11
sending an additional packet, but increases the risk that loss is sending an additional packet, but increases the risk that loss is
declared too aggressively, resulting in an unnecessary rate reduction declared too aggressively, resulting in an unnecessary rate reduction
by the congestion controller. by the congestion controller.
Consecutive PTO periods increase exponentially, and as a result, Consecutive PTO periods increase exponentially, and as a result,
connection recovery latency increases exponentially as packets connection recovery latency increases exponentially as packets
continue to be dropped in the network. Sending two packets on PTO continue to be dropped in the network. Sending two packets on PTO
expiration increases resilience to packet drops, thus reducing the expiration increases resilience to packet drops, thus reducing the
probability of consecutive PTO events. probability of consecutive PTO events.
Probe packets sent on a PTO MUST be ack-eliciting. A probe packet
SHOULD carry new data when possible. A probe packet MAY carry
retransmitted unacknowledged data when new data is unavailable, when
flow control does not permit new data to be sent, or to
opportunistically reduce loss recovery delay. Implementations MAY
use alternative strategies for determining the content of probe
packets, including sending new or retransmitted data based on the
application's priorities.
When the PTO timer expires multiple times and new data cannot be When the PTO timer expires multiple times and new data cannot be
sent, implementations must choose between sending the same payload sent, implementations must choose between sending the same payload
every time or sending different payloads. Sending the same payload every time or sending different payloads. Sending the same payload
may be simpler and ensures the highest priority frames arrive first. may be simpler and ensures the highest priority frames arrive first.
Sending different payloads each time reduces the chances of spurious Sending different payloads each time reduces the chances of spurious
retransmission. retransmission.
5.3.2. Loss Detection 5.2.5. Loss Detection
Delivery or loss of packets in flight is established when an ACK Delivery or loss of packets in flight is established when an ACK
frame is received that newly acknowledges one or more packets. frame is received that newly acknowledges one or more packets.
A PTO timer expiration event does not indicate packet loss and MUST A PTO timer expiration event does not indicate packet loss and MUST
NOT cause prior unacknowledged packets to be marked as lost. When an NOT cause prior unacknowledged packets to be marked as lost. When an
acknowledgement is received that newly acknowledges packets, loss acknowledgement is received that newly acknowledges packets, loss
detection proceeds as dictated by packet and time threshold detection proceeds as dictated by packet and time threshold
mechanisms; see Section 5.1. mechanisms; see Section 5.1.
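The ACK-triggered check can be sketched as below. It assumes the thresholds recommended in Section 5.1 of this draft (kPacketThreshold = 3, kTimeThreshold = 9/8) and a 1 ms granularity; the function name and the dictionary of send times are hypothetical. Nothing in this sketch runs on PTO expiration.

```python
K_PACKET_THRESHOLD = 3
K_TIME_THRESHOLD = 9 / 8
K_GRANULARITY = 0.001  # seconds


def detect_lost_packets(sent_times: dict, largest_acked: int,
                        latest_rtt: float, smoothed_rtt: float,
                        now: float) -> list:
    """Called after an ACK newly acknowledges packets.

    sent_times maps each unacknowledged packet number to its send time.
    Returns the packet numbers declared lost, in ascending order.
    """
    loss_delay = max(K_TIME_THRESHOLD * max(latest_rtt, smoothed_rtt),
                     K_GRANULARITY)
    lost = []
    for pn, sent in sorted(sent_times.items()):
        if pn > largest_acked:
            continue  # only packets older than the largest acked can be lost
        if (largest_acked >= pn + K_PACKET_THRESHOLD
                or sent <= now - loss_delay):
            lost.append(pn)
    return lost
```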
5.4. Handling Retry Packets 5.3. Handling Retry Packets
A Retry packet causes a client to send another Initial packet, A Retry packet causes a client to send another Initial packet,
effectively restarting the connection process. A Retry packet effectively restarting the connection process. A Retry packet
indicates that the Initial was received, but not processed. A Retry indicates that the Initial was received, but not processed. A Retry
packet cannot be treated as an acknowledgment, because it does not packet cannot be treated as an acknowledgment, because it does not
indicate that a packet was processed or specify the packet number. indicate that a packet was processed or specify the packet number.
Clients that receive a Retry packet reset congestion control and loss Clients that receive a Retry packet reset congestion control and loss
recovery state, including resetting any pending timers. Other recovery state, including resetting any pending timers. Other
connection state, in particular cryptographic handshake messages, is connection state, in particular cryptographic handshake messages, is
retained; see Section 17.2.5 of [QUIC-TRANSPORT]. retained; see Section 17.2.5 of [QUIC-TRANSPORT].
The client MAY compute an RTT estimate to the server as the time The client MAY compute an RTT estimate to the server as the time
period from when the first Initial was sent to when a Retry or a period from when the first Initial was sent to when a Retry or a
Version Negotiation packet is received. The client MAY use this Version Negotiation packet is received. The client MAY use this
value in place of its default for the initial RTT estimate. value in place of its default for the initial RTT estimate.
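A hypothetical client-side sketch of the Retry handling above. The class, its fields, and the 12000-byte initial window (10 datagrams of 1200 bytes) are assumptions for illustration; cryptographic handshake state is assumed to live elsewhere and is therefore untouched.

```python
K_INITIAL_RTT = 0.5  # seconds; default initial RTT


class ClientRecoveryState:
    """Hypothetical per-connection recovery state for a client."""

    def __init__(self, initial_rtt=K_INITIAL_RTT, initial_window=12000):
        self.initial_window = initial_window
        self.first_initial_sent_time = None
        self._reset(initial_rtt)

    def _reset(self, initial_rtt):
        self.initial_rtt = initial_rtt
        self.congestion_window = self.initial_window
        self.bytes_in_flight = 0
        self.sent_packets = {}            # packet number -> send time
        self.loss_detection_timer = None  # absolute deadline, or None

    def on_retry_received(self, now):
        rtt = self.initial_rtt
        if self.first_initial_sent_time is not None:
            # Optional: first Initial sent -> Retry received is an RTT estimate.
            rtt = now - self.first_initial_sent_time
        # A Retry is not an acknowledgement: reset congestion control and
        # loss recovery state, including any pending timers.
        self._reset(rtt)
```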
5.5. Discarding Keys and Packet State 5.4. Discarding Keys and Packet State
When packet protection keys are discarded (see Section 4.10 of When packet protection keys are discarded (see Section 4.10 of
[QUIC-TLS]), all packets that were sent with those keys can no longer [QUIC-TLS]), all packets that were sent with those keys can no longer
be acknowledged because their acknowledgements cannot be processed be acknowledged because their acknowledgements cannot be processed
anymore. The sender MUST discard all recovery state associated with anymore. The sender MUST discard all recovery state associated with
those packets and MUST remove them from the count of bytes in flight. those packets and MUST remove them from the count of bytes in flight.
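Discarding a packet number space's recovery state might look like this sketch, where each sent packet is recorded as (size, counted_in_flight); that data layout and the function name are assumptions. Only packets that counted toward bytes in flight are subtracted.

```python
def discard_packet_space(sent_packets: dict, bytes_in_flight: int) -> int:
    """Drop all recovery state for one packet number space.

    sent_packets maps packet number -> (size_in_bytes, counted_in_flight).
    Returns the updated bytes_in_flight; the caller should then reset its
    PTO and loss detection timers.
    """
    for size, in_flight in sent_packets.values():
        if in_flight:
            bytes_in_flight -= size
    sent_packets.clear()
    return bytes_in_flight
```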
Endpoints stop sending and receiving Initial packets once they start Endpoints stop sending and receiving Initial packets once they start
exchanging Handshake packets (see Section 17.2.2.1 of exchanging Handshake packets (see Section 17.2.2.1 of
[QUIC-TRANSPORT]). At this point, recovery state for all in-flight [QUIC-TRANSPORT]). At this point, recovery state for all in-flight
skipping to change at page 16, line 24 skipping to change at page 17, line 26
arrive before Initial packets, early 0-RTT packets will be declared arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent. lost, but that is expected to be infrequent.
It is expected that keys are discarded after packets encrypted with It is expected that keys are discarded after packets encrypted with
them would be acknowledged or declared lost. Initial secrets however them would be acknowledged or declared lost. Initial secrets however
might be destroyed sooner, as soon as handshake keys are available might be destroyed sooner, as soon as handshake keys are available
(see Section 4.10.1 of [QUIC-TLS]). (see Section 4.10.1 of [QUIC-TLS]).
6. Congestion Control 6. Congestion Control
This document specifies a Reno congestion controller for QUIC This document specifies a congestion controller for QUIC similar to
[RFC6582]. TCP NewReno [RFC6582].
The signals QUIC provides for congestion control are generic and are The signals QUIC provides for congestion control are generic and are
designed to support different algorithms. Endpoints can unilaterally designed to support different algorithms. Endpoints can unilaterally
choose a different algorithm to use, such as Cubic [RFC8312]. choose a different algorithm to use, such as Cubic [RFC8312].
If an endpoint uses a different controller than that specified in If an endpoint uses a different controller than that specified in
this document, the chosen controller MUST conform to the congestion this document, the chosen controller MUST conform to the congestion
control guidelines specified in Section 3.1 of [RFC8085]. control guidelines specified in Section 3.1 of [RFC8085].
Similar to TCP, packets containing only ACK frames do not count
towards bytes in flight and are not congestion controlled. Unlike
TCP, QUIC can detect the loss of these packets and MAY use that
information to adjust the congestion controller or the rate of ACK-
only packets being sent, but this document does not describe a
mechanism for doing so.
The algorithm in this document specifies and uses the controller's The algorithm in this document specifies and uses the controller's
congestion window in bytes. congestion window in bytes.
An endpoint MUST NOT send a packet if it would cause bytes_in_flight An endpoint MUST NOT send a packet if it would cause bytes_in_flight
(see Appendix B.2) to be larger than the congestion window, unless (see Appendix B.2) to be larger than the congestion window, unless
the packet is sent on a PTO timer expiration (see Section 5.2). the packet is sent on a PTO timer expiration (see Section 5.2).
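A sketch of the send check with hypothetical parameter names. It folds in the exemptions described in this section: ACK-only packets are not congestion controlled, and packets sent on PTO expiration may exceed the window (they are still added to bytes_in_flight once sent).

```python
def can_send(bytes_in_flight: int, packet_size: int, congestion_window: int,
             *, is_pto_probe: bool = False, ack_only: bool = False) -> bool:
    """Return True if the congestion controller allows sending this packet."""
    if ack_only:
        # ACK-only packets do not count toward bytes in flight.
        return True
    if is_pto_probe:
        # Probes are never blocked, though they still count as in flight.
        return True
    # Otherwise bytes_in_flight must not end up larger than the window.
    return bytes_in_flight + packet_size <= congestion_window
```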
6.1. Explicit Congestion Notification 6.1. Explicit Congestion Notification
If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC
treats a Congestion Experienced(CE) codepoint in the IP header as a treats a Congestion Experienced (CE) codepoint in the IP header as a
signal of congestion. This document specifies an endpoint's response signal of congestion. This document specifies an endpoint's response
when its peer receives packets with the Congestion Experienced when its peer receives packets with the ECN-CE codepoint.
codepoint.
6.2. Slow Start 6.2. Initial and Minimum Congestion Window
QUIC begins every connection in slow start and exits slow start upon QUIC begins every connection in slow start with the congestion window
loss or upon increase in the ECN-CE counter. QUIC re-enters slow set to an initial value. Endpoints SHOULD use an initial congestion
start any time the congestion window is less than ssthresh, which window of 10 times the maximum datagram size (max_datagram_size),
only occurs after persistent congestion is declared. While in slow limited to the larger of 14720 bytes or twice the maximum datagram size.
start, QUIC increases the congestion window by the number of bytes This follows the analysis and recommendations in [RFC6928],
acknowledged when each acknowledgment is processed. increasing the byte limit to account for the smaller 8-byte overhead
of UDP compared to the 20-byte overhead for TCP.
6.3. Congestion Avoidance The minimum congestion window is the smallest value the congestion
window can decrease to as a response to loss, ECN-CE, or persistent
congestion. The RECOMMENDED value is 2 * max_datagram_size.
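The initial and minimum windows follow directly from the text; the function names below are hypothetical.

```python
def initial_window(max_datagram_size: int) -> int:
    """10 datagrams, capped at the larger of 14720 bytes or 2 datagrams."""
    return min(10 * max_datagram_size, max(2 * max_datagram_size, 14720))


def minimum_window(max_datagram_size: int) -> int:
    """RECOMMENDED floor for the congestion window after reductions."""
    return 2 * max_datagram_size
```

For 1200-byte datagrams the cap does not bind and the initial window is 12000 bytes; for 1500-byte datagrams it is capped at 14720 bytes.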
Slow start exits to congestion avoidance. Congestion avoidance in 6.3. Slow Start
NewReno uses an additive increase multiplicative decrease (AIMD)
approach that increases the congestion window by one maximum packet
size per congestion window acknowledged. When a loss is detected,
NewReno halves the congestion window and sets the slow start
threshold to the new congestion window.
6.4. Recovery Period While in slow start, QUIC increases the congestion window by the
number of bytes acknowledged when each acknowledgment is processed,
resulting in exponential growth of the congestion window.
QUIC exits slow start upon loss or upon increase in the ECN-CE
counter. When slow start is exited, the congestion window halves and
the slow start threshold is set to the new congestion window. QUIC
re-enters slow start any time the congestion window is less than the
slow start threshold, which only occurs after persistent congestion
is declared.
6.4. Congestion Avoidance
Slow start exits to congestion avoidance. Congestion avoidance uses
an Additive Increase Multiplicative Decrease (AIMD) approach that
increases the congestion window by one maximum packet size per
congestion window acknowledged. When a loss or ECN-CE marking is
detected, NewReno halves the congestion window, sets the slow start
threshold to the new congestion window, and then enters the recovery
period.
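A minimal sketch of the window growth and halving described above, in bytes. The class name is hypothetical, the window starts at the initial value given earlier, and the recovery period and application-limited checks are deliberately left out to keep the two growth regimes visible.

```python
class NewRenoWindow:
    """Hypothetical NewReno-style window: slow start, AIMD, halving."""

    def __init__(self, max_datagram_size: int = 1200):
        self.mds = max_datagram_size
        self.cwnd = min(10 * self.mds, max(2 * self.mds, 14720))
        self.ssthresh = float("inf")

    def on_packet_acked(self, acked_bytes: int) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: grow by the bytes acknowledged.
            self.cwnd += acked_bytes
        else:
            # Congestion avoidance: about one datagram per window acked.
            self.cwnd += self.mds * acked_bytes // self.cwnd

    def on_congestion_event(self) -> None:
        # Loss or ECN-CE: halve (not below 2 datagrams), then set ssthresh.
        self.cwnd = max(self.cwnd // 2, 2 * self.mds)
        self.ssthresh = self.cwnd
```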
6.5. Recovery Period
A recovery period is entered when loss or ECN-CE marking of a packet A recovery period is entered when loss or ECN-CE marking of a packet
is detected. A recovery period ends when a packet sent during the is detected in congestion avoidance after the congestion window and
recovery period is acknowledged. This is slightly different from slow start threshold have been decreased. A recovery period ends
TCP's definition of recovery, which ends when the lost packet that when a packet sent during the recovery period is acknowledged. This
started recovery is acknowledged. is slightly different from TCP's definition of recovery, which ends
when the lost packet that started recovery is acknowledged.
The recovery period limits congestion window reduction to once per The recovery period aims to limit congestion window reduction to once
round trip. During recovery, the congestion window remains unchanged per round trip. Therefore during recovery, the congestion window
irrespective of new losses or increases in the ECN-CE counter. remains unchanged irrespective of new losses or increases in the ECN-
CE counter.
6.5. Ignoring Loss of Undecryptable Packets When entering recovery, a single packet MAY be sent even if bytes in
flight now exceeds the recently reduced congestion window. This
speeds up loss recovery if the data in the lost packet is
retransmitted and is similar to TCP as described in Section 5 of
[RFC6675]. If further packets are lost while the sender is in
recovery, sending any packets in response MUST obey the congestion
window limit.
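The recovery period can be tracked with a single timestamp: a congestion signal for a packet sent at or before the start of recovery does not reduce the window again. This is a hypothetical sketch; the class and method names are assumptions, and the one-packet allowance is modelled as a flag consumed by the first over-window send.

```python
class RecoveryPeriod:
    """Hypothetical congestion-window bookkeeping around recovery."""

    def __init__(self, cwnd: int, min_window: int):
        self.cwnd = cwnd
        self.ssthresh = float("inf")
        self.min_window = min_window
        self.recovery_start_time = None   # None: not yet in any recovery
        self.one_packet_allowance = False

    def in_recovery(self, sent_time: float) -> bool:
        # Packets sent at or before recovery began belong to that period;
        # acknowledging a later-sent packet effectively ends it.
        return (self.recovery_start_time is not None
                and sent_time <= self.recovery_start_time)

    def on_congestion_event(self, sent_time: float, now: float) -> None:
        """Handle loss or ECN-CE for a packet sent at sent_time."""
        if self.in_recovery(sent_time):
            # Further losses in recovery: no reduction and no extra packet.
            self.one_packet_allowance = False
            return
        self.recovery_start_time = now
        self.cwnd = max(self.cwnd // 2, self.min_window)
        self.ssthresh = self.cwnd
        self.one_packet_allowance = True

    def try_send(self, bytes_in_flight: int, size: int) -> bool:
        """Check the window; may consume the one-packet recovery allowance."""
        if bytes_in_flight + size <= self.cwnd:
            return True
        if self.one_packet_allowance:
            self.one_packet_allowance = False
            return True
        return False
```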
6.6. Ignoring Loss of Undecryptable Packets
During the handshake, some packet protection keys might not be During the handshake, some packet protection keys might not be
available when a packet arrives. In particular, Handshake and 0-RTT available when a packet arrives and the receiver can choose to drop
packets the packet. In particular, Handshake and 0-RTT packets cannot be
cannot be processed until the Initial packets arrive, and 1-RTT processed until the Initial packets arrive and 1-RTT packets cannot
packets cannot be processed until the handshake completes. Endpoints be processed until the handshake completes. Endpoints MAY ignore the
MAY loss of Handshake, 0-RTT, and 1-RTT packets that might have arrived
ignore the loss of Handshake, 0-RTT, and 1-RTT packets that might before the peer had packet protection keys to process those packets.
arrive before the peer has packet protection keys to process those Endpoints MUST NOT ignore the loss of packets that were sent after
packets. the earliest acknowledged packet in a given packet number space.
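The MUST NOT above bounds the MAY. A hypothetical helper, assuming packet numbers within a space increase in send order so that "sent after" can be tested by comparing packet numbers:

```python
def may_ignore_loss(packet_number: int, earliest_acked_pn) -> bool:
    """Whether the loss of a Handshake, 0-RTT, or 1-RTT packet may be ignored.

    earliest_acked_pn is the earliest acknowledged packet number in the same
    packet number space, or None if nothing there has been acknowledged.
    """
    if earliest_acked_pn is None:
        return True  # the peer may simply have lacked keys so far
    # Losses of packets sent after the earliest acknowledged one count.
    return packet_number < earliest_acked_pn
```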
6.6. Probe Timeout 6.7. Probe Timeout
Probe packets MUST NOT be blocked by the congestion controller. A Probe packets MUST NOT be blocked by the congestion controller. A
sender MUST however count these packets as being additionally in sender MUST however count these packets as being additionally in
flight, since these packets add network load without establishing flight, since these packets add network load without establishing
packet loss. Note that sending probe packets might cause the packet loss. Note that sending probe packets might cause the
sender's bytes in flight to exceed the congestion window until an sender's bytes in flight to exceed the congestion window until an
acknowledgement is received that establishes loss or delivery of acknowledgement is received that establishes loss or delivery of
packets. packets.
6.7. Persistent Congestion 6.8. Persistent Congestion
When an ACK frame is received that establishes loss of all in-flight When an ACK frame is received that establishes loss of all in-flight
packets sent over a long enough period of time, the network is packets sent over a long enough period of time, the network is
considered to be experiencing persistent congestion. Commonly, this considered to be experiencing persistent congestion. Commonly, this
can be established by consecutive PTOs, but since the PTO timer is can be established by consecutive PTOs, but since the PTO timer is
reset when a new ack-eliciting packet is sent, an explicit duration reset when a new ack-eliciting packet is sent, an explicit duration
must be used to account for those cases where PTOs do not occur or must be used to account for those cases where PTOs do not occur or
are substantially delayed. This duration is computed as follows: are substantially delayed. The rationale for this threshold is to
enable a sender to use initial PTOs for aggressive probing, as TCP
does with Tail Loss Probe (TLP) [RACK], before establishing
persistent congestion, as TCP does with a Retransmission Timeout
(RTO) [RFC5681]. The RECOMMENDED value for
kPersistentCongestionThreshold is 3, which is approximately
equivalent to two TLPs before an RTO in TCP.
This duration is computed as follows:
(smoothed_rtt + 4 * rttvar + max_ack_delay) * (smoothed_rtt + 4 * rttvar + max_ack_delay) *
kPersistentCongestionThreshold kPersistentCongestionThreshold
For example, assume: For example, assume:
smoothed_rtt = 1 smoothed_rtt = 1
rttvar = 0 rttvar = 0
max_ack_delay = 0 max_ack_delay = 0
kPersistentCongestionThreshold = 3 kPersistentCongestionThreshold = 3
If an ack-eliciting packet is sent at time = 0, the following If an ack-eliciting packet is sent at time t = 0, the following
scenario would illustrate persistent congestion: scenario would illustrate persistent congestion:
+-----+------------------------+ +-----+------------------------+
| t=0 | Send Pkt #1 (App Data) | | t=0 | Send Pkt #1 (App Data) |
+-----+------------------------+ +-----+------------------------+
| t=1 | Send Pkt #2 (PTO 1) | | t=1 | Send Pkt #2 (PTO 1) |
| | | | | |
| t=3 | Send Pkt #3 (PTO 2) | | t=3 | Send Pkt #3 (PTO 2) |
| | | | | |
| t=7 | Send Pkt #4 (PTO 3) | | t=7 | Send Pkt #4 (PTO 3) |
| | | | | |
| t=8 | Recv ACK of Pkt #4 | | t=8 | Recv ACK of Pkt #4 |
+-----+------------------------+ +-----+------------------------+
The first three packets are determined to be lost when the The first three packets are determined to be lost when the
acknowlegement of packet 4 is received at t=8. The congestion period acknowledgement of packet 4 is received at t=8. The congestion
is calculated as the time between the oldest and newest lost packets: period is calculated as the time between the oldest and newest lost
(3 - 0) = 3. The duration for persistent congestion is equal to: (1 packets: (3 - 0) = 3. The duration for persistent congestion is
* kPersistentCongestionThreshold) = 3. Because the threshold was equal to: (1 * kPersistentCongestionThreshold) = 3. Because the
reached and because none of the packets between the oldest and the threshold was reached and because none of the packets between the
newest packets are acknowledged, the network is considered to have oldest and the newest packets are acknowledged, the network is
experienced persistent congestion. considered to have experienced persistent congestion.
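The duration formula and the check from the worked example can be sketched as follows. The function names and the any_acked_between flag (whether any packet sent between the oldest and newest lost packets was acknowledged) are hypothetical.

```python
K_PERSISTENT_CONGESTION_THRESHOLD = 3


def persistent_congestion_duration(smoothed_rtt, rttvar, max_ack_delay):
    return ((smoothed_rtt + 4 * rttvar + max_ack_delay)
            * K_PERSISTENT_CONGESTION_THRESHOLD)


def in_persistent_congestion(lost_send_times, any_acked_between,
                             smoothed_rtt, rttvar, max_ack_delay):
    """lost_send_times: send times of packets newly declared lost."""
    if not lost_send_times or any_acked_between:
        return False
    period = max(lost_send_times) - min(lost_send_times)
    return period >= persistent_congestion_duration(
        smoothed_rtt, rttvar, max_ack_delay)
```

Applied to the example above, packets sent at t=0, 1 and 3 give a congestion period of 3, which reaches the duration of 3, so in_persistent_congestion([0, 1, 3], False, 1, 0, 0) is True.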
When persistent congestion is established, the sender's congestion When persistent congestion is established, the sender's congestion
window MUST be reduced to the minimum congestion window window MUST be reduced to the minimum congestion window
(kMinimumWindow). This response of collapsing the congestion window (kMinimumWindow). This response of collapsing the congestion window
on persistent congestion is functionally similar to a sender's on persistent congestion is functionally similar to a sender's
response on a Retransmission Timeout (RTO) in TCP [RFC5681] after response on a Retransmission Timeout (RTO) in TCP [RFC5681] after
Tail Loss Probes (TLP) [RACK]. Tail Loss Probes (TLP) [RACK].
6.8. Pacing 6.9. Pacing
This document does not specify a pacer, but it is RECOMMENDED that a This document does not specify a pacer, but it is RECOMMENDED that a
sender pace sending of all in-flight packets based on input from the sender pace sending of all in-flight packets based on input from the
congestion controller. For example, a pacer might distribute the congestion controller. For example, a pacer might distribute the
congestion window over the smoothed RTT when used with a window-based congestion window over the smoothed RTT when used with a window-based
controller, and a pacer might use the rate estimate of a rate-based controller, or a pacer might use the rate estimate of a rate-based
controller. controller.
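For a window-based controller, one simple pacer spreads the congestion window evenly over one smoothed RTT. This sketch assumes exactly that, with no extra gain factor or burst allowance; the function name is hypothetical.

```python
def pacing_interval(packet_size: int, congestion_window: int,
                    smoothed_rtt: float) -> float:
    """Seconds to wait between packets so that one congestion window is
    spread evenly across one smoothed RTT."""
    rate = congestion_window / smoothed_rtt  # bytes per second
    return packet_size / rate
```

With a 12000-byte window, a 100 ms smoothed RTT and 1200-byte packets, this gives 10 ms between packets, that is, ten packets per RTT.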
An implementation should take care to architect its congestion An implementation should take care to architect its congestion
controller to work well with a pacer. For instance, a pacer might controller to work well with a pacer. For instance, a pacer might
wrap the congestion controller and control the availability of the wrap the congestion controller and control the availability of the
congestion window, or a pacer might pace out packets handed to it by congestion window, or a pacer might pace out packets handed to it by
the congestion controller. Timely delivery of ACK frames is the congestion controller.
important for efficient loss recovery. Packets containing only ACK
frames should therefore not be paced, to avoid delaying their Timely delivery of ACK frames is important for efficient loss
delivery to the peer. recovery. Packets containing only ACK frames SHOULD therefore not be
paced, to avoid delaying their delivery to the peer.
Sending multiple packets into the network without any delay between Sending multiple packets into the network without any delay between
them creates a packet burst that might cause short-term congestion them creates a packet burst that might cause short-term congestion
and losses. Implementations MUST either use pacing or limit such and losses. Implementations MUST either use pacing or limit such
bursts to the initial congestion window, which is recommended to be bursts to the initial congestion window, which is recommended to be
the minimum of 10 * max_datagram_size and max(2 * max_datagram_size, the minimum of 10 * max_datagram_size and max(2 * max_datagram_size,
14720), where max_datagram_size is the current maximum size of a 14720), where max_datagram_size is the current maximum size of a
datagram for the connection, not including UDP or IP overhead. datagram for the connection, not including UDP or IP overhead.
As an example of a well-known and publicly available implementation As an example of a well-known and publicly available implementation
of a flow pacer, implementers are referred to the Fair Queue packet of a flow pacer, implementers are referred to the Fair Queue packet
scheduler (fq qdisc) in Linux (3.11 onwards). scheduler (fq qdisc) in Linux (3.11 onwards).
6.9. Under-utilizing the Congestion Window 6.10. Under-utilizing the Congestion Window
When bytes in flight is smaller than the congestion window and When bytes in flight is smaller than the congestion window and
sending is not pacing limited, the congestion window is under- sending is not pacing limited, the congestion window is under-
utilized. When this occurs, the congestion window SHOULD NOT be utilized. When this occurs, the congestion window SHOULD NOT be
increased in either slow start or congestion avoidance. This can increased in either slow start or congestion avoidance. This can
happen due to insufficient application data or flow control credit. happen due to insufficient application data or flow control limits.
A sender MAY use the pipeACK method described in section 4.3 of A sender MAY use the pipeACK method described in section 4.3 of
[RFC7661] to determine if the congestion window is sufficiently [RFC7661] to determine if the congestion window is sufficiently
utilized. utilized.
A sender that paces packets (see Section 6.8) might delay sending A sender that paces packets (see Section 6.9) might delay sending
packets and not fully utilize the congestion window due to this packets and not fully utilize the congestion window due to this
delay. A sender should not consider itself application limited if it delay. A sender SHOULD NOT consider itself application limited if it
would have fully utilized the congestion window without pacing delay. would have fully utilized the congestion window without pacing delay.
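The growth gate can be expressed as a predicate evaluated when an acknowledgement arrives. This is a hypothetical sketch: the pacing_limited flag is assumed to be set by the sender when the pacer, not the application or flow control, held data back.

```python
def should_increase_cwnd(bytes_in_flight: int, congestion_window: int,
                         pacing_limited: bool) -> bool:
    """Grow the window only if it is actually being used.

    A window is under-utilized when bytes in flight is below it and sending
    was not held back by the pacer; pacing delay alone does not make the
    sender application limited.
    """
    under_utilized = (bytes_in_flight < congestion_window
                      and not pacing_limited)
    return not under_utilized
```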
A sender MAY implement alternative mechanisms to update its A sender MAY implement alternative mechanisms to update its
congestion window after periods of under-utilization, such as those congestion window after periods of under-utilization, such as those
proposed for TCP in [RFC7661]. proposed for TCP in [RFC7661].
7. Security Considerations 7. Security Considerations
7.1. Congestion Signals 7.1. Congestion Signals
skipping to change at page 21, line 9 skipping to change at page 23, line 13
so. so.
Endpoints choose the congestion controller that they use. Though Endpoints choose the congestion controller that they use. Though
congestion controllers generally treat reports of ECN-CE markings as congestion controllers generally treat reports of ECN-CE markings as
equivalent to loss [RFC8311], the exact response for each controller equivalent to loss [RFC8311], the exact response for each controller
could be different. Failure to correctly respond to information could be different. Failure to correctly respond to information
about ECN markings is therefore difficult to detect. about ECN markings is therefore difficult to detect.
8. IANA Considerations 8. IANA Considerations
This document has no IANA actions. Yet. This document has no IANA actions.
9. References 9. References
9.1. Normative References 9.1. Normative References
[QUIC-TLS] [QUIC-TLS]
Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
QUIC", draft-ietf-quic-tls-27 (work in progress). QUIC", draft-ietf-quic-tls-latest (work in progress).
[QUIC-TRANSPORT] [QUIC-TRANSPORT]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", draft-ietf-quic- Multiplexed and Secure Transport", draft-ietf-quic-
transport-27 (work in progress). transport-latest (work in progress).
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
Spurious Retransmission Timeouts with TCP", RFC 5682,
DOI 10.17487/RFC5682, September 2009,
<https://www.rfc-editor.org/info/rfc5682>.
[RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
March 2017, <https://www.rfc-editor.org/info/rfc8085>. March 2017, <https://www.rfc-editor.org/info/rfc8085>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
9.2. Informative References 9.2. Informative References
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement:
Refining TCP Congestion Control", ACM SIGCOMM , August Refining TCP Congestion Control", ACM SIGCOMM , August
1996. 1996.
[RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
a time-based fast loss detection algorithm for TCP", a time-based fast loss detection algorithm for TCP",
draft-ietf-tcpm-rack-07 (work in progress), January 2020. draft-ietf-tcpm-rack-08 (work in progress), March 2020.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
"Improving the Robustness of TCP to Non-Congestion "Improving the Robustness of TCP to Non-Congestion
Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, Events", RFC 4653, DOI 10.17487/RFC4653, August 2006,
<https://www.rfc-editor.org/info/rfc4653>. <https://www.rfc-editor.org/info/rfc4653>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<https://www.rfc-editor.org/info/rfc5681>. <https://www.rfc-editor.org/info/rfc5681>.
[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
Spurious Retransmission Timeouts with TCP", RFC 5682,
DOI 10.17487/RFC5682, September 2009,
<https://www.rfc-editor.org/info/rfc5682>.
[RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
P. Hurtig, "Early Retransmit for TCP and Stream Control P. Hurtig, "Early Retransmit for TCP and Stream Control
Transmission Protocol (SCTP)", RFC 5827, Transmission Protocol (SCTP)", RFC 5827,
DOI 10.17487/RFC5827, May 2010, DOI 10.17487/RFC5827, May 2010,
<https://www.rfc-editor.org/info/rfc5827>. <https://www.rfc-editor.org/info/rfc5827>.
[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298, "Computing TCP's Retransmission Timer", RFC 6298,
DOI 10.17487/RFC6298, June 2011, DOI 10.17487/RFC6298, June 2011,
<https://www.rfc-editor.org/info/rfc6298>. <https://www.rfc-editor.org/info/rfc6298>.
skipping to change at page 24, line 19 skipping to change at page 26, line 28
UDP or IP overhead, but including QUIC framing overhead. UDP or IP overhead, but including QUIC framing overhead.
time_sent: The time the packet was sent. time_sent: The time the packet was sent.
A.2. Constants of interest A.2. Constants of interest
Constants used in loss recovery are based on a combination of RFCs, Constants used in loss recovery are based on a combination of RFCs,
papers, and common practice. papers, and common practice.
kPacketThreshold: Maximum reordering in packets before packet kPacketThreshold: Maximum reordering in packets before packet
threshold loss detection considers a packet lost. The RECOMMENDED threshold loss detection considers a packet lost. The value
value is 3. recommended in Section 5.1.1 is 3.
kTimeThreshold: Maximum reordering in time before time threshold kTimeThreshold: Maximum reordering in time before time threshold
loss detection considers a packet lost. Specified as an RTT loss detection considers a packet lost. Specified as an RTT
multiplier. The RECOMMENDED value is 9/8. multiplier. The value recommended in Section 5.1.2 is 9/8.
kGranularity: Timer granularity. This is a system-dependent value. kGranularity: Timer granularity. This is a system-dependent value,
However, implementations SHOULD use a value no smaller than 1ms. and Section 5.1.2 recommends a value of 1ms.
kInitialRtt: The RTT used before an RTT sample is taken. The kInitialRtt: The RTT used before an RTT sample is taken. The value
RECOMMENDED value is 500ms. recommended in Section 5.2.2 is 500ms.
kPacketNumberSpace: An enum to enumerate the three packet number kPacketNumberSpace: An enum to enumerate the three packet number
spaces. spaces.
enum kPacketNumberSpace { enum kPacketNumberSpace {
Initial, Initial,
Handshake, Handshake,
ApplicationData, ApplicationData,
} }
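For readers translating the pseudocode, the constants and the packet number space enum above might be written in Python as follows. Expressing durations in seconds is an assumption of this sketch. The values are the ones the new text cites from Sections 5.1.1, 5.1.2, and 5.2.2.

```python
from enum import Enum


class PacketNumberSpace(Enum):
    """The three QUIC packet number spaces (kPacketNumberSpace)."""
    INITIAL = "Initial"
    HANDSHAKE = "Handshake"
    APPLICATION_DATA = "ApplicationData"


K_PACKET_THRESHOLD = 3      # packets of reordering before declaring loss
K_TIME_THRESHOLD = 9 / 8    # multiplier applied to max(latest_rtt, smoothed_rtt)
K_GRANULARITY = 0.001       # timer granularity in seconds (1ms)
K_INITIAL_RTT = 0.5         # RTT assumed before the first sample (500ms)

for space in PacketNumberSpace:
    print(space.value)      # Initial, Handshake, ApplicationData
```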
A.3. Variables of interest A.3. Variables of interest
Variables required to implement the congestion control mechanisms are Variables required to implement the congestion control mechanisms are
described in this section. described in this section.
latest_rtt: The most recent RTT measurement made when receiving an latest_rtt: The most recent RTT measurement made when receiving an
ack for a previously unacked packet. ack for a previously unacked packet.
smoothed_rtt: The smoothed RTT of the connection, computed as smoothed_rtt: The smoothed RTT of the connection, computed as
described in [RFC6298] described in Section 4.3.
rttvar: The RTT variation, computed as described in [RFC6298] rttvar: The RTT variation, computed as described in Section 4.3
min_rtt: The minimum RTT seen in the connection, ignoring ack delay.
min_rtt: The minimum RTT seen in the connection, ignoring ack delay,
as described in Section 4.2.
max_ack_delay: The maximum amount of time by which the receiver max_ack_delay: The maximum amount of time by which the receiver
intends to delay acknowledgments for packets in the intends to delay acknowledgments for packets in the
ApplicationData packet number space. The actual ack_delay in a ApplicationData packet number space. The actual ack_delay in a
received ACK frame may be larger due to late timers, reordering, received ACK frame may be larger due to late timers, reordering,
or lost ACK frames. or lost ACK frames.
loss_detection_timer: Multi-modal timer used for loss detection. loss_detection_timer: Multi-modal timer used for loss detection.
pto_count: The number of times a PTO has been sent without receiving pto_count: The number of times a PTO has been sent without receiving
skipping to change at page 26, line 17 skipping to change at page 28, line 29
After a packet is sent, information about the packet is stored. The After a packet is sent, information about the packet is stored. The
parameters to OnPacketSent are described in detail above in parameters to OnPacketSent are described in detail above in
Appendix A.1.1. Appendix A.1.1.
Pseudocode for OnPacketSent follows: Pseudocode for OnPacketSent follows:
OnPacketSent(packet_number, pn_space, ack_eliciting, OnPacketSent(packet_number, pn_space, ack_eliciting,
in_flight, sent_bytes): in_flight, sent_bytes):
sent_packets[pn_space][packet_number].packet_number = sent_packets[pn_space][packet_number].packet_number =
packet_number packet_number
sent_packets[pn_space][packet_number].time_sent = now sent_packets[pn_space][packet_number].time_sent = now()
sent_packets[pn_space][packet_number].ack_eliciting = sent_packets[pn_space][packet_number].ack_eliciting =
ack_eliciting ack_eliciting
sent_packets[pn_space][packet_number].in_flight = in_flight sent_packets[pn_space][packet_number].in_flight = in_flight
if (in_flight): if (in_flight):
if (ack_eliciting): if (ack_eliciting):
time_of_last_sent_ack_eliciting_packet[pn_space] = now time_of_last_sent_ack_eliciting_packet[pn_space] = now()
OnPacketSentCC(sent_bytes) OnPacketSentCC(sent_bytes)
sent_packets[pn_space][packet_number].size = sent_bytes sent_packets[pn_space][packet_number].size = sent_bytes
SetLossDetectionTimer() SetLossDetectionTimer()
A.6. On Receiving an Acknowledgment A.6. On Receiving an Acknowledgment
When an ACK frame is received, it may newly acknowledge any number of When an ACK frame is received, it may newly acknowledge any number of
packets. packets.
Pseudocode for OnAckReceived and UpdateRtt follow: Pseudocode for OnAckReceived and UpdateRtt follow:
OnAckReceived(ack, pn_space): OnAckReceived(ack, pn_space):
if (largest_acked_packet[pn_space] == infinite): if (largest_acked_packet[pn_space] == infinite):
largest_acked_packet[pn_space] = ack.largest_acked largest_acked_packet[pn_space] = ack.largest_acked
else: else:
largest_acked_packet[pn_space] = largest_acked_packet[pn_space] =
max(largest_acked_packet[pn_space], ack.largest_acked) max(largest_acked_packet[pn_space], ack.largest_acked)
// DetectAndRemoveAckedPackets finds packets that are newly
// acknowledged and removes them from sent_packets.
newly_acked_packets =
DetectAndRemoveAckedPackets(ack, pn_space)
// Nothing to do if there are no newly acked packets. // Nothing to do if there are no newly acked packets.
newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space)
if (newly_acked_packets.empty()): if (newly_acked_packets.empty()):
return return
// If the largest acknowledged is newly acked and // If the largest acknowledged is newly acked and
// at least one ack-eliciting was newly acked, update the RTT. // at least one ack-eliciting was newly acked, update the RTT.
if (sent_packets[pn_space].contains(ack.largest_acked) && if (newly_acked_packets.largest().packet_number ==
ack.largest_acked &&
IncludesAckEliciting(newly_acked_packets)): IncludesAckEliciting(newly_acked_packets)):
latest_rtt = latest_rtt =
now - sent_packets[pn_space][ack.largest_acked].time_sent now() - newly_acked_packets.largest().time_sent
ack_delay = 0 ack_delay = 0
if (pn_space == ApplicationData): if (pn_space == ApplicationData):
ack_delay = ack.ack_delay ack_delay = ack.ack_delay
UpdateRtt(ack_delay) UpdateRtt(ack_delay)
// Process ECN information if present. // Process ECN information if present.
if (ACK frame contains ECN information): if (ACK frame contains ECN information):
ProcessECN(ack, pn_space) ProcessECN(ack, pn_space)
for acked_packet in newly_acked_packets: lost_packets = DetectAndRemoveLostPackets(pn_space)
OnPacketAcked(acked_packet.packet_number, pn_space) if (!lost_packets.empty()):
OnPacketsLost(lost_packets)
DetectLostPackets(pn_space) OnPacketsAcked(newly_acked_packets)
pto_count = 0 pto_count = 0
SetLossDetectionTimer() SetLossDetectionTimer()
UpdateRtt(ack_delay): UpdateRtt(ack_delay):
// First RTT sample. // First RTT sample.
if (smoothed_rtt == 0): if (smoothed_rtt == 0):
min_rtt = latest_rtt min_rtt = latest_rtt
smoothed_rtt = latest_rtt smoothed_rtt = latest_rtt
rttvar = latest_rtt / 2 rttvar = latest_rtt / 2
return return
skipping to change at page 27, line 38 skipping to change at page 30, line 4
rttvar = latest_rtt / 2 rttvar = latest_rtt / 2
return return
// min_rtt ignores ack delay. // min_rtt ignores ack delay.
min_rtt = min(min_rtt, latest_rtt) min_rtt = min(min_rtt, latest_rtt)
// Limit ack_delay by max_ack_delay // Limit ack_delay by max_ack_delay
ack_delay = min(ack_delay, max_ack_delay) ack_delay = min(ack_delay, max_ack_delay)
// Adjust for ack delay if plausible. // Adjust for ack delay if plausible.
adjusted_rtt = latest_rtt adjusted_rtt = latest_rtt
if (latest_rtt > min_rtt + ack_delay): if (latest_rtt > min_rtt + ack_delay):
adjusted_rtt = latest_rtt - ack_delay adjusted_rtt = latest_rtt - ack_delay
rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt) rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
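UpdateRtt translates directly into Python. This sketch assumes all times are in milliseconds and uses a max_ack_delay of 25ms, which is an illustrative value, not one taken from this text. The state lives in a small dataclass, and latest_rtt is passed in rather than read from a global.

```python
from dataclasses import dataclass


@dataclass
class RttEstimator:
    max_ack_delay: float = 25.0   # ms; assumed value for illustration
    latest_rtt: float = 0.0
    smoothed_rtt: float = 0.0     # 0 means "no RTT sample yet"
    rttvar: float = 0.0
    min_rtt: float = 0.0

    def update_rtt(self, latest_rtt: float, ack_delay: float) -> None:
        self.latest_rtt = latest_rtt
        # First RTT sample.
        if self.smoothed_rtt == 0:
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        # min_rtt ignores ack delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Limit ack_delay by max_ack_delay.
        ack_delay = min(ack_delay, self.max_ack_delay)
        # Subtract ack delay only if the result stays above min_rtt.
        adjusted_rtt = latest_rtt
        if latest_rtt > self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * abs(self.smoothed_rtt - adjusted_rtt)
        self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * adjusted_rtt


est = RttEstimator()
est.update_rtt(100.0, ack_delay=0.0)   # first sample
est.update_rtt(140.0, ack_delay=30.0)  # ack_delay is capped at 25ms
print(est.smoothed_rtt, est.rttvar)    # 101.875 41.25
```

Because rttvar is updated before smoothed_rtt, the variation term uses the previous smoothed estimate (100ms here), which matches the order in the pseudocode.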
A.7. On Packet Acknowledgment A.7. Setting the Loss Detection Timer
When a packet is acknowledged for the first time, the following
OnPacketAcked function is called. Note that a single ACK frame may
newly acknowledge several packets. OnPacketAcked must be called once
for each of these newly acknowledged packets.
OnPacketAcked takes two parameters: acked_packet, which is the struct
detailed in Appendix A.1.1, and the packet number space that this ACK
frame was sent for.
Pseudocode for OnPacketAcked follows:
OnPacketAcked(acked_packet, pn_space):
if (acked_packet.in_flight):
OnPacketAckedCC(acked_packet)
sent_packets[pn_space].remove(acked_packet.packet_number)
A.8. Setting the Loss Detection Timer
QUIC loss detection uses a single timer for all timeout loss QUIC loss detection uses a single timer for all timeout loss
detection. The duration of the timer is based on the timer's mode, detection. The duration of the timer is based on the timer's mode,
which is set in the packet and timer events further below. The which is set in the packet and timer events further below. The
function SetLossDetectionTimer defined below shows how the single function SetLossDetectionTimer defined below shows how the single
timer is set. timer is set.
This algorithm may result in the timer being set in the past, This algorithm may result in the timer being set in the past,
particularly if timers wake up late. Timers set in the past SHOULD particularly if timers wake up late. Timers set in the past fire
fire immediately. immediately.
Pseudocode for SetLossDetectionTimer follows: Pseudocode for SetLossDetectionTimer follows:
GetEarliestTimeAndSpace(times): GetEarliestTimeAndSpace(times):
time = times[Initial] time = times[Initial]
space = Initial space = Initial
for pn_space in [ Handshake, ApplicationData ]: for pn_space in [ Handshake, ApplicationData ]:
if (times[pn_space] != 0 && if (times[pn_space] != 0 &&
(time == 0 || times[pn_space] < time) && (time == 0 || times[pn_space] < time) &&
# Skip ApplicationData until handshake completion. # Skip ApplicationData until handshake completion.
(pn_space != ApplicationData || (pn_space != ApplicationData ||
IsHandshakeComplete()): IsHandshakeComplete()):
time = times[pn_space]; time = times[pn_space];
space = pn_space space = pn_space
return time, space return time, space
PeerNotAwaitingAddressValidation(): PeerCompletedAddressValidation():
# Assume clients validate the server's address implicitly. # Assume clients validate the server's address implicitly.
if (endpoint is server): if (endpoint is server):
return true return true
# Servers complete address validation when a # Servers complete address validation when a
# protected packet is received. # protected packet is received.
return has received Handshake ACK || return has received Handshake ACK ||
has received 1-RTT ACK has received 1-RTT ACK ||
has received HANDSHAKE_DONE
SetLossDetectionTimer(): SetLossDetectionTimer():
earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time) earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time)
if (earliest_loss_time != 0): if (earliest_loss_time != 0):
// Time threshold loss detection. // Time threshold loss detection.
loss_detection_timer.update(earliest_loss_time) loss_detection_timer.update(earliest_loss_time)
return return
if (server is at anti-amplification limit):
// The server's alarm is not set if nothing can be sent.
loss_detection_timer.cancel()
return
if (no ack-eliciting packets in flight && if (no ack-eliciting packets in flight &&
PeerNotAwaitingAddressValidation()): PeerCompletedAddressValidation()):
// There is nothing to detect lost, so no timer is set.
// However, the client needs to arm the timer if the
// server might be blocked by the anti-amplification limit.
loss_detection_timer.cancel() loss_detection_timer.cancel()
return return
// Use a default timeout if there are no RTT measurements // Use a default timeout if there are no RTT measurements
if (smoothed_rtt == 0): if (smoothed_rtt == 0):
timeout = 2 * kInitialRtt timeout = 2 * kInitialRtt
else: else:
// Calculate PTO duration // Calculate PTO duration
timeout = smoothed_rtt + max(4 * rttvar, kGranularity) + timeout = smoothed_rtt + max(4 * rttvar, kGranularity) +
max_ack_delay max_ack_delay
timeout = timeout * (2 ^ pto_count) timeout = timeout * (2 ^ pto_count)
sent_time, _ = GetEarliestTimeAndSpace( sent_time, _ = GetEarliestTimeAndSpace(
time_of_last_sent_ack_eliciting_packet) time_of_last_sent_ack_eliciting_packet)
if (sent_time == 0):
assert(!PeerCompletedAddressValidation())
sent_time = now()
loss_detection_timer.update(sent_time + timeout) loss_detection_timer.update(sent_time + timeout)
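The timer arithmetic can be checked with a short Python sketch of GetEarliestTimeAndSpace and the PTO duration. Times are in milliseconds, and a time of 0 means "not set", as in the pseudocode. The function names belong to this sketch only. The pseudocode's `2 ^ pto_count` denotes exponentiation, which Python spells `**` (in Python, `^` is bitwise XOR).

```python
K_GRANULARITY = 1.0     # ms
K_INITIAL_RTT = 500.0   # ms


def get_earliest_time_and_space(times, handshake_complete):
    """Return (time, space) for the earliest non-zero entry in times.

    ApplicationData is skipped until the handshake completes.
    """
    time, space = times["Initial"], "Initial"
    for pn_space in ("Handshake", "ApplicationData"):
        candidate = times[pn_space]
        if (candidate != 0 and (time == 0 or candidate < time)
                and (pn_space != "ApplicationData" or handshake_complete)):
            time, space = candidate, pn_space
    return time, space


def pto_timeout(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """PTO duration with exponential backoff, as in SetLossDetectionTimer."""
    if smoothed_rtt == 0:
        timeout = 2 * K_INITIAL_RTT  # no RTT sample yet
    else:
        timeout = smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
    return timeout * (2 ** pto_count)


sent = {"Initial": 0, "Handshake": 40.0, "ApplicationData": 30.0}
print(get_earliest_time_and_space(sent, handshake_complete=False))  # (40.0, 'Handshake')
print(pto_timeout(100.0, 10.0, 25.0, pto_count=2))                  # 660.0
```

The timer then fires at the returned send time plus this duration. In the new text, a client with no ack-eliciting packet in flight that still has to arm the timer uses now() as the base instead.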
A.9. On Timeout A.8. On Timeout
When the loss detection timer expires, the timer's mode determines When the loss detection timer expires, the timer's mode determines
the action to be performed. the action to be performed.
Pseudocode for OnLossDetectionTimeout follows: Pseudocode for OnLossDetectionTimeout follows:
OnLossDetectionTimeout(): OnLossDetectionTimeout():
earliest_loss_time, pn_space = earliest_loss_time, pn_space =
GetEarliestTimeAndSpace(loss_time) GetEarliestTimeAndSpace(loss_time)
if (earliest_loss_time != 0): if (earliest_loss_time != 0):
// Time threshold loss Detection // Time threshold loss Detection
DetectLostPackets(pn_space) lost_packets = DetectAndRemoveLostPackets(pn_space)
assert(!lost_packets.empty())
OnPacketsLost(lost_packets)
SetLossDetectionTimer() SetLossDetectionTimer()
return return
if (endpoint is client without 1-RTT keys): if (bytes_in_flight > 0):
// PTO. Send new data if available, else retransmit old data.
// If neither is available, send a single PING frame.
_, pn_space = GetEarliestTimeAndSpace(
time_of_last_sent_ack_eliciting_packet)
SendOneOrTwoAckElicitingPackets(pn_space)
else:
assert(endpoint is client without 1-RTT keys)
// Client sends an anti-deadlock packet: Initial is padded // Client sends an anti-deadlock packet: Initial is padded
// to earn more anti-amplification credit, // to earn more anti-amplification credit,
// a Handshake packet proves address ownership. // a Handshake packet proves address ownership.
if (has Handshake keys): if (has Handshake keys):
SendOneAckElicitingHandshakePacket() SendOneAckElicitingHandshakePacket()
else: else:
SendOneAckElicitingPaddedInitialPacket() SendOneAckElicitingPaddedInitialPacket()
else:
// PTO. Send new data if available, else retransmit old data.
// If neither is available, send a single PING frame.
_, pn_space = GetEarliestTimeAndSpace(
time_of_last_sent_ack_eliciting_packet)
SendOneOrTwoAckElicitingPackets(pn_space)
pto_count++ pto_count++
SetLossDetectionTimer() SetLossDetectionTimer()
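The branch order of the new OnLossDetectionTimeout can be summarized as a pure function that returns the action to take. This Python sketch is illustrative only. The action strings are hypothetical labels for the calls in the pseudocode.

```python
def loss_detection_timeout_action(earliest_loss_time: float,
                                  bytes_in_flight: int,
                                  has_handshake_keys: bool) -> str:
    """Choose the action for an expired loss detection timer.

    Follows the branch order of the new OnLossDetectionTimeout: time
    threshold loss detection first, then a PTO probe, and finally the
    client's anti-deadlock packet when nothing is in flight.
    """
    if earliest_loss_time != 0:
        return "detect_lost_packets"       # time threshold loss detection
    if bytes_in_flight > 0:
        return "send_probe_packets"        # SendOneOrTwoAckElicitingPackets
    # Only a client without 1-RTT keys is expected to reach this point.
    if has_handshake_keys:
        return "send_handshake_packet"     # proves address ownership
    return "send_padded_initial_packet"    # earns anti-amplification credit
```

In the pseudocode, both the probe and the anti-deadlock branches then increment pto_count and rearm the timer; the loss detection branch rearms the timer without touching pto_count.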
A.10. Detecting Lost Packets A.9. Detecting Lost Packets
DetectLostPackets is called every time an ACK is received and DetectAndRemoveLostPackets is called every time an ACK is received or
operates on the sent_packets for that packet number space. the time threshold loss detection timer expires. This function
operates on the sent_packets for that packet number space and returns
a list of packets newly detected as lost.
Pseudocode for DetectLostPackets follows: Pseudocode for DetectAndRemoveLostPackets follows:
DetectLostPackets(pn_space): DetectAndRemoveLostPackets(pn_space):
assert(largest_acked_packet[pn_space] != infinite) assert(largest_acked_packet[pn_space] != infinite)
loss_time[pn_space] = 0 loss_time[pn_space] = 0
lost_packets = {} lost_packets = {}
loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)
// Minimum time of kGranularity before packets are deemed lost. // Minimum time of kGranularity before packets are deemed lost.
loss_delay = max(loss_delay, kGranularity) loss_delay = max(loss_delay, kGranularity)
// Packets sent before this time are deemed lost. // Packets sent before this time are deemed lost.
lost_send_time = now() - loss_delay lost_send_time = now() - loss_delay
skipping to change at page 31, line 34 skipping to change at page 33, line 34
unacked.packet_number + kPacketThreshold): unacked.packet_number + kPacketThreshold):
sent_packets[pn_space].remove(unacked.packet_number) sent_packets[pn_space].remove(unacked.packet_number)
if (unacked.in_flight): if (unacked.in_flight):
lost_packets.insert(unacked) lost_packets.insert(unacked)
else: else:
if (loss_time[pn_space] == 0): if (loss_time[pn_space] == 0):
loss_time[pn_space] = unacked.time_sent + loss_delay loss_time[pn_space] = unacked.time_sent + loss_delay
else: else:
loss_time[pn_space] = min(loss_time[pn_space], loss_time[pn_space] = min(loss_time[pn_space],
unacked.time_sent + loss_delay) unacked.time_sent + loss_delay)
return lost_packets
// Inform the congestion controller of lost packets and
// let it decide whether to retransmit immediately.
if (!lost_packets.empty()):
OnPacketsLost(lost_packets)
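Below is a Python sketch of DetectAndRemoveLostPackets for a single packet number space. Part of the loop falls in the skipped hunk. This sketch assumes it skips packets numbered above the largest acknowledged one and declares a packet lost once it was sent at or before lost_send_time or kPacketThreshold packets before the largest acknowledged packet. Treat that reconstruction, the SentPacket type, and millisecond units as assumptions.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3
K_TIME_THRESHOLD = 9 / 8
K_GRANULARITY = 1.0  # ms


@dataclass
class SentPacket:
    packet_number: int
    time_sent: float   # ms
    in_flight: bool = True


def detect_and_remove_lost_packets(sent_packets, largest_acked, now,
                                   latest_rtt, smoothed_rtt):
    """Return (lost_packets, loss_time) for one packet number space.

    sent_packets maps packet number -> SentPacket and is modified in place.
    loss_time is 0 when no packet is waiting on the time threshold.
    """
    loss_delay = max(K_TIME_THRESHOLD * max(latest_rtt, smoothed_rtt),
                     K_GRANULARITY)
    lost_send_time = now - loss_delay  # packets sent before this are lost
    lost_packets, loss_time = [], 0
    for pn in sorted(sent_packets):    # sorted() copies the keys, so deleting is safe
        unacked = sent_packets[pn]
        if pn > largest_acked:
            continue  # not yet eligible for loss detection
        if (unacked.time_sent <= lost_send_time
                or largest_acked >= pn + K_PACKET_THRESHOLD):
            del sent_packets[pn]
            if unacked.in_flight:
                lost_packets.append(unacked)
        else:
            # Arm the time threshold timer for the earliest such packet.
            deadline = unacked.time_sent + loss_delay
            loss_time = deadline if loss_time == 0 else min(loss_time, deadline)
    return lost_packets, loss_time


# Packets 4 and 5 were acknowledged and already removed; 5 is largest acked.
sent = {1: SentPacket(1, 0.0), 2: SentPacket(2, 90.0),
        3: SentPacket(3, 95.0), 6: SentPacket(6, 99.0)}
lost, loss_time = detect_and_remove_lost_packets(
    sent, largest_acked=5, now=100.0, latest_rtt=8.0, smoothed_rtt=8.0)
print([p.packet_number for p in lost], loss_time, sorted(sent))  # [1, 2] 104.0 [3, 6]
```

Packets that were not in flight (for example, ACK-only packets) are still removed but are not reported, because they never counted toward bytes_in_flight.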
Appendix B. Congestion Control Pseudocode Appendix B. Congestion Control Pseudocode
We now describe an example implementation of the congestion We now describe an example implementation of the congestion
controller described in Section 6. controller described in Section 6.
B.1. Constants of interest B.1. Constants of interest
Constants used in congestion control are based on a combination of Constants used in congestion control are based on a combination of
RFCs, papers, and common practice. RFCs, papers, and common practice.
kInitialWindow: Default limit on the initial amount of data in kInitialWindow: Default limit on the initial bytes in flight as
flight, in bytes. The RECOMMENDED value is the minimum of 10 * described in Section 6.2.
max_datagram_size and max(2 * max_datagram_size, 14720)). This
follows the analysis and recommendations in [RFC6928], increasing
the byte limit to account for the smaller 8 byte overhead of UDP
compared to the 20 byte overhead for TCP.
kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED kMinimumWindow: Minimum congestion window in bytes as described in
value is 2 * max_datagram_size. Section 6.2.
kLossReductionFactor: Reduction in congestion window when a new loss kLossReductionFactor: Reduction in congestion window when a new loss
event is detected. The RECOMMENDED value is 0.5. event is detected. Section 6 recommends a value of
0.5.
kPersistentCongestionThreshold: Period of time for persistent kPersistentCongestionThreshold: Period of time for persistent
congestion to be established, specified as a PTO multiplier. The congestion to be established, specified as a PTO multiplier.
rationale for this threshold is to enable a sender to use initial Section 6.8 recommends a value of 3.
PTOs for aggressive probing, as TCP does with Tail Loss Probe
(TLP) [RACK], before establishing persistent congestion, as TCP
does with a Retransmission Timeout (RTO) [RFC5681]. The
RECOMMENDED value for kPersistentCongestionThreshold is 3, which
is approximately equivalent to having two TLPs before an RTO in
TCP.
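The initial and minimum window formulas given in the draft-27 text can be checked numerically; the new text defers to Section 6.2, which this diff does not show. A minimal Python sketch, assuming max_datagram_size is in bytes:

```python
def initial_window(max_datagram_size: int) -> int:
    """kInitialWindow per the draft-27 text: min(10 * mds, max(2 * mds, 14720))."""
    return min(10 * max_datagram_size, max(2 * max_datagram_size, 14720))


def minimum_window(max_datagram_size: int) -> int:
    """kMinimumWindow per the draft-27 text: 2 * max_datagram_size."""
    return 2 * max_datagram_size


for mds in (1200, 1460, 1500):
    print(mds, initial_window(mds), minimum_window(mds))
# 1200 12000 2400
# 1460 14600 2920
# 1500 14720 3000
```

For 1200-byte datagrams the initial window is ten full datagrams. For datagram sizes between 1472 and 7360 bytes, the 14720-byte term is the binding limit.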
B.2. Variables of interest B.2. Variables of interest
Variables required to implement the congestion control mechanisms are Variables required to implement the congestion control mechanisms are
described in this section. described in this section.
max_datagram_size: The sender's current maximum payload size. Does max_datagram_size: The sender's current maximum payload size. Does
not include UDP or IP overhead. The max datagram size is used for not include UDP or IP overhead. The max datagram size is used for
congestion window computations. An endpoint sets the value of congestion window computations. An endpoint sets the value of
this variable based on its PMTU (see Section 14.1 of this variable based on its PMTU (see Section 14.1 of
skipping to change at page 33, line 36 skipping to change at page 35, line 27
B.4. On Packet Sent B.4. On Packet Sent
Whenever a packet is sent, and it contains non-ACK frames, the packet Whenever a packet is sent, and it contains non-ACK frames, the packet
increases bytes_in_flight. increases bytes_in_flight.
OnPacketSentCC(bytes_sent): OnPacketSentCC(bytes_sent):
bytes_in_flight += bytes_sent bytes_in_flight += bytes_sent
B.5. On Packet Acknowledgement B.5. On Packet Acknowledgement
Invoked from loss detection's OnPacketAcked and is supplied with the Invoked from loss detection's OnAckReceived and is supplied with the
acked_packet from sent_packets. newly acked_packets from sent_packets.
InCongestionRecovery(sent_time): InCongestionRecovery(sent_time):
return sent_time <= congestion_recovery_start_time return sent_time <= congestion_recovery_start_time
OnPacketAckedCC(acked_packet): OnPacketsAcked(acked_packets):
// Remove from bytes_in_flight. for (packet in acked_packets):
bytes_in_flight -= acked_packet.size // Remove from bytes_in_flight.
if (InCongestionRecovery(acked_packet.time_sent)): bytes_in_flight -= packet.size
// Do not increase congestion window in recovery period. if (InCongestionRecovery(packet.time_sent)):
return // Do not increase congestion window in recovery period.
if (IsAppOrFlowControlLimited()): continue
// Do not increase congestion_window if application if (IsAppOrFlowControlLimited()):
// limited or flow control limited. // Do not increase congestion_window if application
return // limited or flow control limited.
if (congestion_window < ssthresh): continue
// Slow start. if (congestion_window < ssthresh):
congestion_window += acked_packet.size // Slow start.
else: congestion_window += packet.size
continue
// Congestion avoidance. // Congestion avoidance.
congestion_window += max_datagram_size * acked_packet.size congestion_window += max_datagram_size * packet.size
/ congestion_window / congestion_window
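The per-packet loop in the new OnPacketsAcked can be exercised with the Python sketch below. It assumes the loop moves on to the next packet (rather than returning from the function) when a packet earns no window increase, and it uses integer byte arithmetic. A hypothetical boolean field stands in for IsAppOrFlowControlLimited(). Times are in milliseconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AckedPacket:
    size: int          # bytes
    time_sent: float   # ms


@dataclass
class CongestionController:
    max_datagram_size: int = 1200
    congestion_window: int = 12000
    ssthresh: float = float("inf")
    bytes_in_flight: int = 0
    congestion_recovery_start_time: float = 0.0
    app_or_flow_control_limited: bool = False  # stand-in for IsAppOrFlowControlLimited()

    def in_congestion_recovery(self, sent_time: float) -> bool:
        return sent_time <= self.congestion_recovery_start_time

    def on_packets_acked(self, acked_packets: List[AckedPacket]) -> None:
        for packet in acked_packets:
            self.bytes_in_flight -= packet.size
            if self.in_congestion_recovery(packet.time_sent):
                continue  # no growth for packets sent before recovery began
            if self.app_or_flow_control_limited:
                continue  # window is not being used, so do not grow it
            if self.congestion_window < self.ssthresh:
                self.congestion_window += packet.size  # slow start
            else:
                # Congestion avoidance: about one datagram per window of acks.
                self.congestion_window += (self.max_datagram_size * packet.size
                                           // self.congestion_window)


cc = CongestionController(bytes_in_flight=2400,
                          congestion_recovery_start_time=50.0)
cc.on_packets_acked([AckedPacket(1200, 40.0), AckedPacket(1200, 60.0)])
print(cc.congestion_window, cc.bytes_in_flight)  # 13200 0
```

Returning from the function at the first skipped packet, instead of continuing, would leave the second packet's bytes counted in bytes_in_flight and would forgo its slow start increase.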
B.6. On New Congestion Event B.6. On New Congestion Event
Invoked from ProcessECN and OnPacketsLost when a new congestion event Invoked from ProcessECN and OnPacketsLost when a new congestion event
is detected. May start a new recovery period and reduces the is detected. May start a new recovery period and reduces the
congestion window. congestion window.
CongestionEvent(sent_time): CongestionEvent(sent_time):
// Start a new congestion event if packet was sent after the // Start a new congestion event if packet was sent after the
// start of the previous congestion recovery period. // start of the previous congestion recovery period.
if (!InCongestionRecovery(sent_time)): if (!InCongestionRecovery(sent_time)):
congestion_recovery_start_time = Now() congestion_recovery_start_time = now()
congestion_window *= kLossReductionFactor congestion_window *= kLossReductionFactor
congestion_window = max(congestion_window, kMinimumWindow) congestion_window = max(congestion_window, kMinimumWindow)
ssthresh = congestion_window ssthresh = congestion_window
// A packet can be sent to speed up loss recovery.
MaybeSendOnePacket()
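The recovery-period check in CongestionEvent can be sketched in Python as follows. In this sketch, `now` is passed in explicitly instead of calling now(), kLossReductionFactor is 0.5, and kMinimumWindow is 2 * max_datagram_size, as in the draft-27 text. The class name is hypothetical, and times are in milliseconds.

```python
from dataclasses import dataclass

K_LOSS_REDUCTION_FACTOR = 0.5


@dataclass
class RenoWindow:
    max_datagram_size: int = 1200
    congestion_window: int = 12000
    ssthresh: float = float("inf")
    congestion_recovery_start_time: float = 0.0

    def in_congestion_recovery(self, sent_time: float) -> bool:
        return sent_time <= self.congestion_recovery_start_time

    def congestion_event(self, sent_time: float, now: float) -> None:
        # A packet sent before the current recovery period began belongs
        # to a congestion event that has already been reacted to.
        if self.in_congestion_recovery(sent_time):
            return
        self.congestion_recovery_start_time = now
        minimum_window = 2 * self.max_datagram_size
        self.congestion_window = max(
            int(self.congestion_window * K_LOSS_REDUCTION_FACTOR), minimum_window)
        self.ssthresh = self.congestion_window


w = RenoWindow()
w.congestion_event(sent_time=10.0, now=100.0)   # new event: halve
w.congestion_event(sent_time=90.0, now=110.0)   # same event: ignored
w.congestion_event(sent_time=105.0, now=200.0)  # new event: halve again
print(w.congestion_window, w.ssthresh)          # 3000 3000
```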
B.7. Process ECN Information B.7. Process ECN Information
Invoked when an ACK frame with an ECN section is received from the Invoked when an ACK frame with an ECN section is received from the
peer. peer.
ProcessECN(ack, pn_space): ProcessECN(ack, pn_space):
// If the ECN-CE counter reported by the peer has increased, // If the ECN-CE counter reported by the peer has increased,
// this could be a new congestion event. // this could be a new congestion event.
if (ack.ce_counter > ecn_ce_counters[pn_space]): if (ack.ce_counter > ecn_ce_counters[pn_space]):
ecn_ce_counters[pn_space] = ack.ce_counter ecn_ce_counters[pn_space] = ack.ce_counter
CongestionEvent(sent_packets[ack.largest_acked].time_sent) CongestionEvent(sent_packets[pn_space][ack.largest_acked].time_sent)
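Because the ECN-CE count in an ACK frame is cumulative, only an increase over the last value seen for that packet number space signals new congestion. Below is a minimal Python sketch. Its callback parameter is a hypothetical stand-in for CongestionEvent, and the caller is assumed to supply the send time of the largest acknowledged packet.

```python
from typing import Callable, Dict


def process_ecn(ecn_ce_counters: Dict[str, int], pn_space: str,
                ack_ce_counter: int, largest_acked_time_sent: float,
                congestion_event: Callable[[float], None]) -> bool:
    """Signal congestion if the peer's ECN-CE count grew in pn_space.

    Returns True when a congestion event was signalled.
    """
    if ack_ce_counter > ecn_ce_counters[pn_space]:
        ecn_ce_counters[pn_space] = ack_ce_counter
        congestion_event(largest_acked_time_sent)
        return True
    return False


counters = {"Initial": 0, "Handshake": 0, "ApplicationData": 0}
events = []
process_ecn(counters, "ApplicationData", 2, 50.0, events.append)
process_ecn(counters, "ApplicationData", 2, 60.0, events.append)  # no increase
print(counters["ApplicationData"], events)  # 2 [50.0]
```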
B.8. On Packets Lost B.8. On Packets Lost
Invoked from DetectLostPackets when packets are deemed lost. Invoked when DetectAndRemoveLostPackets deems packets lost.
InPersistentCongestion(largest_lost_packet): InPersistentCongestion(lost_packets):
pto = smoothed_rtt + max(4 * rttvar, kGranularity) + pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
max_ack_delay max_ack_delay
congestion_period = pto * kPersistentCongestionThreshold congestion_period = pto * kPersistentCongestionThreshold
// Determine if all packets in the time period before the // Determine if all packets in the time period before the
// newest lost packet, including the edges, are marked // largest newly lost packet, including the edges, are
// lost // marked lost
return AreAllPacketsLost(largest_lost_packet, return AreAllPacketsLost(lost_packets, congestion_period)
congestion_period)
OnPacketsLost(lost_packets): OnPacketsLost(lost_packets):
// Remove lost packets from bytes_in_flight. // Remove lost packets from bytes_in_flight.
for (lost_packet : lost_packets): for (lost_packet : lost_packets):
bytes_in_flight -= lost_packet.size bytes_in_flight -= lost_packet.size
largest_lost_packet = lost_packets.last() CongestionEvent(lost_packets.largest().time_sent)
CongestionEvent(largest_lost_packet.time_sent)
// Collapse congestion window if persistent congestion // Collapse congestion window if persistent congestion
if (InPersistentCongestion(largest_lost_packet)): if (InPersistentCongestion(lost_packets)):
congestion_window = kMinimumWindow congestion_window = kMinimumWindow
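AreAllPacketsLost is not defined in this text. One possible reading of the comment above it is sketched below in Python. The sketch walks back from the newest lost packet through consecutively lost packets and reports persistent congestion if their send times span at least congestion_period. The helper names, the (time_sent, lost) representation, millisecond units, and the inclusive comparison are all assumptions of this sketch.

```python
K_GRANULARITY = 1.0                    # ms
K_PERSISTENT_CONGESTION_THRESHOLD = 3  # PTO multiplier


def congestion_period(smoothed_rtt, rttvar, max_ack_delay):
    """PTO (without backoff) times kPersistentCongestionThreshold."""
    pto = smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
    return pto * K_PERSISTENT_CONGESTION_THRESHOLD


def in_persistent_congestion(packets, period):
    """packets: list of (time_sent, lost) tuples sorted by time_sent."""
    lost_indices = [i for i, (_, lost) in enumerate(packets) if lost]
    if not lost_indices:
        return False
    end = start = lost_indices[-1]  # newest lost packet
    # Extend backwards while the preceding packet was also lost.
    while start > 0 and packets[start - 1][1]:
        start -= 1
    return packets[end][0] - packets[start][0] >= period


period = congestion_period(100.0, 10.0, 25.0)  # 3 * 165ms
print(period)                                  # 495.0
print(in_persistent_congestion([(0.0, True), (200.0, True), (500.0, True)], period))   # True
print(in_persistent_congestion([(0.0, False), (200.0, True), (500.0, True)], period))  # False
```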
B.9. Upon dropping Initial or Handshake keys
When Initial or Handshake keys are discarded, packets from the space
are discarded and loss detection state is updated.
Pseudocode for OnPacketNumberSpaceDiscarded follows:
OnPacketNumberSpaceDiscarded(pn_space):
assert(pn_space != ApplicationData)
// Remove any unacknowledged packets from flight.
foreach packet in sent_packets[pn_space]:
if (packet.in_flight):
bytes_in_flight -= packet.size
sent_packets[pn_space].clear()
// Reset the loss detection and PTO timer
time_of_last_sent_ack_eliciting_packet[pn_space] = 0
loss_time[pn_space] = 0
SetLossDetectionTimer()
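The bookkeeping in OnPacketNumberSpaceDiscarded can be exercised with this Python sketch. It assumes each sent packet record carries size and in_flight fields, as in Appendix A.1.1, and it leaves out the final SetLossDetectionTimer() call. The class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

SPACES = ("Initial", "Handshake", "ApplicationData")


@dataclass
class SentPacket:
    size: int
    in_flight: bool


@dataclass
class LossRecovery:
    bytes_in_flight: int = 0
    sent_packets: Dict[str, Dict[int, SentPacket]] = field(
        default_factory=lambda: {s: {} for s in SPACES})
    time_of_last_sent_ack_eliciting_packet: Dict[str, float] = field(
        default_factory=lambda: {s: 0.0 for s in SPACES})
    loss_time: Dict[str, float] = field(
        default_factory=lambda: {s: 0.0 for s in SPACES})

    def on_packet_number_space_discarded(self, pn_space: str) -> None:
        assert pn_space != "ApplicationData"
        # Remove any unacknowledged packets of this space from flight.
        for packet in self.sent_packets[pn_space].values():
            if packet.in_flight:
                self.bytes_in_flight -= packet.size
        self.sent_packets[pn_space].clear()
        # Reset the loss detection and PTO state for this space only.
        self.time_of_last_sent_ack_eliciting_packet[pn_space] = 0.0
        self.loss_time[pn_space] = 0.0
        # A full implementation would call SetLossDetectionTimer() here.


r = LossRecovery(bytes_in_flight=2200)
r.sent_packets["Initial"] = {0: SentPacket(1200, True), 1: SentPacket(50, False)}
r.sent_packets["Handshake"] = {0: SentPacket(1000, True)}
r.on_packet_number_space_discarded("Initial")
print(r.bytes_in_flight, len(r.sent_packets["Initial"]))  # 1000 0
```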
Appendix C. Change Log Appendix C. Change Log
*RFC Editor's Note:* Please remove this section prior to *RFC Editor's Note:* Please remove this section prior to
publication of a final version of this document. publication of a final version of this document.
Issue and pull request numbers are listed with a leading octothorp. Issue and pull request numbers are listed with a leading octothorp.
C.1. Since draft-ietf-quic-recovery-26 C.1. Since draft-ietf-quic-recovery-26
No changes. No changes.
End of changes. 104 change blocks.
313 lines changed or deleted 426 lines changed or added

This html diff was produced by rfcdiff 1.44jr. The latest version is available from http://tools.ietf.org/tools/rfcdiff/