| Internet-Draft | SRv6 Precision Flow Control | March 2026 |
| Yang, et al. | Expires 2 September 2026 | [Page] |
This document defines a flow-level precision congestion control mechanism for SRv6 networks. The mechanism specifies new congestion notification message formats that enable per-flow congestion information delivery and hop-by-hop backpressure control. Compared to traditional Priority-based Flow Control (PFC), which operates at the queue level, this mechanism provides finer-grained congestion control suited to Wide-Area Network (WAN) environments and mitigates head-of-line blocking, congestion spreading, and deadlock issues. The document also describes interoperability models with traditional IEEE 802.1Qbb PFC.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
With the exponential growth of intelligent computing services, scenarios such as distributed AI training, Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCEv2), and disaggregated storage-compute architectures require rigorous lossless transmission of large volumes of bursty traffic. As these services expand beyond data centers across Wide-Area Networks (WANs), maintaining zero-packet-loss guarantees becomes increasingly challenging.¶
Traditional Priority-based Flow Control (PFC), as defined in IEEE 802.1Qbb, is a Data Link Layer flow control mechanism primarily designed for intra-data center networks. When applied to WAN scenarios with higher Bandwidth-Delay Products (BDP), PFC faces severe structural limitations:¶
To address these limitations, this document proposes a Flow-Level Precision Congestion Control mechanism. Operating within SRv6 networks, it allows network nodes to uniquely identify congested IP flows and explicitly signal upstream nodes to enforce granular rate reduction or pause actions exclusively on the offending flows.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The mechanism operates within standard SRv6 data planes. To support Flow-Level Precision Congestion Control, participating routing nodes are REQUIRED to implement the following functional components:¶
Forwarding nodes MUST perform flow classification to distinguish traffic streams. The default classification method SHOULD utilize the IPv6 Flow Label (as defined in [RFC6437]) combined with the Source and Destination IPv6 Addresses.¶
Alternatively, nodes MAY use a classic 5-tuple identifier (Source IP, Destination IP, Protocol, Source Port, Destination Port) where payload inspection is feasible. Implementation-specific classifications (such as Deep Packet Inspection of Layer-7 headers or traffic behavioral heuristics) MAY be used but are outside the scope of this document.¶
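The default classification key can be illustrated with a minimal sketch, assuming the classifier has access to the raw 40-byte fixed IPv6 header. It extracts the 20-bit Flow Label and the two addresses; `FlowKey` and `flow_key_from_ipv6` are hypothetical names, not part of this specification. The 5-tuple alternative would additionally require walking any extension headers (including the SRH) to reach the transport header, which is not shown.

```python
import struct
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Default flow key: (Flow Label, Source Address, Destination Address)."""
    flow_label: int
    src: bytes
    dst: bytes


def flow_key_from_ipv6(packet: bytes) -> FlowKey:
    """Extract the default classification key from a raw IPv6 header."""
    if len(packet) < 40:
        raise ValueError("truncated IPv6 header")
    # First 32 bits: Version (4) | Traffic Class (8) | Flow Label (20).
    (first_word,) = struct.unpack("!I", packet[0:4])
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF
    src = packet[8:24]   # Source Address
    dst = packet[24:40]  # Destination Address
    return FlowKey(flow_label, src, dst)
```

A packet with a zero Flow Label yields a key shared by all unlabeled traffic between the same address pair, so a node might fall back to the 5-tuple in that case.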
Upon detecting a new flow, the node allocates a Stream ID that uniquely identifies it. The Stream ID management strategy can be localized (significant only between two adjacent hops) or globally coordinated (e.g., by an SDN controller across the SRv6 domain).¶
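For the localized strategy, a node needs a bidirectional table so that it can both assign an ID to a newly classified flow and resolve an ID carried in a received PFCM. The sketch below assumes a 16-bit Stream ID space (a width inferred from the message layouts, not stated in the text); `StreamIdAllocator` is a hypothetical name.

```python
from typing import Dict, Hashable, Optional


class StreamIdAllocator:
    """Locally significant Stream ID table, e.g. one instance per adjacency.

    The 16-bit default ID width is an assumption of this sketch.
    """

    def __init__(self, id_bits: int = 16) -> None:
        self._capacity = 1 << id_bits
        self._by_flow: Dict[Hashable, int] = {}
        self._by_id: Dict[int, Hashable] = {}
        self._next = 0

    def allocate(self, flow_key: Hashable) -> int:
        """Return the Stream ID for flow_key, allocating one if needed."""
        if flow_key in self._by_flow:
            return self._by_flow[flow_key]
        if len(self._by_id) >= self._capacity:
            raise RuntimeError("Stream ID space exhausted")
        # Probe forward from the last allocation to find a free ID.
        while self._next in self._by_id:
            self._next = (self._next + 1) % self._capacity
        sid = self._next
        self._by_flow[flow_key] = sid
        self._by_id[sid] = flow_key
        self._next = (sid + 1) % self._capacity
        return sid

    def release(self, flow_key: Hashable) -> None:
        """Free the Stream ID of a flow that has aged out."""
        sid = self._by_flow.pop(flow_key, None)
        if sid is not None:
            del self._by_id[sid]

    def lookup(self, stream_id: int) -> Optional[Hashable]:
        """Resolve a Stream ID received in a PFCM back to a flow key."""
        return self._by_id.get(stream_id)
```

A globally coordinated strategy would replace `allocate` with a request to the controller; the reverse lookup would stay the same.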
The lifecycle of precision congestion control is defined by the following state machine transitions:¶
Congestion Detection (Local State):¶
A node actively monitors its egress buffer occupancy for each identified
flow. When the instantaneous or average buffer depth for a specific
Stream ID exceeds a pre-configured high-water mark threshold,
the node transitions to the Congested state.¶
PFCM Generation (Signaling):¶
The congested node generates a Precision Flow Control Message (PFCM).
The PFCM encapsulates the offending Stream ID, the local
Queue ID, the requested Action (e.g., reduce rate
by 50%), and the Precision Flow Control Time.¶
Reverse Path Transmission:¶
The PFCM is transmitted to the directly connected upstream node from which the congested flow was received. The PFCM SHOULD be addressed to the upstream neighbor's link-local IPv6 address.¶
Upstream Enforcement (Backpressure):¶
Upon reception of a PFCM, the upstream node parses the Stream ID
and maps it to its local forwarding state. It MUST immediately apply the
specified Action for the duration of the Precision Flow Control Time.
If the upstream node cannot absorb the backpressure locally, it MAY
recursively generate a new PFCM to its own upstream node.¶
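The downstream detection step and the upstream enforcement step can be sketched together as follows. This is a hypothetical illustration, not the normative behavior: the EWMA weight, the low-water mark used to leave the Congested state, the microsecond unit for the Precision Flow Control Time, the reading of "reduce rate by X%" as scaling the rate by (100 - X)/100, and the rule that a newer PFCM replaces an older one are all assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Set

NO_BACKPRESSURE, PAUSE_FLOW, REDUCE_RATE = 0b00, 0b01, 0b10


class CongestionDetector:
    """Downstream side: per-Stream ID high-water-mark detection (assumed parameters)."""

    def __init__(self, high_water: int, low_water: int, weight: float = 0.25,
                 use_average: bool = True) -> None:
        if not 0 <= low_water < high_water:
            raise ValueError("require 0 <= low_water < high_water")
        self.high_water, self.low_water = high_water, low_water
        self.weight, self.use_average = weight, use_average
        self.avg: Dict[int, float] = {}
        self.congested: Set[int] = set()

    def update(self, stream_id: int, depth_bytes: int) -> bool:
        """Record a buffer-depth sample; return True on entry to Congested."""
        prev = self.avg.get(stream_id, float(depth_bytes))
        self.avg[stream_id] = (1 - self.weight) * prev + self.weight * depth_bytes
        level = self.avg[stream_id] if self.use_average else depth_bytes
        if stream_id not in self.congested:
            if level > self.high_water:
                self.congested.add(stream_id)
                return True  # caller generates a PFCM for this Stream ID
        elif level < self.low_water:
            self.congested.discard(stream_id)  # assumed exit condition
        return False


@dataclass
class Enforcement:
    action: int
    reduce_pct: int
    expires_at_us: int


class BackpressureTable:
    """Upstream side: actions applied per Stream ID until the control time elapses."""

    def __init__(self) -> None:
        self._entries: Dict[int, Enforcement] = {}

    def on_pfcm(self, now_us: int, stream_id: int, action: int,
                reduce_pct: int, ctrl_time_us: int) -> None:
        if action == NO_BACKPRESSURE:
            self._entries.pop(stream_id, None)  # release immediately
            return
        self._entries[stream_id] = Enforcement(action, reduce_pct, now_us + ctrl_time_us)

    def allowed_rate(self, now_us: int, stream_id: int, base_rate_bps: int) -> int:
        """Return the rate at which the flow may currently be scheduled."""
        e = self._entries.get(stream_id)
        if e is None:
            return base_rate_bps
        if now_us >= e.expires_at_us:
            del self._entries[stream_id]  # Precision Flow Control Time elapsed
            return base_rate_bps
        if e.action == PAUSE_FLOW:
            return 0
        if e.action == REDUCE_RATE:
            return base_rate_bps * (100 - e.reduce_pct) // 100
        return base_rate_bps
```

Only Stream IDs are tracked, so flows sharing a queue with a congested flow keep their full rate; this is the granularity difference from queue-level PFC.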
Heterogeneous networks may contain legacy devices incapable of L3 per-flow control. To ensure seamless backward compatibility, a border node receiving a PFCM MAY translate the L3 signaling into an IEEE 802.1Qbb L2 PFC frame.¶
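A border node's translation can be sketched as building an IEEE 802.1Qbb PFC frame: a MAC Control frame (EtherType 0x8808, opcode 0x0101) sent to 01-80-C2-00-00-01, carrying a priority-enable vector and eight 16-bit pause times measured in quanta of 512 bit times. The mapping policy is hypothetical: the Queue ID is assumed to equal the 802.1p priority, the Precision Flow Control Time is assumed to be in microseconds, No Backpressure becomes a zero-time pause (which resumes the priority), and Reduce Rate is left untranslated because PFC can only pause.

```python
import struct
from typing import Optional

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_LEN = 60  # minimum Ethernet frame length excluding the FCS

NO_BACKPRESSURE, PAUSE_FLOW, REDUCE_RATE = 0b00, 0b01, 0b10


def us_to_quanta(time_us: int, link_bps: int) -> int:
    """Convert microseconds to pause quanta (1 quantum = 512 bit times), rounding up."""
    quanta = -(-time_us * link_bps // (512 * 1_000_000))  # ceiling division
    return min(quanta, 0xFFFF)  # each per-priority time field is 16 bits


def pfcm_to_pfc_frame(src_mac: bytes, queue_id: int, action: int,
                      ctrl_time_us: int, link_bps: int) -> Optional[bytes]:
    """Translate a PFCM into a PFC frame, or return None if PFC cannot express it."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= queue_id <= 7:
        raise ValueError("Queue ID must map to an 802.1p priority (0-7)")
    if action == PAUSE_FLOW:
        quanta = us_to_quanta(ctrl_time_us, link_bps)
    elif action == NO_BACKPRESSURE:
        quanta = 0  # zero pause time resumes the priority
    else:
        return None  # Reduce Rate has no PFC equivalent in this sketch
    times = [0] * 8
    times[queue_id] = quanta
    frame = (PFC_DST_MAC + src_mac
             + struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, 1 << queue_id)
             + struct.pack("!8H", *times))
    return frame.ljust(MIN_FRAME_LEN, b"\x00")  # pad to the minimum frame size
```

The translated pause applies to every flow in that priority, so per-flow precision is lost at the boundary; for example, a 1 µs pause on a 100 Gbit/s link rounds up to 196 quanta.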
In such translation operations:¶
Precision flow control telemetry MAY be carried in an IPv6 Hop-by-Hop Options header or Destination Options header ([RFC8200]). This encoding is well suited to in-band telemetry and to piggybacking on reverse-path traffic.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  |  Opt Data Len |     Type      |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Stream ID           |   Queue ID    |    Action     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Precision Flow Ctrl Time    |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                   Destination IPv6 Address                    |
+                  (Original Congested Packet)                  +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                      Source IPv6 Address                      |
+                  (Original Congested Packet)                  +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The fields are defined as follows:¶
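Action: Bits [0:1] encode the action type: 00 = No Backpressure, 01 = Pause Flow, 10 = Reduce Rate. Bits [2:7] represent the rate reduction ratio as an absolute percentage when the action type is 10; because this subfield is six bits wide, the representable range is 0-63.¶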
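The option layout can be made concrete with a packing sketch. It assumes the field widths shown in the figure (8-bit Type and Reserved, 16-bit Stream ID, 8-bit Queue ID and Action, 16-bit time and Reserved) and the usual IETF bit numbering in which bit 0 is the most significant, so the action type occupies the top two bits of the Action octet. `PFCM_OPTION_TYPE` and `PFCM_MSG_TYPE` are hypothetical placeholder values, since no code points have been assigned.

```python
import struct

PFCM_OPTION_TYPE = 0x1E  # hypothetical placeholder, not an assigned option type
PFCM_MSG_TYPE = 1        # hypothetical value for the inner Type field


def encode_action(action_type: int, reduce_pct: int = 0) -> int:
    """Pack the action type into bits [0:1] and the ratio into bits [2:7]."""
    if action_type not in (0b00, 0b01, 0b10):
        raise ValueError("unknown action type")
    if not 0 <= reduce_pct <= 0x3F:
        raise ValueError("bits [2:7] can hold only 0-63")
    return (action_type << 6) | reduce_pct


def decode_action(action_byte: int) -> tuple:
    return action_byte >> 6, action_byte & 0x3F


def pack_pfcm_option(stream_id: int, queue_id: int, action_byte: int,
                     ctrl_time: int, dst: bytes, src: bytes) -> bytes:
    """Build the option TLV: Option Type, Opt Data Len, then the option data."""
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("addresses must be 16 bytes")
    body = struct.pack("!BBHBBHH", PFCM_MSG_TYPE, 0, stream_id, queue_id,
                       action_byte, ctrl_time, 0) + dst + src
    return struct.pack("!BB", PFCM_OPTION_TYPE, len(body)) + body


def build_hbh_header(next_header: int, option: bytes) -> bytes:
    """Wrap one option in a Hop-by-Hop header padded to a multiple of 8 octets."""
    data = struct.pack("!BB", next_header, 0) + option
    pad = (-len(data)) % 8
    if pad == 1:
        data += b"\x00"                                # Pad1
    elif pad > 1:
        data += bytes([1, pad - 2]) + bytes(pad - 2)   # PadN
    hdr_ext_len = len(data) // 8 - 1  # 8-octet units, excluding the first 8
    return data[:1] + bytes([hdr_ext_len]) + data[2:]
```

With these widths the option is 44 octets (Opt Data Len 42), so a Hop-by-Hop header carrying only this option needs two octets of PadN and has a Hdr Ext Len of 5.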
Out-of-band signaling uses ICMPv6 messages. Unlike the in-band encoding, this mechanism does not depend on the availability of reverse-path data traffic, although, like all ICMPv6 messages, PFCMs are delivered on a best-effort basis.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Reserved            |           Stream ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Queue ID    |    Action     |   Precision Flow Ctrl Time    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                   Destination IPv6 Address                    |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                      Source IPv6 Address                      |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The ICMPv6 header fields are defined as follows:¶
Precision Flow Control Notification.¶
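Building the ICMPv6 PFCM requires the standard ICMPv6 checksum, computed over an IPv6 pseudo-header (source, destination, upper-layer length, Next Header 58) followed by the message. The sketch assumes the field widths shown in the figure and uses a hypothetical placeholder Type value of 200, since the real value is subject to IANA assignment; the caller would send it from its link-local address with a Hop Limit of 255.

```python
import struct

ICMPV6_NEXT_HEADER = 58
PFCM_ICMP_TYPE = 200  # hypothetical placeholder, not an assigned ICMPv6 type
PFCM_ICMP_CODE = 0


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def build_icmpv6_pfcm(ip_src: bytes, ip_dst: bytes, stream_id: int, queue_id: int,
                      action_byte: int, ctrl_time: int,
                      flow_dst: bytes, flow_src: bytes) -> bytes:
    """Return the ICMPv6 PFCM message (without the IPv6 header) with its checksum set."""
    msg = struct.pack("!BBHHHBBH", PFCM_ICMP_TYPE, PFCM_ICMP_CODE, 0,  # checksum = 0
                      0, stream_id,                                    # Reserved, Stream ID
                      queue_id, action_byte, ctrl_time) + flow_dst + flow_src
    pseudo = ip_src + ip_dst + struct.pack("!I3xB", len(msg), ICMPV6_NEXT_HEADER)
    csum = ~ones_complement_sum16(pseudo + msg) & 0xFFFF
    return msg[:2] + struct.pack("!H", csum) + msg[4:]
```

A receiver can validate the message by summing the same pseudo-header and the received message; a correct checksum yields 0xFFFF.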
The introduction of L3/L4 flow-level pause and backpressure signaling inherently expands the attack surface of the network architecture. Malicious actors could spoof PFCM packets to arbitrarily pause critical infrastructure flows, leading to a severe Denial of Service (DoS) attack.¶
To mitigate these threats, the following security constraints MUST be enforced by compliant implementations:¶
Hop Limit Verification:¶
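When processing an ICMPv6 PFCM, a node MUST verify that the IPv6 Hop Limit is exactly 255. Packets arriving with a smaller Hop Limit MUST be silently discarded. Because PFCMs are originated with a Hop Limit of 255 and every router decrements this value when forwarding, the check ensures that the signal originated from an on-link neighbor.¶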
Cryptographic Authentication:¶
In untrusted or multi-tenant transport domains, precision flow control messages SHOULD be secured using the IPsec Authentication Header (AH) or Encapsulating Security Payload (ESP) to provide data integrity and data origin authentication of the neighbor.¶
Rate Limiting:¶
Nodes MUST implement strict control-plane policing (CoPP) and rate limiting for PFCM processing to prevent CPU resource exhaustion attacks.¶
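The Hop Limit check and the rate limit can be combined into a receive-side admission step, sketched below. The sketch assumes ICMPv6 immediately follows the fixed IPv6 header (no extension headers), uses a hypothetical placeholder Type value of 200, and uses a simple token bucket whose rate and burst are hypothetical parameters; real CoPP is usually enforced in hardware before packets reach the CPU.

```python
import struct

ICMPV6 = 58
PFCM_ICMP_TYPE = 200  # hypothetical placeholder, not an assigned ICMPv6 type


def passes_hop_limit_check(packet: bytes) -> bool:
    """Return True if an ICMPv6 PFCM may be processed, False to discard silently."""
    if len(packet) < 44:  # 40-byte IPv6 header + 4-byte ICMPv6 header
        return False
    word0, _plen, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    if word0 >> 28 != 6 or next_header != ICMPV6 or packet[40] != PFCM_ICMP_TYPE:
        return False
    return hop_limit == 255  # any forwarding hop would have decremented it


class TokenBucket:
    """Token-bucket policer limiting how many PFCMs reach the parser."""

    def __init__(self, rate_per_s: float, burst: int, now_s: float = 0.0) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.last = now_s

    def allow(self, now_s: float) -> bool:
        elapsed = max(0.0, now_s - self.last)  # tolerate non-monotonic timestamps
        self.last = max(self.last, now_s)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A node might admit a PFCM only when `passes_hop_limit_check(pkt) and bucket.allow(now)` holds; keeping one bucket per neighbor would stop a single misbehaving neighbor from starving signals from the others.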
This document requests the following allocations from IANA:¶
The authors would like to thank the contributors and reviewers who provided valuable feedback on this document.¶