IDR Working Group P. Huo Internet-Draft G. Chen Intended status: Standards Track ByteDance Expires: April 28, 2025 C. Lin New H3C Technologies W. Cheng China Mobile Syed Hasan Raza Naqvi Broadcom Yossi Kikozashvili DriveNets C. Q ByteDance October 21, 2024 Bgp Extension for Tunnel Egress Point draft-hcl-idr-extend-tunnel-egress-point-03 Abstract In AI networks, flow characteristics often exhibit a low number of flows but with high bandwidth per flow, making it easy to cause network congestion when using traditional flow-level load balancing methods. Currently, the direction of traffic scheduling focuses on load sharing individual packets of the same flow, which requires sorting based on the Tunnel Egress Point information from the remote end. This document describes the method of publishing Tunnel Egress Point through the BGP protocol. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 28, 2025. hcl, et al. Expires April 28, 2025 [Page 1] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 Copyright Notice Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction...................................................3 1.1. Requirements Language.....................................3 2. Motivation.....................................................3 3. Terminology....................................................5 4. Solution.......................................................5 5. Protocol Extension.............................................7 5.1. Extend for TEP(Tunnel Egress Point).......................7 5.2. Extend for Encap ID.......................................9 5.3. Implementation based on different types of networks......10 6. Procedure.....................................................11 6.1. Procedure for IPv4/IPv6..................................11 6.2. Procedure for EVPN.......................................13 6.2.1. L2 Forwarding.......................................13 6.2.2. L3 Forwarding.......................................15 7. Deployment consideration......................................17 8. IANA Considerations...........................................17 9. Security Considerations.......................................18 10. References...................................................18 10.1. Normative References....................................18 10.2. Informative References..................................18 Acknowledgments..................................................19 Contributors.....................................................19 Authors' Addresses...............................................20 hcl, et al. Expires April 28, 2025 [Page 2] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 1. Introduction With the widespread application of AI technology, the AI Computing Center has experienced rapid development and increased attention to potential issues within AI networks. The characteristics of AI traffic exhibit a low number of flows with substantial bandwidth per flow, making traditional flow-level load balancing highly susceptible to multiple flows hashing to the same link, resulting in congestion on certain links while others remain idle. This leads to low network utilization and an inability to handle sudden surges in network traffic. Consequently, the need for a new load balancing scheduling model is imperative. Presently, the direction of scheduling in AI networks involves sharing the load of multiple packets within each flow individually, enabling the "spraying" of individual flows across the entire path to enhance effective bandwidth utilization and better application of existing bandwidth. However, sharing the load of individual packets within a flow can result in packet reordering for the same traffic. Therefore, it is necessary for the egress point to carry the egress features of the traffic to the ingress point, enabling packet sorting based on the egress features of the traffic to ensure the proper sequencing of multiple packets within the same flow. This document describes the method of conveying the egress characteristics of routes as route attributes through the BGP protocol to inform the ingress server. 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT","SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. 2. Motivation As shown in the figure 1, Leaf devices are connected downwards to host devices and upwards to Spine devices. When hosts communicate with each other, there are multiple different ECMP paths available for OSF-Egress to forward packets. For example, traffic from H1 to H8 can go through the path H1 -> Leaf1 -> Spine1 -> Leaf4 -> H8, or it can go through H1 -> Leaf1 -> Spine2 -> Leaf4 hcl, et al. Expires April 28, 2025 [Page 3] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 -> H8. In traditional load balancing, after hashing the traffic, the same path is chosen for forwarding for the same flow. In AI networks, where there is less data per flow but each flow carries a larger payload, traditional load balancing strategies can lead to network congestion. To adapt to the characteristics of AI networks, when load balancing with ECMP, multiple small data packets can be combined into a larger packet for transmission, and large packets can be divided into relatively smaller packets for transmission. The combined data packets are then evenly distributed over ECMP paths to fully utilize the bandwidth of each path. However, this may result in packet reordering, so it is necessary to reorder the packets at the packet's destination. During sorting, all packets destined for the same end-point need to be sorted. For example, for two data packets from H1 to H8, they are sorted based on the destination (Leaf4 + H8) to ensure that the packets arrive at H8 in the correct order. Therefore, it is necessary to synchronize end-point information from OSF-Egress to OSF-Ingress through the control plane. When sending packets, OSF-Ingress numbers the packets based on the end-point and selects different paths for "spraying" the packets. The intermediate OSF-Forward device forwards packets towards the final destination device based on the end-point without concerning about packet order. Finally, OFP-Egress reorders packets based on the same end-point number and forwards them to the hosts. This document primarily describes how the control plane delivers end-point information. hcl, et al. Expires April 28, 2025 [Page 4] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 +-------------+ +-------------+ | | | | |Spine1 | |Spine2 | +--+--+--+--+-+ +-+--+---+--+-+ / | | |1 / / | \2 / | | \ / / + \ / +--(--(----(----------+ / / \ / / | | +-(-----------+ / \ / / | \ | +---------------)---+ \ ++ / ++ +-)---------+ / \ \ / /2 \ | \ / \ \ +-+--+--+ +-+---+-+ +-+---+-+ +-+------+-+ | | | | | | | | |Leaf1 | |Leaf2 | |Leaf3 | |Leaf4 | +-+---+-+ +-+---+-+ +-+---+-+ +-+-----+--+ | | | | | | | |1,2 H1 H2 H3 H4 H5 H6 H7 H8 Figure 1: AI network 3. Terminology The following terminologies are used in this document. TEP: Tunnel Egress Point. This document OSF: Open Scheduled Fabric. [draft-hcl-rtgwg-osf-framework-00] OSF-Ingress: OSF Ingress router. [draft-hcl-rtgwg-osf-framework-00] OSF-Egress: OSF Egress router. [draft-hcl-rtgwg-osf-framework-00] OSF-Forwarder: OSF Forwarder Router. [draft-hcl-rtgwg-osf-framework- 00] 4. Solution As shown in Figure 2, in the Spin/Leaf network, each Leaf device, when advertising route prefixes externally, includes the Tunnel Egress Point information corresponding to these route prefixes. When the entry Leaf device receives this route, it extracts the Tunnel Egress Point information and forwards it to the forwarding layer. The specific usage of the Tunnel Egress Point by the forwarding layer is beyond the scope of this document. hcl, et al. Expires April 28, 2025 [Page 5] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 +---------------------------------------------------------------------------+ | +-----------+ +-----------+ | | |Spin1 | |Spin2 | | | +-+-+-+-+---+ +-+-+-+-+---+ | | | | | | +----------------+------------------+--------------+ | | | | | | | +------)-------+--------)---------+--------)------+ | |. | | | | | | | | | | +-+------+--+ +-+--------+-+ +-+--------+-+ +-+-------+-+ | | |Leaf1 | |Leaf2 | |Leaf3 | |Leaf4 | | | +-+---------+ +--+---------+ +---+--------+ +---+-------+ | | | TEP1 | TEP2 | TEP3 | TEP4 | +---------)---------------)-------------------)---------------)-------------+ | | | | P1 P2 P3 P4 Figure 2: Spin/Leaf network The forwarding paths for traffic are illustrated in Figure 3. For the same traffic from Leaf1 to Leaf2, there are two possible paths: Spin1->Leaf2 and Spin2->Leaf2. Different paths for the same traffic have the same Tunnel Egress Point information. +---------+ Leaf1: |P2 |----+ Spin1---Leaf2 TEP2 +---------+ + Spin2---Leaf2 TEP2 Figure 3: Illustration of Multiple Forwarding Paths In addition to path information, to enable Leaf2 devices to directly forward packets without the need for secondary table lookups, Leaf1 devices can also prepare the required encapsulation information in advance. The encapsulation information is identified by an Encap ID and is included with the route when the Leaf device publishes it. Other devices, when forwarding packets, will include the Encap ID information if the route publisher has provided it. hcl, et al. Expires April 28, 2025 [Page 6] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 +---------+ Leaf1: |P2 |----+ Spin1---Leaf2 TEP2,Encap ID1 +---------+ + Spin2---Leaf2 TEP2,Encap ID1 Figure 4 The specific synchronization process is as follows: 1) When Leaf2 devices announce routing information externally, they carry TEP2 information. 2) When Leaf2 devices announce the encapsulation information Encap ID1 to reach P2 externally. 3) When Leaf1 devices forward packets, they specify the forwarding path and the destination information TEP2. At the same time, based on the destination address P2, they specify the final encapsulation information Encap ID1 for sending. 4) The intermediate device independently determines the path to TEP2 and forwards the packet to TEP2. 5) TEP2, as the last hop router, directly encapsulates the packet according to the Encap ID1 and delivers the packet to P2. 5. Protocol Extension This section introduces the method of extending the BGP protocol to carry the Tunnel Egress Point information within the community attribute. The Tunnel Egress Point information includes the Device Index and Port Index. The Device Index is globally unique and is used to distinguish different Leaf devices, while the Port Index is unique to the local device and is used to differentiate between different interfaces on the local device. 5.1. Extend for TEP(Tunnel Egress Point) the TEP attribute is advertised as the path attribute type for BGP routes. Add a new type, TEP type, to "BGP Tunnel Encapsulation Attribute Tunnel Types." The TEP attribute is an optional transitive BGP path attribute. hcl, et al. Expires April 28, 2025 [Page 7] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Tunnel Type(2 octets) | Length(2 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Value (variable) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Tunnel Egress Point attribute Tunnel Type: TBD, 2 octets. Identifies a type of tunnel. The field contains values from the IANA registry "BGP Tunnel Encapsulation Attribute Tunnel Types" [IANA-BGP-TUNNEL-ENCAP] [RFC9012] Length: 2 octets, length of Value Currently, two types of TEPs have been defined: one that carries a one Device ID and one Port ID attribute, and another that carries one Device ID multiple DevicePort attributes. When the destination address is a unicast address, the corresponding destination node is a single node, and it carries a single DevicePort attribute. When the destination address is a broadcast address, the corresponding destination node is a group of nodes, and it carries one DeviceID and multiple PortID attributes. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | TEP Type | Length(1 octets)| Resv | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Device ID (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Port ID (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 6: Single Port Index attribute hcl, et al. Expires April 28, 2025 [Page 8] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 TEP Type 1: TBD1, Single Port Index, 1 octet Length: 8, length of one DevicePort, 1 octet Resv: 2 octets Device ID: The Device ID, 4 octets Port ID: The port ID, 4 octets 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | TEP Type | Length(2 octets) | Resv | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Device ID (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Port ID1 (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Port IDn (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: Multiple Port Index attribute TEP Type 2: TBD2, Multiple Port Index Resv: 1 octet Length: DevicePort Total length Device ID: The Device ID, 4 octets Port ID: The port ID, 4 octets, can carry one or more, at least one. 5.2. Extend for Encap ID The Encap ID occupies 2 bytes and is currently used only in EVPN networks. To facilitate the packaging of routes with the same hcl, et al. Expires April 28, 2025 [Page 9] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 attributes in BGP, the implementation includes the Encap ID as part of the NLRI information in EVPN routes. It reuses the Mpls ID2 field within the EVPN NLRI, eliminating the need for additional extensions, as shown below. +---------------------------------------+ | RD (8 octets) | +---------------------------------------+ |Ethernet Segment Identifier (10 octets)| +---------------------------------------+ | Ethernet Tag ID (4 octets) | +---------------------------------------+ | MAC Address Length (1 octet) | +---------------------------------------+ | MAC Address (6 octets) | +---------------------------------------+ | IP Address Length (1 octet) | +---------------------------------------+ | IP Address (0, 4, or 16 octets) | +---------------------------------------+ | MPLS Label1 (3 octets) | +---------------------------------------+ | MPLS Label2 (0 or 3 octets) | | /Encap ID attribute (0 or 3 octets) | +---------------------------------------+ Figure 8 5.3. Implementation based on different types of networks For the network shown in Figure 1, it can be a regular Layer 3 IP network or a Layer 2 network based on EVPN. When advertising network route information, extended TEP attribute information is carried as path attribute. If it is an EVPN network, the Encap ID is advertised with Type-2 MAC routes. When Leaf1 forwards a packet to a host under Leaf2's P2, it first retrieves the TEP information based on the P2 route and then obtains the Encap ID information based on the host information. During packet encapsulation, both the TEP and Encap ID information are included in the packet sent to Leaf2. The support for BGP Multicast VPN (MVPN) Services [RFC6513] with Tunnel Egress Point is outside the scope of this document. hcl, et al. Expires April 28, 2025 [Page 10] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 6. Procedure 6.1. Procedure for IPv4/IPv6 When the control plane uses IPv4 or IPv6 unicast address families, the data plane does not require additional encapsulation extensions, except for sorting. The control plane only needs to add similar extensions like 5.1. The specific handling of the control plane and data plane is as follows. Control Layer: 1) When OFP-Egress advertises IPv4/IPv6 prefix routes externally, the TEP attributes serve as path attribute types for these routes. For specific extension formats, refer to sections 5.1. 2) Upon receiving the prefix routes, OFP-Ingress updates the destination address and TEP information into the L3 forwarding table. Forwarding Layer(details are not included in this document, the following is just the processing logic): 1) The L3 forwarding table records the TEP information. 2) During packet forwarding, OFP-Ingress sequences packets based on TEP information and embeds the TEP information and packet sequence number in the forwarded packet. How the TEP information and sequence number are carried within the forwarded packets is beyond the scope of this document. 3) OFP-Forward devices can choose to forward based on IP/IPv6 addresses or based on TEP info, without regard to packet disarray during forwarding. 4) Packets forwarded to OFP-Egress may arrive out of order due to differing intermediate paths. 5) OFP-Egress receives the packets, sorts them according to their sequence numbers, and if necessary, reassembles them, and forwards the packets to the server the original order sent by OFP-Ingress. 6) When delivering packets to the server, OFP-Egress adds the necessary encapsulation, and then delivers them to the server. The information that the control plane's OFP-Egress sends to the OFP-Ingress is shown in Figure 9. hcl, et al. Expires April 28, 2025 [Page 11] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 <-- Advertise TEP and EncapID in Route attrib to Ingress +--- Server1(10.1.1.1) DevID1 /TEP: (DevID1, PortID1) +-----------+/ +--|OFP-Egress1|\ / +-----------+ +--- Server2 (20.1.1.1) +-------------+ / TEP: (DevID1, PortID2) |OFP-Ingress |/ +-------------+\ \ +--- Server3(30.1.1.1) \ DevID2 /TEP: (DevID2, PortID1) \ +-----------+/ +--|OFP-Egress2|\ +-----------+ \ +--- Server4(40.1.1.1) TEP: (DevID2, Port2) ... \ DevIDn +----Servern1(xx.1.1.1) \ +-----------+/TEP: (DevIDn, Port1) +---|OFP-Egressn| EncapID:1 +-----------+\ +---- Servern2(yy.1.1.1) TEP: (DevIDn, Port2) Figure 9 The data maintained by the OFP-Ingress is shown in Figure 10. +=========+===============+ |Prefix | TEP | +=========+===============+ |10.1.1.1 |DevID1,PortID1 | +---------+---------------+ |20.1.1.1 |DevID1,PortID2 | +---------+---------------+ |30.1.1.1 |DevID2,PortID1 | +---------+---------------+ |40.1.1.1 |DevID2,PortID2 | +---------+---------------+ |xx.1.1.1 |DevIDn,PortID1 | +---------+---------------+ |yy.1.1.1 |DevIDn,PortID1 | +---------+---------------+ hcl, et al. Expires April 28, 2025 [Page 12] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 Figure 10 The process of packet sending and reordering in the forwarding layer is shown in Figure 11. Here, p1, p2, and p3 are the three packets sent from OFP-Ingress to OFP-Egress. After being forwarded through multiple ECMP paths, they arrive at the OFP-Egress in the order p3, p2, p1. The OFP-Egress then reorders them based on the SequenceID, restoring the order to p1, p2, p3 before delivering them to the Server. TEP TEP SequenceID SequenceID EncapID +------------+ EncapID +--------+-->p1 | ECMP1 +- +-->p3 +---------+-->p1(encap) |OSF- | ============== \ / |OSF- | | +-->p2 | ECMP2 +------->p2 | +-->p2(encap) |Ingress | ============== /\ |Egress | | +-->p3 | ... +- +--->p1 | +-->p3(encap) +--------+ ============== +---------+ | ECMPn | +------------+ Figure 11 6.2. Procedure for EVPN 6.2.1. L2 Forwarding Control Layer: 1) When OFP-Egress advertises Type 2 routes (MAC/IP advertisement) externally, it carries the TEP and Encap ID in the NLRI. For example, OFP-Egress1 advertise Type 2 route with MAC address (MAC1), IP Address (10.1.1.1), EncapID (1), TEP (DevID1, PortID1). For detailed formatting, see Figure 12. 2) Upon receiving the EVPN Type 2 routes, OFP-Ingress updates the destination address, TEP information, and EncapID into the L2 forwarding table. Forwarding Layer: 1) The L2 forwarding table records the TEP and EncapID information. 2) During packet forwarding, OFP-Ingress sequences packets based on TEP information, embedding the TEP info and packet sequence number hcl, et al. Expires April 28, 2025 [Page 13] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 in the forwarded packet. Additionally, it encapsulates the EncapID information within the packet. How the TEP information, sequence number, and EncapID are carried within forwarded packets is beyond the scope of this document. 3) OFP-Forward devices can choose to forward based on MAC or IP addresses, or based on TEP attributes, without regard to packet disarray during forwarding. 4) Packets forwarded to OFP-Egress may arrive out of order due to differing intermediate paths. 5) OFP-Egress receives the packets, sorts them according to their sequence numbers, and if necessary, reassembles them, and forwards the packets to the server the original order sent by OFP-Ingress. 6) When delivering packets to the server, OFP-Egress converts the packets according to EncapID information into local encapsulation formats, it then adds a L2-layer encapsulation to the packets based on the EncapID and forwards them to the server in sequence. hcl, et al. Expires April 28, 2025 [Page 14] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 <-- Advertise TEP and EncapID in Route attrib to Ingress +--- Server1(10.1.1.1, MAC1) DevID1 /TEP: (DevID1, PortID1) +-----------+/ EncapID: 1 +--|OFP-Egress1|\ / +-----------+ +--- Server2 (20.1.1.1, MAC2) +-------------+ / TEP: (DevID1, PortID2) |OFP-Ingress |/ EncapID: 2 +-------------+\ \ +--- Server3(30.1.1.1, MAC3) \ DevID2 /TEP: (DevID2, PortID1) \ +-----------+/ EncapID: 1 +--|OFP-Egress2|\ +-----------+ \ +--- Server4(40.1.1.1,MAC4) TEP: (DevID2, Port2) EncapID: 2 ... \ DevIDn +----Servern1(xx.1.1.1,MACn1) \ +-----------+/TEP: (DevIDn, Port1) +---|OFP-Egressn| EncapID: 1 +-----------+\ +---- Servern2(yy.1.1.1,MACn2) TEP: (DevIDn, Port2) EncapID: 2 Figure 12 6.2.2. L3 Forwarding Control Layer: 1) When OFP-Egress advertises Type 2 routes (MAC/IP advertisement) externally, it carries the TEP and Encap ID in the NLRI. When OFP- Egress advertises Type 5 routes (IP prefix), it carries TEP information. For example, OFP-Egress1 advertise Type 2 route with MAC address (MAC1), IP Address (10.1.1.1), EncapID (1). And advertise Type 5 route with IP address (100.1.1.0/24), GW IP address (10.1.1.1), TEP (DevID1, PortID1). 2) Upon receiving the EVPN Type 2 routes, OFP-Ingress updates the destination address, TEP information, and EncapID into the L2 forwarding table. 3) Upon receiving the EVPN Type 5 routes, OFP-Ingress looks up the corresponding Type 2 route using the Type 5 route's GW IP address and inherits the Type 2 route's EncapID. It then updates the hcl, et al. Expires April 28, 2025 [Page 15] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 destination address, TEP information, and EncapID into the L3 forwarding table. Forwarding Layer: 1) The L3 forwarding table records the TEP and EncapID information. 2) During packet forwarding, OFP-Ingress sequences packets based on TEP information, embedding the TEP info and packet sequence number in the forwarded packet. Additionally, it encapsulates the EncapID information within the packet. How the TEP information, sequence number, and EncapID are carried within forwarded packets is beyond the scope of this document. 3) OFP-Forward devices can choose to forward based on MAC or IP addresses, or based on TEP attributes, without regard to packet disarray during forwarding. 4) Packets forwarded to OFP-Egress may arrive out of order due to differing intermediate paths. 5) OFP-Egress receives the packets, sorts them according to their sequence numbers, and if necessary, reassembles them, and forwards the packets to the server the original order sent by OFP-Ingress. 6) When delivering packets to the server, OFP-Egress converts the packets according to EncapID information into local encapsulation formats, it then adds a L2-layer encapsulation to the packets based on the EncapID and forwards them to the server in sequence. hcl, et al. Expires April 28, 2025 [Page 16] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 <-- Advertise TEP and EncapID in Route attrib to Ingress +--- Ext-Route1: 100.1.1.0/24 /GateWay1(10.1.1.1, MAC1) DevID1 /TEP: (DevID1, PortID1) +-----------+/ EncapID: 1 +--|OFP-Egress1|\ / +-----------+ +--- Ext-Route2:100.2.1.0/24 / GateWay2(20.1.1.1,MAC2) +-------------+ / TEP: (DevID1, PortID2) |OFP-Ingress |/ EncapID: 2 +-------------+\ \ +--- Ext-Route3: 100.3.1.0/24 \ / GateWay3(30.1.1.1, MAC3) \ DevID2 /TEP: (DevID2, PortID1) \ +-----------+/ EncapID: 1 +--|OFP-Egress2|\ +-----------+ \ +--- Ext-Route4: 100.4.1.0/24 GateWay4(40.1.1.1,MAC4) TEP: (DevID2, Port2) EncapID: 2 ... \ DevIDn +----Ext-Routen1(100.xx.1.0/24) /GateWayn1(xx.1.1.1,MACn1) \ +-----------+/TEP: (DevIDn, Port1) +---|OFP-Egressn| EncapID: 1 +-----------+\ +----Ext-Routen2(100.yy.1.0/24) GateWayn2(yy.1.1.1,MACn2) TEP: (DevIDn, Port2) EncapID: 2 Figure 13 7. Deployment consideration The Device ID of each Spin device must be globally unique, which can be ensured through configuration or by uniformly distributing guarantees through the controller. 8. IANA Considerations This document registers the following in the "BGP Tunnel Encapsulation Attribute Tunnel Types" registry.[RFC9012] TBD Tunnel Egress Point attribute hcl, et al. Expires April 28, 2025 [Page 17] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 9. Security Considerations TBD 10. References 10.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017 [RFC9012] K. Patel, "The BGP Tunnel Encapsulation Attribute", ISSN: 2070-1721, April 2021, . 10.2. Informative References TBD. hcl, et al. Expires April 28, 2025 [Page 18] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 Acknowledgments TBD Contributors Jia Li New H3C Technologies China Email: lij@h3c.com Meng Li New H3C Technologies China Email: li_meng_limeng@h3c.com Jian Chen New H3C Technologies China Email: jian_chen@h3c.com Haina Zhong New H3C Technologies China Email: zhonghaina.06454@h3c.com Jincan Li RuiJie China Email: lijincan@ruijie.com.cn Yanrong Liang RuiJie China Email: liangyanrong@ruijie.com.cn Daniel Roytenberg DriveNets Email: danielro@drivenets.com Eyal Hezi DriveNets Email: ehezi@drivenets.com Alvin Yu Zhang DriveNets Email: azhang@drivenets.com hcl, et al. Expires April 28, 2025 [Page 19] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 Yehonatan Lemberger DriveNets Email: ylemberger@drivenets.com Yanjun Yang Broadcom Email: Yanjun.yang@broadcom.com Authors' Addresses PengFei Huo ByteDance China Email: huopengfei@bytedance.com Gang Chen ByteDance China Email: chengang.gary@bytedance.com hcl, et al. Expires April 28, 2025 [Page 20] Internet-Draft Bgp Extension for Tunnel Egress Point October 2024 Changwang Lin New H3C Technologies China Email: linchangwang.04414@h3c.com Weiqiang Cheng China Mobile china Email: chengweiqiang@chinamobile.com Syed Hasan Raza Naqvi Broadcom Email: syed.naqvi@broadcom.com Yossi Kikozashvili DriveNets Email: ykikozashvili@drivenets.com Chenchen Qi ByteDance China Email: qichenchen@bytedance.com hcl, et al. Expires April 28, 2025 [Page 21]