Internet-Draft | Microloop Prevention | October 2024 |
Yuan, et al. | Expires 14 April 2025 | [Page] |
Because computing and networking differ considerably in resource granularity and in the stability of their status, hierarchical segment routing has been proposed and introduced as an end-to-end CATS process. However, it brings about potential problems, as illustrated in [I-D.yuan-cats-end-to-end-problem-requirement]. To solve these problems and to refine the hierarchical solution, this draft discusses corresponding aggregation methods and proposes hierarchical entries.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 14 April 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
An end-to-end CATS process is described as hierarchical two-segment routing in [I-D.ldbc-cats-framework] and [I-D.huang-cats-two-segment-routing]. It is further analyzed in [I-D.yuan-cats-end-to-end-problem-requirement], which implies that a hierarchical segment routing solution requires incremental solutions and designs.¶
Compared to non-hierarchical routing methods, a hierarchical segment solution has unique features and imposes the following additional requirements:¶
Aggregate the explicit and detailed information of multiple service instances appropriately, to avoid loops caused by improper routing and forwarding decisions that result from inadequate cohesive information.¶
Solve the microloop problem that occurs in hierarchical segment routing when decisions are made at multiple points whose convergence times are inconsistent.¶
In IP networks, because the IGP LSDB is distributed, microloops may occur when the IGP converges out of order. Solutions such as Order FIB and Order Metric have been proposed, but because they work by controlling the convergence order of network devices, the convergence process becomes much more complex and the convergence time increases significantly. Thus, these schemes have not been widely deployed in networks. Currently, SR technology is commonly used to address microloop issues, for example by constructing an acyclic SRv6 Segment List to eliminate loops.¶
However, in IP routing an explicit destination is already determined at the source device, whereas in a hierarchical segment service routing process a specific service instance may not be designated during the first segment. As a result, the forwarding behaviors on multiple devices are not connected to one another. Thus, a conventional SR-based solution requires incremental designs.¶
Therefore, this draft proposes possible aggregation methods, designs hierarchical entries including global bases and local bases and introduces a forwarding behaviour with Computing Segments.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
LSDB: Link State DataBase¶
IGP: Interior Gateway Protocol¶
RIB: Routing Information Base¶
FIB: Forwarding Information Base¶
SR: Segment Routing¶
SRv6: Segment Routing over IPv6¶
GRIB: Global Routing Information Base¶
GFIB: Global Forwarding Information Base¶
LRIB: Local Routing Information Base¶
LFIB: Local Forwarding Information Base¶
Assume that for a computing related service, a Service ID is utilized as an identifier as proposed in [I-D.ma-intarea-identification-header-of-san]. Various computing related services may be sensitive to different attributes among metadata of computing resources and network capabilities, such as CPU cores, CPU load, I/O, memory, delay, bandwidth, etc.¶
For a service instance that is able to provide a computing related service, the collected metadata of its sensitive attributes indicates its performance at that moment. Furthermore, as illustrated in [I-D.yuan-cats-middle-ware-facility], a Metric value can be calculated with the metadata of sensitive attributes as input variables.¶
Therefore, the aggregation of computing resource status information is divided into the following two categories.¶
1. Aggregation of Metric Values.¶
As shown below, the set of meta information to which a computing related service is sensitive is recorded as an Attribute Set. For instance, the Attribute Set collected at Instance 1 for Service ID 1 is Attribute Set 1,1.¶
There are multiple instances located in an edge cloud or a central data center connected to corresponding PEs. Based on the metadata describing the dynamic computing status and network conditions of these service instances at a certain time, corresponding Metric values representing their capabilities can be calculated. As shown below, Instances 1, 2 and 3 located at PE 1 are all able to provide the services represented by Service ID 1, 2 and 3. Accordingly, Metric 1,1 to Metric 3,3 are calculated respectively.¶
Based on a framework of hierarchical computing status awareness and segment service routing, edge devices apply a corresponding aggregation algorithm to these Metric values and advertise the aggregated results to the network. For a computing related service represented by a Service ID, aggregation algorithms include but are not limited to:¶
Take the average of the corresponding Metric values calculated for a specific service among all service instances.¶
Take the weighted average of the corresponding Metric values calculated for a specific service among all service instances.¶
Take the maximum of the corresponding Metric values calculated for a specific service among all service instances.¶
Take the minimum of the corresponding Metric values calculated for a specific service among all service instances.¶
Take the median of the corresponding Metric values calculated for a specific service among all service instances.¶
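The aggregation options listed above can be sketched as follows. This is a minimal illustration only and is not part of the proposal: the function name `aggregate_metrics`, the method labels, the weights, and the sample Metric values are all assumptions introduced for the example.

```python
from statistics import mean, median


def aggregate_metrics(metrics, method="min", weights=None):
    """Aggregate the Metric values of one Service ID across the local instances of a PE.

    `metrics` is a list of numbers, one per service instance (hypothetical layout).
    """
    if not metrics:
        raise ValueError("no Metric values to aggregate")
    if method == "average":
        return mean(metrics)
    if method == "weighted_average":
        if weights is None or len(weights) != len(metrics):
            raise ValueError("one weight per Metric value is required")
        return sum(w * m for w, m in zip(weights, metrics)) / sum(weights)
    if method == "max":
        return max(metrics)
    if method == "min":
        return min(metrics)
    if method == "median":
        return median(metrics)
    raise ValueError(f"unknown aggregation method: {method}")


# Hypothetical Metric values of Service ID 1 at Instances 1, 2 and 3 of PE 1.
metrics_service_1 = [10, 20, 60]
for method in ("average", "max", "min", "median"):
    print(method, aggregate_metrics(metrics_service_1, method))
print("weighted_average",
      aggregate_metrics(metrics_service_1, "weighted_average", weights=[1, 1, 2]))
```

Which method is appropriate depends on how the aggregated value is compared with per-instance values later on; for example, when a smaller Metric is taken to mean better performance, the minimum advertises the best instance behind the PE.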
Depending on conditions, the per-service evaluation methods can degenerate into a unified or service-class-based evaluation method. Specifically, different Service IDs can correspond to the same Metric value calculated by a unified evaluation algorithm, or a set of Service IDs can correspond to one Metric.¶
2. Aggregation of Metadata Sets.¶
The other aggregation method is shown above. For Instance 1 located at PE 1, the Attribute Sets of Service ID 1 to Service ID 3 are Attribute Set 1 to Attribute Set 3, respectively. The union of the meta information in these Attribute Sets of multiple computing related services is recorded as the Metadata Set of Instance 1. Similar to the process of aggregating Metric values, edge devices can also aggregate multiple Metadata Sets into a Cohesive Metadata Set and then advertise the aggregated results to the network. For all service instances at an edge device, the aggregation algorithm includes but is not limited to:¶
Take the average value of each shared element across the Metadata Sets as the corresponding element value of the Cohesive Metadata Set.¶
Take the weighted average of each shared element across the Metadata Sets as the corresponding element value of the Cohesive Metadata Set.¶
Take the maximum value of each shared element across the Metadata Sets as the corresponding element value of the Cohesive Metadata Set.¶
Take the minimum value of each shared element across the Metadata Sets as the corresponding element value of the Cohesive Metadata Set.¶
Take the median value of each shared element across the Metadata Sets as the corresponding element value of the Cohesive Metadata Set.¶
Apply a corresponding strategy to select an instance and directly use the Metadata Set of the selected instance as the Cohesive Metadata Set.¶
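The Metadata Set aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the attribute names (`cpu_load_pct`, `memory_gb`, `delay_ms`), the sample values, and the helper names are hypothetical, and an attribute is assumed to carry a single value per instance regardless of which Service ID lists it.

```python
from statistics import mean


def metadata_set(attribute_sets):
    """Union of the Attribute Sets of one instance (one Attribute Set per Service ID)."""
    merged = {}
    for attribute_set in attribute_sets:
        merged.update(attribute_set)
    return merged


def cohesive_metadata_set(metadata_sets, reducer):
    """Reduce each element, matched by name, across the Metadata Sets of all instances."""
    names = sorted(set().union(*metadata_sets))
    return {
        name: reducer([m[name] for m in metadata_sets if name in m])
        for name in names
    }


def select_instance(metadata_sets, key):
    """Last option in the list: reuse the Metadata Set of one selected instance."""
    return min(metadata_sets, key=key)


# Hypothetical Attribute Sets of two instances at one PE.
instance_1 = metadata_set([{"cpu_load_pct": 40, "memory_gb": 16},
                           {"cpu_load_pct": 40, "delay_ms": 5}])
instance_2 = metadata_set([{"cpu_load_pct": 80, "memory_gb": 32},
                           {"delay_ms": 9}])

print(cohesive_metadata_set([instance_1, instance_2], mean))
print(cohesive_metadata_set([instance_1, instance_2], max))
print(select_instance([instance_1, instance_2], key=lambda m: m["cpu_load_pct"]))
```

Passing the reducer as a parameter keeps the average, maximum, minimum and median variants in one function; a weighted average would need an additional per-instance weight, which is omitted here.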
In conclusion, a PE can aggregate multiple Metric values or Metadata Sets and advertise this coarse-grained and relatively stable information to the network. As analyzed in [I-D.yuan-cats-end-to-end-problem-requirement], an aggregation result should remain comparable to the information of any explicit service instance.¶
With appropriate aggregation functions applied, the exposed entries are essentially correct. However, because forwarding behaviours are indeterminate and entries are not separated, a microloop can still occur upon sudden failures or dynamic updates. Therefore, a design of hierarchical entries is proposed in this section.¶
Taking the aggregation of Metric values as an example, the Metric values of several service instances at an edge device PE are aggregated and advertised in the network. By collecting and exchanging these entries, a Global RIB is constructed. In addition, the PE generates a Local RIB by collecting the running status of its local service instances. Afterwards, scheduling strategies are applied in the control plane and corresponding decisions are made. Suppose the entry with the smallest Metric value is selected as the optimal entry. It is then distributed to the forwarding plane on the device, and a Global FIB and a Local FIB are generated respectively, which ultimately instruct the packet forwarding process.¶
A typical form of the GRIB, GFIB, LRIB and LFIB, taking PE 4 as an example, is displayed as follows. Note that the computing status of PE 4 itself also appears as an entry in the GRIB, in aggregated form.¶
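As a rough illustration of how the four tables relate at PE 4, the sketch below models each RIB as a mapping from an entry to its Metric and derives each FIB by selecting the smallest Metric. The table layout, the field names, and all Metric values are assumptions made for this example and do not reproduce the entry format of the draft.

```python
# Hypothetical Metric values for one Service ID; a smaller Metric is assumed better.

# LRIB at PE 4: one entry per local service instance.
lrib_pe4 = {"Instance 4,1": 30, "Instance 4,4": 10}

# GRIB at PE 4: one aggregated entry per PE, including PE 4 itself.
# The minimum is used here as the aggregation function.
grib_pe4 = {"PE 1": 20, "PE 4": min(lrib_pe4.values())}


def best_entry(rib):
    """Control-plane decision assumed in this sketch: the smallest Metric wins."""
    return min(rib, key=rib.get)


# Each FIB holds only the selected entry that is pushed to the forwarding plane.
gfib_pe4 = {"next_hop_pe": best_entry(grib_pe4)}
lfib_pe4 = {"instance": best_entry(lrib_pe4)}

print(gfib_pe4)  # {'next_hop_pe': 'PE 4'}
print(lfib_pe4)  # {'instance': 'Instance 4,4'}
```

Keeping the per-PE and per-instance entries in separate tables is what later allows a forwarding behaviour to be bound to exactly one of them.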
Although the design of hierarchical entries separates entries carrying information of different granularity, both global and local entries still need to be correlated with packet features and defined forwarding behaviours.¶
A Computing Segment is defined in [I-D.zhou-intarea-computing-segment-san]. With the hierarchical entries described in the previous sections, the Computing Segment END.C can be further classified into END.CG and END.CL, associated with the GFIB and the LFIB respectively. Apart from the entries they look up, END.CG and END.CL have the same semantics as stated in the previous work. The form of a packet delivered in the forwarding process is also shown below.¶
Referring to [I-D.fu-cats-muti-dp-solution], with an anycast service IP implementation for CATS, the Computing Segment could simply be configured as an END.DX SID correlated with specific interfaces or tunnels, which would steer the traffic accordingly. With multiple next hops, multiple END.DX SIDs could be distinguished even if they are correlated with the same interface.¶
As shown above, END.CG(PE 1) and END.CL(PE 4) are Computing Segments configured at PE 1 and PE 4 respectively. END.CG(PE 1) correlates with GFIB at PE 1 while END.CL(PE 4) correlates with LFIB at PE 4. The forwarding process is determined by the SIDs and corresponding FIBs.¶
A microloop with non-hierarchical entries is illustrated as follows. The minimum Metric value is taken as the aggregated Metric value published by a PE. When a service request enters at PE 1, Instance 4,4 located at PE 4 is considered the most appropriate service instance, since it has the minimum Metric value, which represents the best performance. With a hierarchical computing status awareness and routing scheme, the traffic is first steered to PE 4, and then a sudden failure happens at Instance 4,4. PE 4 detects that Instance 4,4 is invalid, recalculates in the control plane, and distributes a new FIB entry. PE 4 selects PE 1 as the updated next hop. Thus, the traffic is steered back to PE 1. However, the remote PE 1 has not yet been notified that Instance 4,4 is invalid. Therefore, a microloop exists until PE 1 updates its entries.¶
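The microloop described above can be reproduced with a small simulation. It is a sketch under stated assumptions: each PE is modeled with a single table that mixes local instances and remote PEs, the smallest Metric always wins, and the Metric values and the `forward` helper are hypothetical.

```python
def forward(tables, ingress, max_hops=6):
    """Follow the smallest-Metric entry hop by hop.

    Returns (path, looped). `looped` is True when the hop budget is
    exhausted without reaching a service instance.
    """
    path, node = [ingress], ingress
    for _ in range(max_hops):
        table = tables[node]
        next_hop = min(table, key=table.get)
        path.append(next_hop)
        if next_hop not in tables:  # not a PE, so a service instance was reached
            return path, False
        node = next_hop
    return path, True


# State right after Instance 4,4 fails (hypothetical Metric values).
tables = {
    # PE 1 still holds the stale aggregated Metric 10 advertised by PE 4.
    "PE 1": {"Instance 1,1": 20, "PE 4": 10},
    # PE 4 has removed Instance 4,4; PE 1 (20) now beats Instance 4,1 (30).
    "PE 4": {"Instance 4,1": 30, "PE 1": 20},
}
print(forward(tables, "PE 1"))  # traffic bounces between PE 1 and PE 4

# Once PE 1 learns the new aggregated Metric of PE 4, the loop disappears.
tables["PE 1"]["PE 4"] = 30
print(forward(tables, "PE 1"))  # (['PE 1', 'Instance 1,1'], False)
```

The loop lasts exactly as long as PE 1 keeps the stale entry, which matches the window between the local recalculation at PE 4 and the update at PE 1.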
The same situation with hierarchical entries is analyzed below. A global choice, indicated by an END.CG SID, is made at the access device PE 1, and PE 4 is selected as the most appropriate next hop with the minimum aggregated Metric value. As before, a sudden failure happens at Instance 4,4 and PE 1 has not yet been notified. Unlike the previous case, PE 4 executes a specific local behaviour, denoted by an END.CL SID, that looks up only the LFIB. Although Instance 4,4 has become invalid, Instance 4,1 is selected as the substitute with a suboptimal Metric value. In contrast to the earlier analysis, the traffic is not steered back and forth between PEs. Thus, the microloop is prevented through the design of hierarchical entries and the introduction of Computing Segments.¶
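The hierarchical case can be sketched in the same style. The functions `end_cg` and `end_cl` below only mimic the table each behaviour consults; they are illustrative stand-ins rather than an implementation of the SIDs, and the table contents and Metric values are hypothetical.

```python
def end_cg(pe, gfib):
    """Global behaviour: choose the egress PE from the GFIB of `pe`."""
    return min(gfib[pe], key=gfib[pe].get)


def end_cl(pe, lfib):
    """Local behaviour: choose a service instance from the LFIB of `pe` only."""
    return min(lfib[pe], key=lfib[pe].get)


def deliver(ingress, gfib, lfib):
    """Two-segment delivery: END.CG at the ingress PE, then END.CL at the egress PE."""
    egress = end_cg(ingress, gfib)
    instance = end_cl(egress, lfib)
    return [ingress, egress, instance]


# State right after Instance 4,4 fails (hypothetical Metric values).
gfib = {
    # PE 1 still holds the stale aggregated Metric 10 advertised by PE 4.
    "PE 1": {"PE 1": 20, "PE 4": 10},
}
lfib = {
    "PE 1": {"Instance 1,1": 20},
    # Instance 4,4 has already been removed locally at PE 4.
    "PE 4": {"Instance 4,1": 30},
}

print(deliver("PE 1", gfib, lfib))  # ['PE 1', 'PE 4', 'Instance 4,1']
```

Because the second lookup never consults entries for remote PEs, the traffic cannot be handed back to PE 1 even though the GFIB at PE 1 is stale. The sketch does not cover the case where no local instance remains at the egress PE.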
TBA.¶
TBA.¶
TBA.¶