<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="4"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc strict="no"?>
<?rfc rfcedstyle="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-bess-bgp-multicast-controller-17" ipr="trust200902">
  <front>
    <title abbrev="bgp-mcast-controller">Controller-based BGP Multicast Signaling</title>

    <author fullname="Zhaohui Zhang" initials="Z." surname="Zhang">
      <organization>HPE</organization>
      <address>
        <email>zhaohui.zhang@hpe.com</email>
      </address>
    </author>

    <author fullname="Robert Raszuk" initials="R." surname="Raszuk">
      <organization>Arrcus</organization>
      <address>
        <postal>
          <street>2077 Gateway Place</street>
          <city>San Jose</city>
          <region>CA</region>
          <code>95110</code>
          <country>USA</country>
        </postal>
        <email>robert@raszuk.net</email>
      </address>
    </author>

    <author fullname="Dante Pacella" initials="D." surname="Pacella">
      <organization>Verizon</organization>
      <address>
        <email>dante.j.pacella@verizon.com</email>
      </address>
    </author>

    <author fullname="Arkadiy Gulko" initials="A." surname="Gulko">
      <organization>Edward Jones Wealth Management</organization>
      <address>
        <email>arkadiy.gulko@edwardjones.com</email>
      </address>
    </author>

    <workgroup>BESS</workgroup>

    <abstract>
      <t>This document specifies a way for one or more centralized
         controllers to use BGP to set up multicast distribution trees
         (identified by either an IP source/destination address pair, or
         an mLDP FEC<!--, or SR-P2MP Tree-ID-->) in a network. Since the controllers
         calculate the trees, they can use sophisticated algorithms
         and constraints to achieve traffic engineering. The controllers
         directly signal dynamic replication state to the tree nodes,
         leading to a very simple multicast control plane on the tree nodes,
         as if they were using static routes. This can be used for both
         underlay and overlay multicast trees, including replacing
         BGP-MVPN signaling.
      </t>
    </abstract>

    <note title="Requirements Language">
    <t>
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.
    </t>
    </note>
  </front>

  <middle>
    <section title="Overview">
	  <section title="Terminology">
		<t>Some of the following terms were originally introduced in
		<xref target="RFC6514"/>. They are reused in <xref target="I-D.ietf-bess-bgp-multicast"/> and in this document.
		<list>
       <t>PMSI <xref target="RFC6514"/>: Provider Multicast Service Interface - a conceptual interface for a PE
          to send customer multicast traffic to all or some PEs in the same
          VPN.
       </t>
       <t>I-PMSI: Inclusive PMSI - to all PEs in the same VPN.
       </t>
       <t>S-PMSI: Selective PMSI - to some of the PEs in the same VPN.
       </t>
	   <t>I-PMSI A-D Route: Inclusive PMSI Auto-Discovery route used to advertise the tunnels that instantiate an I-PMSI.
	   </t>
    <t>
          S-PMSI A-D route: Selective PMSI Auto-Discovery route used to advertise that particular C-flows are
          bound to (i.e., are traveling through) particular P-tunnels.
    </t>
    <t>
          Leaf A-D route: Leaf Auto-Discovery route used to advertise leaf/receiver information.
    </t>
    <t>
          PMSI Tunnel attribute (PTA): A BGP attribute used
          to identify a particular P-tunnel.
    </t>
		</list>
		</t>
	  </section>
    <section title="Introduction">
    <t><xref target="I-D.ietf-bess-bgp-multicast"/> describes a way to use
    BGP as a replacement signaling for Protocol Independent Multicast (PIM)
	[RFC7761] or Label Distribution Protocol Extensions for P2MP and MP2MP
		  Label Switched Paths (mLDP) [RFC6388].
         The BGP-based multicast signaling described there provides a mechanism
         for setting up both (s,g)/(*,g) multicast trees (as PIM does,
         but optionally with labels) and labeled (MPLS) multicast tunnels
         (as mLDP does).  Each router on a tree performs essentially the
         same procedures as it would perform if using PIM or mLDP, but all
         the inter-router signaling is done using BGP.
    </t>
    <t>
         These procedures allow the routers to set up a separate tree for each
         individual multicast (x,g) flow where the 'x' could be either 's' or
         '*', but they also allow the routers to
         set up trees that are used for more than one flow. In the latter case,
         the trees are often referred to as "multicast tunnels" or
         "multipoint tunnels", and specifically in this document they
         are mLDP tunnels (except that they are set up with BGP signaling).
         While this actually does not have to be restricted to mLDP tunnels,
         the mLDP Forwarding Equivalence Class (FEC) [RFC6388] is conveniently
		 borrowed to identify the tunnel. In the rest
         of the document, the terms tree and tunnel are used interchangeably.
    </t>
    <t>The trees/tunnels are set up using the "receiver-initiated join"
       technique of PIM/mLDP, hop by hop from downstream routers towards
       the root but using BGP NLRIs of MCAST-TREE SAFI (with the IPv4/IPv6 AFI),
	   either sent hop by hop
	   between downstream routers and their upstream neighbors, or reflected by
       Route Reflectors (RRs).
    </t>
    <t>As an alternative to each hop independently determining its upstream
       router and signaling upstream towards the root (following PIM/mLDP
       model), the entire tree can be calculated
       by a centralized controller, and the signaling can be entirely done
       from the controller using the same MCAST-TREE SAFI. For that, some
       additional procedures and optimizations are specified in this document.
    </t>
    <t><xref target="I-D.ietf-bess-bgp-multicast"/> uses some terminology
	introduced in BGP-MVPN [RFC6514] because the main procedures and
	concepts are borrowed from there. While the same Leaf A-D routes in
	<xref target="I-D.ietf-bess-bgp-multicast"/> can
	be used to signal replication state to tree nodes from controllers, this
	document introduces a new route type "Replication State" for the same
	functionality so that familiarity with the BGP-MVPN concepts
	is not required.
    </t>
    <t>While it is outside the scope of this document, signaling from the
       controllers could be done via other means as well, like Netconf or
       any other SDN methods.
    </t>
    </section>
    <section title="Resilience" anchor="resilience">
    <t>Each router could establish direct BGP sessions with one or more
       controllers, or it could establish BGP sessions with RRs that in turn
       peer with controllers.
       For the same tree/tunnel, each controller may independently calculate
       the tree/tunnel and signal the routers on the tree/tunnel using MCAST-TREE
       Replication State routes.
       How the calculations are done is outside the scope of this document.
    </t>
    <t>On each router, BGP route selection rules will lead to one controller's
       route for the tree/tunnel being
       selected as the active route and used for setting up forwarding state.
       As long as all the routers on a tree/tunnel consistently pick the same
       controller's routes for the tree/tunnel, the setup should be consistent.
       If the tree/tunnel is labeled, different labels will be used from
       different controllers so there is no traffic loop issue even if the
       routers do not
       consistently select the same controller's routes. In the unlabeled case,
       to ensure consistency the selection SHOULD be solely based on the
       identifier of the controller.
    </t>
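    <t>The consistency rule for the unlabeled case can be illustrated with a
       short sketch (Python-style pseudocode with hypothetical names, not part
       of the protocol encoding): if every router applies the same
       deterministic selection, here the numerically lowest controller
       identifier, all nodes of a tree program state from the same controller.
    <figure>
	<artwork>
```python
# Hypothetical sketch: consistent selection of one controller's route.
# Each candidate is a (controller_id, route) pair; selection is based
# solely on the controller identifier, as recommended above for the
# unlabeled case.
def select_active_route(candidates):
    """Pick the route from the controller with the lowest identifier."""
    return min(candidates, key=lambda c: c[0])

# Every router holding the same candidate set picks the same route:
routes = [(20, "state-from-ctrlr-20"), (10, "state-from-ctrlr-10")]
assert select_active_route(routes) == (10, "state-from-ctrlr-10")
```
	</artwork>
    </figure>
    </t>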
    <t>Another consistency issue is when a bidirectional tree/tunnel needs
       to be re-routed.
       Because this is no longer triggered hop-by-hop from downstream to
       upstream, it is possible that the upstream change happens before
       the downstream, causing a traffic loop. In the unlabeled case, there is
       no good solution (other than that the controller issues upstream
       change only after it gets acknowledgement from downstream). In the
       labeled case, as long as a new label is used there should be no problem.
    </t>
    <t>Besides the traffic loop issue, there could be transient traffic loss
       before both the upstream's and downstream's forwarding state is updated.
       This could be mitigated if the upstream keeps sending traffic on the old
       path (in addition to the new path) and the downstream keeps accepting
       traffic on the old path (but not on the new path) for some time. When
       the downstream switches to the new path is a local matter -
       it could be data driven (e.g., after traffic arrives on the new path)
       or timer driven.
    </t>
    <t>For each tree, multiple disjoint instances could be calculated and
       signaled for live-live protection. Different labels are used
       for different instances, so that the leaves can differentiate incoming
       traffic on different instances. As far as transit routers are concerned,
       the instances are just independent. Note that the instances
       are not expected to share common transit routers (sharing is otherwise
       outside the scope of this document/revision).
    </t>
    </section>
    <section title="Signaling" anchor="signaling">
      <t>When a router receives a Replication State route, re-advertisement
	  is blocked if a configured import RT matches the RT of the route,
	  which indicates that this router is the target and consumer of the route
	  and hence the route should not be re-advertised further. The routes include the
       forwarding information in the form of Tunnel Encapsulation Attributes
       (TEA) <xref target="RFC9012"/>, with
       enhancements specified in this document.
	</t>
    <t>Suppose that for a particular tree, there are two downstream routers
    D1 and D2 for a particular upstream router U. A controller C sends one
	Replication State route to U, with the Tree Node's IP Address field
	(see <xref target="rep-route"/>) set
       to U's IP address and the TEA specifying both the two downstream routers
       and U's upstream (see <xref target="rpf-tlv"/>).
       In this case, the Originating Router's Address field of the
	   Replication State route is set to the controller's address<!--,
	   and PMSI Tunnel Attribute is not used-->.
       Note that for a TEA attached to a unicast NLRI, only one of the tunnels
       in a TEA is used for forwarding a particular packet, while all the
       tunnels in a TEA are used to reach multiple endpoints when it is
       attached to a multicast NLRI.
    </t>
	<t>U may need to replicate to many downstream routers,
	say D1 through D1000. In that case, it may not be possible to encode
	all those branches in a single TEA, or it may not be optimal to update
	a large TEA when a branch is added/removed. Instead,
	C may send multiple Replication State routes, each with a different
	RD and a different TEA that encodes
	a subset of the branches. This provides a flexible way to optimize the
	encoding of a large number of branches and incremental updates of branches.
	</t>
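	<t>The splitting of a large branch set across multiple routes, keyed by
	distinct RDs, can be sketched as follows (Python-style pseudocode with
	hypothetical names; the real encoding uses RDs and TEAs, not lists).
	Adding or removing one branch then only requires re-advertising the one
	route whose subset it belongs to.
    <figure>
	<artwork>
```python
# Hypothetical sketch: spreading many replication branches over
# several Replication State routes, each with its own RD, so that
# a branch change only re-advertises one small TEA.
def build_routes(branches, branches_per_route):
    routes = {}
    for index, branch in enumerate(branches):
        rd = index // branches_per_route   # distinct RD per subset
        routes.setdefault(rd, []).append(branch)
    return routes

branches = ["D%d" % n for n in range(1, 11)]   # D1 .. D10
routes = build_routes(branches, 4)
assert len(routes) == 3
assert routes[0] == ["D1", "D2", "D3", "D4"]
```
	</artwork>
    </figure>
	</t>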
    <t>Notice that, in the case of labeled trees, the (x,g) or mLDP FEC <!--, or SR-P2MP
       tree identification (<xref target="srp2mp"/>)--> signaling
       is actually not needed by transit routers but only by the tunnel
       root/leaves. However, for consistency among the root/leaf/transit nodes,
	   and for consistency with the hop-by-hop signaling, the same signaling
	   (with tree identification encoded in the NLRI) is used for all routers.
    </t>
    <t>Nonetheless, a new NLRI route type of the MCAST-TREE SAFI is defined to encode label/SID
	instead of tree identification in the NLRI, for scenarios where
	there is really no need to signal tree identification, e.g., as described in
	<xref target="bgpmvpn"/>. On a tunnel root, the tree's binding SID can
	be encoded in the NLRI.
    </t>
    <t>For a tree node to acknowledge to the controller that it has received the
	signaling and installed corresponding forwarding state, it advertises
	a corresponding Replication State route, with the Originating Router's IP Address
	set to its own address and with a Route Target matching the controller.
	For comparison, the tree signaling Replication State route from the controller
	has the Originating Router's IP Address set to the controller and
	the Route Target matching the tree node. The two Replication State routes
	(for controller to signal to a tree node and for a tree node to
	acknowledge back) differ only in those two aspects.
    </t>
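    <t>The symmetry between the two Replication State route variants can be
	sketched as follows (Python-style pseudocode; the dictionaries and the
	addresses are hypothetical illustrations, not the wire encoding): the
	signaling route and the acknowledgement route carry the same NLRI and
	differ only in the Originating Router's IP Address and the Route Target.
    <figure>
	<artwork>
```python
# Hypothetical sketch of the two Replication State route variants,
# which differ only in the Originating Router's IP Address and the
# Route Target (field names abbreviated).
def signaling_route(controller, tree_node, nlri):
    return {"nlri": nlri, "originator": controller, "rt": tree_node}

def ack_route(controller, tree_node, nlri):
    return {"nlri": nlri, "originator": tree_node, "rt": controller}

sig = signaling_route("198.51.100.1", "203.0.113.7", "tree-1")
ack = ack_route("198.51.100.1", "203.0.113.7", "tree-1")
assert sig["nlri"] == ack["nlri"]          # same tree
assert sig["originator"] == ack["rt"]      # controller's address
assert sig["rt"] == ack["originator"]      # tree node's address
```
	</artwork>
    </figure>
    </t>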
	<!--t>Notice that a leaf node may also send a Leaf A-D route to the controller
	to signal that it is a leaf of a tree (<xref target="leaf"/>).
	That leaf-announcing route is different from the above mentioned
	acknowledgement route at least in the "Upstream Router's IP Address field"
	- the former has the controller's address while the latter has this node's
	address in the field. The RDs are likely different as well.
	</t-->
    <t>With the acknowledgement Replication State routes, the controller knows whether tree
	setup is complete. The information can be used for many purposes, e.g.,
	the controller may instruct the ingress to start forwarding traffic onto
	a tree only after it knows that the tree setup has completed.
    </t>
    </section>
    <section title="Label Allocation" anchor="labels">
    <t>In the case of labeled multicast signaled hop by hop towards the root,
       whether it's (x,g) multicast or an "mLDP" tunnel,
       labels are assigned by a downstream router and advertised
       to its upstream router (from the traffic direction point of view). In the
       case of controller-based signaling, routers no longer originate tree
       join routes, so the controllers have to
       assign labels on behalf of routers, and there are three options
       for label assignment:
       <list style="symbols">
          <t>From each router's Segment Routing Local Block (SRLB) that the controller learns
          </t>
          <t>From the common Segment Routing Global Block (SRGB) that the controller learns
          </t>
          <t>From the controller's local label space
          </t>
       </list>
    </t>
    <t>Assignment from each router's SRLB is no different from each router
       assigning labels from its own local label space in the hop-by-hop
       signaling case. The assignments for one router are independent of
       assignments for another router, even for the same tree.
    </t>
    <t>Assignment from the controller's local label space is upstream-assigned
       [RFC5331]. It is used if the controller does not learn the common
       SRGB or each router's SRLB. Assignment from the SRGB [RFC8402]
       is only meaningful if
       all SRGBs are the same and a single common label is used for all the
       routers on a tree in the case of unidirectional tree/tunnel
       (<xref target="single-label"/>).
       Otherwise, assignment from SRLB is preferred.
    </t>
    <t>The choice of which of the options to use depends on many factors.
       An operator may want to use a single common label per tree for ease
       of monitoring and debugging, but that requires explicit RPF checking
       and either a common SRGB or upstream-assigned labels, which may not be supported
       due to software or hardware limitations (e.g., label
       imposition/disposition limits). In an SR network, the assignment from
	   the common SRGB is used if it's required to use a single common
       label per unidirectional tree; otherwise, the assignment from SRLB is a
       good choice because it does not require support for context label
       spaces.
    </t>
    <section title="Using a Common per-tree Label for All Routers"
             anchor="single-label">
    <t>MPLS labels only have local significance. For an LSP that goes through
       a series of routers, each router allocates a label independently and
       it swaps the incoming label (that it advertised to its upstream) to
       an outgoing label (that it received from its downstream) when it
       forwards a labeled packet. Even if the incoming and outgoing labels
       happen to be the same on a particular router, that is just incidental.
    </t>
    <t>With Segment Routing, it is becoming a common practice that all routers
       use the same SRGB so that a SID maps to the same label on all routers.
       This makes it easier for operators to monitor and debug their network.
       The same concept applies to multicast trees as well - a common per-tree
       label can be used for a router to receive traffic from its upstream neighbor
       and replicate traffic to all its downstream neighbors.
    </t>
    <t>However, a common per-tree label can only be used for unidirectional
    trees. In the case of bidirectional trees, the common label needs to be per-
	&lt;tree, direction&gt;. Additionally, unless the entire tree is updated
	for every tree node to use a new common per-tree or per-&lt;tree,
	direction&gt; label with any change in the tree
	(no matter how small and local the change is),
	it requires each router to do explicit RPF check,
       so that only packets from its expected upstream neighbor are accepted.
       Otherwise, a traffic loop may form during topology changes,
       because the forwarding state update is no longer ordered.
    </t>
    <t>Traditionally, P2MP MPLS forwarding does not require an explicit RPF check,
       as a downstream
       router advertises a label only to its upstream router, and all traffic
       with that incoming label is presumed to be from the upstream router
       and accepted. When a downstream router switches to a different upstream
       router, a different label will be advertised, so it can determine whether
       traffic is from its expected upstream neighbor purely based on the
       label. Now, with a single common label used by all routers on a tree
       to send and receive traffic, a router can no longer determine
       whether the traffic is from its expected neighbor just based on that
       common tree label. Therefore, an explicit RPF check is needed.
       Instead of interface-based RPF checking as in the PIM case, neighbor-based
	   RPF checking is used - a label identifying the upstream neighbor
       precedes the common tree label, and the receiving router checks whether that
       preceding neighbor label matches its expected upstream neighbor.
       Notice that this is similar to what's described in
       Section "9.1.1 Discarding Packets from Wrong PE" of [RFC6513]
       (an egress PE discards traffic sent from a wrong ingress PE).
       The only difference is that one is used for label-based forwarding and
       the other is used for (s,g)-based forwarding.
    </t>
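    <t>The neighbor-based RPF check can be sketched as follows (Python-style
       pseudocode; the label values and function names are hypothetical): the
       label preceding the common tree label identifies the sending neighbor,
       and the packet is accepted only when that neighbor is the expected
       upstream.
    <figure>
	<artwork>
```python
# Hypothetical sketch of neighbor-based RPF checking: the label that
# precedes the common tree label names the sending neighbor, and the
# packet is accepted only if that neighbor is the expected upstream.
def rpf_accept(label_stack, expected_upstream_label, tree_label):
    neighbor_label, received_tree_label = label_stack[0], label_stack[1]
    if received_tree_label != tree_label:
        return False           # not this tree at all
    return neighbor_label == expected_upstream_label

# Tree label 900 is common to all routers; upstream neighbor label 101.
assert rpf_accept([101, 900], 101, 900) is True
assert rpf_accept([202, 900], 101, 900) is False   # wrong upstream: drop
```
	</artwork>
    </figure>
    </t>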
    <t>Both the common per-tree label and the neighbor label are allocated
       either from the common SRGB or from the controller's local label space.
       In the latter case, an additional label identifying the controller's
       label space is needed, as described in the following section.
     </t>
    </section>
    <section title="Upstream-assignment from Controller's Local Label Space"
			 anchor="multilabel">
    <t>In this case, in the multicast packet's label stack, the tree label and
        upstream neighbor label (if used in the case of single common-label per
        tree) are preceded by a downstream-assigned "context label".
        The context label identifies a context-specific label space
        (the controller's local label space), and the
        upstream-assigned label that follows it is looked up in that space.
<!--
        For example,
        for a packet arriving on a P2MP tunnel, the outer label indicates
        tunnel root that assigned the inner "upstream-assigned label".
        The receiving router uses the outer label to find the label space in
        which to look up the inner label.
-->

    </t>
    <t> This specification requires that, in the case of upstream-assignment
        from a controller's local label space, each router D assign,
        corresponding to each controller C, a context label that identifies the
        upstream-assigned label space used by that controller.  This label,
        call it Lc-D, is communicated by D to C via BGP-LS [RFC7752].
    </t>
    <t> Suppose a controller C is setting up a unidirectional tree T. It assigns
        that tree the label Lt, and assigns label Lu to identify router U, which
        is the upstream of router D on tree T. C needs to tell U:
        "to send a packet on the given tree/tunnel, one of the things you
        have to do is push Lt onto the packet's label stack, then push Lu,
        then push Lc-D onto the packet's label stack, then unicast the
        packet to D". Controller C also needs to inform router D of the
        correspondence between &lt;Lc-D, Lu, Lt&gt; and tree T.
    </t>
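    <t>The label stack that U builds toward D can be sketched as follows
        (Python-style pseudocode; the label values are hypothetical): pushing
        Lt, then Lu, then Lc-D leaves Lc-D outermost, so D first matches the
        context label identifying C's label space, then looks up Lu and Lt in
        that space.
    <figure>
	<artwork>
```python
# Hypothetical sketch of the label stack U builds toward downstream D:
# push Lt (tree), then Lu (upstream neighbor), then Lc-D (context),
# so the stack reads [Lc-D, Lu, Lt] from outermost to innermost.
def build_stack(context_label_for_d, upstream_label, tree_label):
    stack = []
    stack.append(tree_label)           # Lt pushed first: innermost
    stack.append(upstream_label)       # then Lu
    stack.append(context_label_for_d)  # Lc-D pushed last: outermost
    stack.reverse()                    # list in wire order, top first
    return stack

# Lc-D = 300, Lu = 101, Lt = 900:
assert build_stack(300, 101, 900) == [300, 101, 900]
```
	</artwork>
    </figure>
    </t>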
    <t>To achieve that, when C sends a Replication State route,
    for each tunnel in the TEA, it may include a Tree Label Stack sub-TLV
	(<xref target="treelabel"/>),
	with the outer label being
       the context label Lc-D (received by the controller
       from the corresponding downstream), the next label being the upstream
       neighbor label Lu, and the inner label being the
       label Lt assigned by the controller for the tree. The router receiving
       the route will use the label stacks to send traffic to its downstreams.
    </t>
    <t>For C to signal the expected label stack for D to receive traffic with,
       we overload a tunnel TLV in the TEA of the Replication State route sent to D -
       if the tunnel TLV has an RPF sub-TLV (<xref target="rpf-tlv"/>),
       then it indicates that this tunnel TLV is actually for receiving traffic from
       the upstream.
    </t>
   </section>
    </section>
    <section title = "Determining Root/Leaves">
    <t>For the controller to calculate a tree, it needs to determine the
       root and leaves of the tree. This may be based on provisioning
       (static or dynamically programmed), or based on BGP signaling <!--using
       the BGP multicast messages defined in
       <xref target="I-D.ietf-bess-bgp-multicast"/>, -->as described in the
       following two sections.
    </t>
    <t>In both of the following cases, the BGP updates are targeted at the controller via
       an IP-address-specific Route Target, with the Global Administrator field
       set to the controller's address and the Local Administrator field
       set to 0.<!-- In the case of VPN, an additional Extended Community is attached,
	   which is derived from the Route Target for the VPN
	   (<xref target="I-D.ietf-idr-rt-derived-community"/>).-->
    </t>
    <section title = "PIM-SSM/Bidir or mLDP" anchor="leaf">
    <t>In this case, the PIM Last Hop Routers (LHRs) with interested receivers
    or mLDP tunnel leaves originate a Leaf A-D route
	(<xref target="I-D.ietf-bess-bgp-multicast"/>) with the Upstream
       Router's IP Address field set to the controller's address and the
       Originating Router's IP Address set to the address of the LHR or
       the P2MP tunnel leaf. The encoded PIM SSM source or mLDP FEC provides
       root information and the Originating Router's IP Address provides
       leaf information.
    </t>
    </section>
    <section title = "PIM ASM">
    <t>In this case, the First Hop Routers (FHRs) originate Source Active
       routes which provide root information, and the LHRs originate
       Leaf A-D routes, encoded as in the PIM-SSM case except that
       it is (*,G) instead of (S,G). The Leaf A-D routes provide leaf
       information.
    </t>
    </section>
    </section>
    <section title="Multiple Domains">
    <t>An end-to-end multicast tree may span multiple routing domains,
       and the setup of the tree in each domain may be done differently as
       specified in <xref target="I-D.ietf-bess-bgp-multicast"/>. This section
       discusses a few aspects specific to controller signaling.
    </t>
    <t>Consider two adjacent domains each with its own controller in the
       following configuration where router B is an upstream node of C
       for a multicast tree:
    <figure>
	<artwork>
                         |
               domain 1  |  domain 2 
                         |
                ctrlr1   |   ctrlr2
                  /\     |     /\
                 /  \    |    /  \
                /    \   |   /    \
               A--...-B--|--C--...-D
                         |
	</artwork>
    </figure>
    </t>
    <t>In the case of native (unlabeled) IP multicast, nothing special is
       needed. Controller 1 signals B to send traffic out of the B-C link while
       Controller 2 signals C to accept traffic on the B-C link.
    </t>
    <t>In the case of labeled IP multicast or an mLDP tunnel, the controllers may
       be able to coordinate their actions such that Controller 1 signals B to
       send traffic out of the B-C link with label X while Controller 2 signals C
       to accept traffic with the same label X on the B-C link. If the
       coordination is not possible, then C needs to use hop-by-hop BGP
       signaling towards B, as specified in
       <xref target="I-D.ietf-bess-bgp-multicast"/>.
    </t>
    <t>The configuration could also be as follows, where router B borders both
       domain 1 and domain 2 and is controlled by both controllers:
    <figure>
	<artwork>
                       |
              domain 1 | domain 2 
                       |
                ctrlr1 | ctrlr2
                  /\   |   /\
                 /  \  |  /  \
                /    \ | /    \
               /      \|/      \
              A--...---B--...---C
                       |
	</artwork>
    </figure>
    </t>
    <t>As discussed in <xref target="resilience"/>, when B receives signaling
       from both Controller 1 and Controller 2, only one of the routes will
       be selected as the best route and used for programming the forwarding
       state of the corresponding segment. For B to stitch the two segments
       together, B is expected to know, by provisioning, that it is a
       border router, so that it will look for the other segment (represented
       by the signaling from the other controller) and stitch the two together.
    </t>
    </section>
    <!--section title="SR-P2MP" anchor="srp2mp">
	<t><xref target="I-D.ietf-pim-sr-p2mp-policy"/> describes an architecture
   to construct a Point-to-Multipoint (P2MP) tree to deliver Multi-point
   services in a Segment Routing domain. An SR-P2MP tree is constructed by
   stitching together a set of  Replication Segments that are specified in
   <xref target="I-D.ietf-spring-sr-replication-segment"/>.
   An SR Point-to-Multipoint (SR-P2MP) Policy is used to define and instantiate
   a P2MP tree which is computed by a controller.
	</t>
	<t>An SR-P2MP tree is no different from an mLDP tunnel in MPLS forwarding
       plane. The difference is in control plane - instead of hop-by-hop mLDP
       signaling from leaves towards the root, to set up SR-P2MP trees
       controllers program forwarding state (referred to as Replication
       Segments) to the root, leaves, and
       intermediate replication points using Netconf, PCEP, BGP or any other
       reasonable signaling/programming methods.
	</t>
	<t>Procedures in this document can be used for controllers to set up
    SR-P2MP trees with just an additional SR-P2MP tree type and
	corresponding tree identification in the Replication State route.
	</t>
    <t>If/once the SR Replication Segment is extended to bi-redirectional,
       and SR MP2MP is introduced, the same procedures in this document would
       apply to SR MP2MP as well.
    </t>
    </section-->
    </section>
    <section title="Alternative to BGP-MVPN" anchor="bgpmvpn">
    <t>Multicast with BGP signaling from controllers can be an alternative to
	BGP-MVPN [RFC6514]. It is an attractive option especially when the
	controller can easily determine the source and leaf information.
    </t>
	<t>With BGP-MVPN, distributed signaling is used for the following:
       <list style="symbols">
         <t>Egress PEs advertise C-multicast (Type-6/7) Auto-Discovery (A-D)
		 routes to join C-multicast trees at the overlay (PE-PE).
         </t>
		 <t>In the case of ASM, ingress PEs advertise Source Active (Type-5) A-D
		 routes to advertise sources so that egress PEs can establish
		 Shortest Path Trees (SPT).
		 </t>
         <t>PEs advertise I/S-PMSI (Type-1/2/3) A-D routes to advertise the binding
		 of overlay/customer traffic to underlay/provider tunnels.
		 For some types of tunnels, Leaf (Type-4) A-D routes are advertised by egress
		 PEs in response to I/S-PMSI A-D routes so that the ingress PE can
		 discover the leaves.
         </t>
       </list>
    </t>
    <t>Based on the above signaled information, an ingress PE builds forwarding
	state to forward traffic arriving on the PE-CE interface to the provider
	tunnel (and local interfaces if there are local downstream receivers),
	and an egress PE builds forwarding state to forward traffic arriving on
	a provider tunnel to local interfaces with downstream receivers.
    </t>
	<t>Instead of having the ingress and egress PEs build the forwarding state
	as above, since multicast with BGP signaling from controllers as specified
	in this document essentially programs forwarding state onto
	multicast tree nodes, the procedures in this document can be used to set
	up the forwarding state on ingress and egress PEs directly. As long as
	a controller can determine how a C-multicast flow should be forwarded
	on ingress/egress PEs, it can signal to the ingress/egress PEs using
	the procedures in this document to set up forwarding state, removing
	the need for the above-mentioned distributed signaling and processing.
    </t>
    <t>For the controller to learn the egress PEs for a C-multicast tree
	(so that it can set up or find a corresponding provider tunnel),
	the egress PEs advertise MCAST-TREE Leaf A-D routes
	(<xref target="leaf"/>) towards the controller to signal their desire to
	join C-multicast trees, each with an appropriate RD and an extended
	community derived from the Route Target for the VPN
	(<xref target="I-D.ietf-idr-rt-derived-community"/>)
	so that the controller knows which VPN it is for.
	The controller then advertises corresponding MCAST-TREE Replication State
	routes to set up C-multicast forwarding
	state on ingress and egress PEs. To encode the provider tunnel information
	in the MCAST-TREE Replication State route for an ingress PE, the TEA
	can explicitly list all replication branches of the tunnel,
	or carry just the binding SID for the provider tunnel in the form of
	a Segment List tunnel type, if the tunnel has a binding SID.
	</t>
	<t>
	  The Replication State route may also carry a PMSI Tunnel Attribute (PTA)
	  to specify the provider tunnel, while the TEA specifies the
	  local PE-CE interfaces where traffic needs to be directed. This not only
	  allows a provider tunnel without a binding SID (e.g., in a non-SR network)
	  to be specified without explicitly listing its replication branches,
	  but also allows the service controller for MVPN overlay state to
	  be independent of provider tunnel setup (which could be from a different
	  transport controller or even without a controller).
	</t>
	<t>
	  However, notice that if the service controller and transport controller
	  are different, then the service controller needs to signal the transport
	  controller the tree information: identification, set of leaves, and
	  applicable constraints. While this can be achieved
	  (see <xref target="leaf"/>), it is easier for the service and transport
	  controllers to be the same.
    </t>
	<t>The presence of the TEA and PTA at the same time means that the
	forwarding state (i.e., the replication branches) is combined from two
	independent sources. The PE receiving a Replication State route with both
	the PTA and TEA finds the tunnel specified in the PTA and merges the
	replication branches of the tunnel with the replication branches specified
	in the TEA.
	</t>
	<t>Depending on local policy, a PE may add PE-CE interfaces to
	its replication state based on local signaling (e.g., IGMP/PIM)
	instead of completely relying on signaling from controllers.
	</t>
	<t>
	If there is a need for dynamic switching between inclusive and selective
	tunnels based on data rate, the ingress PE can advertise or withdraw
	S-PMSI routes targeted only at the controllers, without attaching a PMSI
	Tunnel Attribute. The controller then updates the relevant MCAST-TREE
	Replication State routes
	so that the C-multicast forwarding state on the PEs switches to a new tunnel.
    </t>
    </section>
    <section title="Specification" anchor="specification">
    <section title="Enhancements to TEA">
	  <t>A TEA can encode a list of tunnels. A TEA attached to an
	  MCAST-TREE NLRI encodes replication information for a &lt;tree, node&gt;
	  that is identified by the NLRI. Each tunnel in the TEA identifies a
	  branch - either an upstream branch towards the tree root
	  (<xref target="rpf-tlv"/>) or a downstream
	  branch towards some leaves. A tunnel in the TEA could have an outer
	  encapsulation (e.g., an MPLS label stack) or it could just be a one-hop
	  direct connection for native IP multicast forwarding without any outer
	  encapsulation.
	  </t>
      <t>This document specifies three new Tunnel Types and four new sub-TLVs
   for use in multicast state signaling via the MCAST-TREE SAFI and
   IPv4/IPv6 AFIs.
	</t>
	<t>
   Required and optional sub-TLVs, their interactions and
   error handling are described in <xref target="advertising"/> and
   <xref target="receiving"/>. Some key sub-TLVs are described in this section
   as well.
   Other use cases are outside the scope of this document.
	  <!--The type codes will be
       assigned by IANA from the "BGP Tunnel Encapsulation Attribute Tunnel
       Types".-->
    </t>
	<t>The management of the tunnels is outside the scope of this document.
	There are no security concerns specific to any of the tunnels or sub-TLVs,
	besides the general considerations in <xref target="security"/>.
	</t>
    <section title = "Any-Encapsulation Tunnel">
    <t>When a multicast packet needs to be sent from an upstream node to a
       downstream node, it may not matter how it is sent - natively when the
       two nodes are directly connected or tunneled otherwise. In the case of
       tunneling, it may not matter what kind of tunnel is used - MPLS,
       GRE, IPinIP, or whatever.
    </t>
    <t>To support this, an "Any-Encapsulation" tunnel type of value 20 is defined.
    This tunnel MAY have a Tunnel Egress Endpoint <xref target="RFC9012"/>
	and other sub-TLVs.
       The Tunnel Egress Endpoint sub-TLV specifies an
       IP address, which could be any of the following:
       <list style="symbols">
          <t>An interface's local address - when a packet needs to be sent
          out of the corresponding interface natively. On a LAN, a multicast
		  MAC address MUST be used.
          </t>
          <t>A directly connected neighbor's interface address -
		  when a packet needs to be unicast to the address natively.
          </t>
          <t>An address that is not directly connected - when a packet
             needs to be tunneled to the address (any tunnel type/instance
             can be used).
          </t>
       </list>
    </t>
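	<t>As a non-normative sketch of the three endpoint cases above, the
	following shows how a receiving router might classify the Tunnel Egress
	Endpoint address; the local_addrs and connected_nets inputs are
	hypothetical stand-ins for the router's interface configuration.

```python
from ipaddress import ip_address, ip_network

def forwarding_action(egress_endpoint, local_addrs, connected_nets):
    """Classify the Tunnel Egress Endpoint of an Any-Encapsulation
    tunnel per the three cases above (illustrative only)."""
    ep = ip_address(egress_endpoint)
    if ep in local_addrs:
        # Case 1: a local interface address - send natively out of
        # the corresponding interface.
        return "native"
    if any(ep in net for net in connected_nets):
        # Case 2: a directly connected neighbor's interface address -
        # unicast natively to that address.
        return "unicast"
    # Case 3: not directly connected - tunnel to the address
    # (any tunnel type/instance can be used).
    return "tunnel"
```
	</t>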
    </section>
    <section title = "Load-balancing Tunnel">
    <t>Consider that a multicast packet needs to be sent to a downstream node,
       which could be reached via four paths P1~P4. If it does not matter
       which path is taken, an "Any-Encapsulation" tunnel with the
       Tunnel Egress Endpoint sub-TLV specifying the downstream node's loopback
       address works well. If the controller wants to specify that only P1~P2
       should be used, then a "Load-balancing" tunnel needs to be used,
       listing P1 and P2 as member tunnels of the "Load-balancing" tunnel.
    </t>
    <t>A load-balancing tunnel is of type TBD1. It MUST have one "Member Tunnels"
	sub-TLV of type TBD3 defined in this document, which MUST NOT appear
	in other tunnels.
	The sub-TLV has a two-octet Length field, and its value is a list of
	tunnels encoded the same way as in a TEA, each of a tunnel type listed in
	<xref target="advertising"/>, specifying
       a way to reach the downstream node. A packet will be sent out of one of
       the tunnels listed in the Member Tunnels sub-TLV of the load-balancing
       tunnel.
    <figure>
	<artwork>
               +- - - - - - - - - - - - - - - - +
               | sub-TLV Type (1 Octet, TBD3)   |
               +- - - - - - - - - - - - - - - - +
               | sub-TLV Length (2 Octets)      |
               +- - - - - - - - - - - - - - - - +
               ~       List of Tunnels          ~
               +- - - - - - - - - - - - - - - - +
	</artwork>
    </figure>
    </t>
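	<t>The encoding above can be sketched as follows; TBD3 is not yet
	assigned, so the code point here is a placeholder, and member tunnels
	are passed in already encoded as TEA tunnel TLVs (bytes).

```python
import struct

SUBTLV_TYPE_TBD3 = 255  # placeholder until IANA assigns TBD3

def encode_member_tunnels(member_tunnels):
    """Encode the Member Tunnels sub-TLV: a one-octet type, a
    two-octet length, and a value concatenating the member tunnels
    (each already encoded the same way as a tunnel in a TEA)."""
    value = b"".join(member_tunnels)
    return struct.pack("!BH", SUBTLV_TYPE_TBD3, len(value)) + value
```
	</t>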
    </section>
    <section title = "Segment List Tunnel">
	  <t>A Segment List tunnel is of type TBD2. It MUST have a Segment List
	  sub-TLV, whose encoding is specified in Section 2.4.4 of
	  <xref target="I-D.ietf-idr-sr-policy-safi"/><!--. An example
	  use of a Segment List tunnel is provided in <xref target="sr-p2mp"/-->.
	  </t>
	</section>
    <section title="Receiving MPLS Label Stack sub-TLV">
    <t>While <xref target="I-D.ietf-bess-bgp-multicast"/> uses S-PMSI A-D
       routes to signal forwarding information for MP2MP upstream traffic,
       when controller signaling is used, a single Replication State route
	   is used for both upstream and downstream traffic.
	   Since different upstream
       and downstream labels need to be used, a new "Receiving MPLS Label Stack"
       tunnel sub-TLV of type TBD5 is added in addition to the existing
       MPLS Label Stack sub-TLV. Other than the type difference,
       the two are encoded the same way.
    </t>
    <t>The Receiving MPLS Label Stack sub-TLV is added to each downstream tunnel
       in the TEA of a Replication State route for an MP2MP tunnel to specify
       the forwarding information for upstream traffic from the corresponding
       downstream node.
       A label stack instead of a single label is used because of
       the need for neighbor-based RPF check, as further explained in the
       following section.
    </t>
    <t>The Receiving MPLS Label Stack sub-TLV is also used for downstream traffic
       from the upstream for both P2MP and MP2MP, as specified below.
    </t>
    </section>
    <section title="RPF sub-TLV" anchor="rpf-tlv">
    <t>The RPF sub-TLV is of type 124 and has a
       one-octet Length field. The length is currently 0, but if necessary
       in the future, sub-sub-TLVs could be placed in its value part.
       If the RPF sub-TLV appears in a tunnel, it indicates that the "tunnel"
       is for the upstream node instead of a downstream node.
	</t>
	<t>
	   In the case of MPLS, the tunnel contains
       a Receiving MPLS Label Stack sub-TLV for downstream traffic from the
       upstream node, and in the case of MP2MP it also contains a regular
       MPLS Label Stack sub-TLV for upstream traffic to the upstream node.
    </t>
    <t>The innermost label in the Receiving MPLS Label Stack is the incoming
       label identifying the tree (for comparison the innermost label for a
       regular MPLS Label Stack is the outgoing label). If the Receiving MPLS
       Label Stack sub-TLV has more than one label,
       the second innermost label in the stack identifies the expected
       upstream neighbor, and explicit RPF checking needs to be set up for
       the tree label accordingly.
    </t>
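	<t>As a non-normative sketch, the interpretation above can be expressed
	as follows, with the stack given outermost label first.

```python
def rpf_labels(receiving_stack):
    """Interpret a Receiving MPLS Label Stack: the innermost (last)
    label identifies the tree; with two or more labels, the second
    innermost identifies the expected upstream neighbor, for which
    explicit RPF checking is set up on the tree label."""
    tree_label = receiving_stack[-1]
    neighbor_label = receiving_stack[-2] if len(receiving_stack) > 1 else None
    return tree_label, neighbor_label
```
	</t>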
    </section>
    <section title="Tree Label Stack sub-TLV" anchor="treelabel">
	  <t>The MPLS Label Stack sub-TLV can be utilized to specify the
	  complete label stack used to transmit traffic,
	  with the stack including both a transport label (stack) and
	  label(s) that identify the (tree, neighbor) to the downstream node.
	  There are cases where the controller only wants to specify
	  the tree-identifying labels but leave the transport details
	  to the router itself. For example, the router could locally
	  determine a transport label (stack) and combine it with the tree-identifying
	  labels signaled from the controller to get the complete outgoing
	  label stack.
	  </t>
	  <t>For that purpose, a new Tree Label Stack sub-TLV of type 125 is defined,
	  with a one-octet length field. It MAY appear in an Any-Encapsulation
	  tunnel. The value field contains a label stack
	  with the same encoding as the value field of the MPLS Label Stack sub-TLV,
	  but with a different type.
    </t>
	<t>A stack is used here because
	  it may take up to three labels (see <xref target="labels"/>):
       <list style="symbols">
         <t>If different nodes use different labels (allocated from the common
		 SRGB or the node's SRLB) for a (tree, neighbor) tuple, only a single
		 label is in the stack. This is similar to the current mLDP hop by hop
		 signaling case.
          </t>
          <t>If different nodes use the same tree label, then an additional
		  neighbor-identifying label is needed in front of the tree label.
          </t>
          <t>For the previous bullet, if the neighbor-identifying label is
		  allocated from the controller's local label space, then an additional
		  context label is needed in front of the neighbor label.
          </t>
       </list>
	  </t>
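	<t>The combination described above can be sketched as follows, with both
	stacks ordered outermost label first.

```python
def complete_outgoing_stack(transport_stack, tree_label_stack):
    """Combine a locally determined transport label stack with the
    tree-identifying labels signaled in the Tree Label Stack sub-TLV
    to form the complete outgoing label stack."""
    return list(transport_stack) + list(tree_label_stack)
```
	</t>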
    </section>
    <section title="Backup Paths sub-TLV" anchor="backup">
	  <t>The Backup Paths sub-TLV is used to specify the backup paths for
	  a tunnel. The type is TBD4.
	  The Length field is two octets. The value field encodes a one-octet Flags
	  field and a list of backup tunnels encoded the same way as in a TEA.
	  If the tunnel goes down, traffic that is normally sent out of the tunnel
	  is fast rerouted to all the tunnels listed in the Backup Paths sub-TLV.
    <figure>
	<artwork>
               +- - - - - - - - - - - - - - - - +
               | sub-TLV Type (1 Octet, TBD4)   |
               +- - - - - - - - - - - - - - - - +
               | sub-TLV Length (2 Octets)      |
               +- - - - - - - - - - - - - - - - +
               | P | rest of 1 Octet Flags      |
               +- - - - - - - - - - - - - - - - +
               |Backup Tunnels (variable length)|
               +- - - - - - - - - - - - - - - - +
	</artwork>
    </figure>
	  </t>
	  <t>The backup tunnels can lead to the same or different nodes reached
	  by the original tunnel.
	  </t>
	  <t>An incoming tunnel MAY also have a Backup Paths sub-TLV.
	  If the Parallel (P-)bit in the flags field is set,
	  both traffic arriving on the original tunnel and on the tunnels encoded
	  in the Backup Paths sub-TLV's TEA can be accepted. Otherwise, traffic
	  arriving on the backup tunnels is accepted only if the router has switched
	  to receiving on the backup tunnel (this is the equivalent of PIM/mLDP
	  MoFRR <xref target="RFC7431"/>).
	  </t>
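	<t>A non-normative encoder for this sub-TLV follows; TBD4 is a
	placeholder code point, and treating the P-bit as the high-order flag
	bit is an assumption based on the figure above.

```python
import struct

SUBTLV_TYPE_TBD4 = 254  # placeholder until IANA assigns TBD4
P_BIT = 0x80            # assumed: P-bit as the high-order Flags bit

def encode_backup_paths(backup_tunnels, parallel=False):
    """Encode the Backup Paths sub-TLV: one-octet type, two-octet
    length, one-octet Flags, then the backup tunnels (each already
    encoded the same way as a tunnel in a TEA)."""
    flags = P_BIT if parallel else 0
    value = bytes([flags]) + b"".join(backup_tunnels)
    return struct.pack("!BH", SUBTLV_TYPE_TBD4, len(value)) + value
```
	</t>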
    </section>
    </section>
    <section title="Context Label TLV in BGP-LS Node Attribute" anchor="contexttlv">
    <t>For a router to signal the context label that it assigns for a
    controller (or any label allocator that assigns labels - from its
	local label space - that will be received by this router),
	a new BGP-LS Node Attribute TLV of type TBD6 is defined:
    <figure>
	<artwork>
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Type TBD6          |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Context Label                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            IPv4/v6 Address of Label Space Owner               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
    </figure>
    </t>
	<t>The Context Label field is three octets, where the high-order 20 bits
	contain the label value.
	The Address of Label Space Owner is IPv4 if the Length is 7,
	or IPv6 if the Length is 19. Multiple
	Context Label TLVs may be included in a Node Attribute,
	one for each label space owner.
	</t>
	<t>As an example, a controller with address 198.51.100.1 allocates
	label 200 from its own label space, and router A assigns label 100 to
	identify this controller's label space. The router includes
	the Context Label TLV (100, 198.51.100.1) in its BGP-LS Node Attribute.
	The controller instructs router B to send traffic to router A
	with a label stack (100, 200), and router A uses label 100 to determine
	the Label FIB in which to look up label 200.
	</t>
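	<t>A non-normative encoder/decoder for this TLV follows; TBD6 is a
	placeholder code point. The 20-bit label sits in the high-order bits of
	the 3-octet Context Label field, and the Length is 7 for an IPv4 owner
	address or 19 for IPv6.

```python
import socket
import struct

TLV_TYPE_TBD6 = 0x7FFF  # placeholder until IANA assigns TBD6

def encode_context_label_tlv(label, owner):
    """Context Label TLV: two-octet type, two-octet length, 3-octet
    Context Label (20-bit label in the high-order bits), then the
    label space owner's IPv4 (Length=7) or IPv6 (Length=19) address."""
    try:
        addr = socket.inet_pton(socket.AF_INET, owner)
    except OSError:
        addr = socket.inet_pton(socket.AF_INET6, owner)
    value = ((label & 0xFFFFF) << 4).to_bytes(3, "big") + addr
    return struct.pack("!HH", TLV_TYPE_TBD6, len(value)) + value

def decode_context_label_tlv(tlv):
    _type, length = struct.unpack("!HH", tlv[:4])
    label = int.from_bytes(tlv[4:7], "big") >> 4
    fam = socket.AF_INET if length == 7 else socket.AF_INET6
    return label, socket.inet_ntop(fam, tlv[7:4 + length])
```

For the example above, router A would advertise encode_context_label_tlv(100, "198.51.100.1").
	</t>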
    </section>
    <section title="MCAST Extended Community">
	  <t>A tree node needs to acknowledge to the controller the success or failure
	  of installing forwarding state for a tree. In the case of failure, an
	  MCAST NACK Extended Community is attached. The value field is set to 0.
	  In the future, flag bits may be defined to signal specific failures.
	  </t>
	  <t>The MCAST NACK Extended Community is an MCAST Extended Community with
	  a sub-type TBD9 to be assigned by IANA. The MCAST Extended Community is a new
	  Extended Community with a type TBD8 to be assigned by IANA.
    <figure>
	<artwork>
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Type=TBD8     | Sub-Type=TBD9 |          Reserved=0           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 Reserved=0                                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
    </figure>
	  </t>
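	<t>The 8-octet community shown above can be built as follows; the TBD8
	and TBD9 code points are placeholders pending IANA assignment.

```python
import struct

EC_TYPE_TBD8 = 0x80     # placeholder until IANA assigns TBD8
EC_SUBTYPE_TBD9 = 0x00  # placeholder until IANA assigns TBD9

def mcast_nack_extended_community():
    """Build the 8-octet MCAST NACK Extended Community: type,
    sub-type, and six reserved octets set to zero."""
    return struct.pack("!BB", EC_TYPE_TBD8, EC_SUBTYPE_TBD9) + b"\x00" * 6
```
	</t>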
    </section>
    <!--section title="Local Label Block Community">
	<t>For a router to signal its local label block from which a controller can
    allocate labels from on its behalf, it attaches a Local Label Block
	Wide Community
       <xref target="I-D.ietf-idr-wide-bgp-communities"/> to the host route
       for its own address used in its BGP session towards the controllers
       (directly or via RRs). This is a new wide community that specifies
       the (Label Allocator, Block Number, Block Size, Starting Label) tuple.
	   the exact format will be specified in a future revision.
	</t>
    </section-->
    <section title="Replication State Route Type" anchor="rep-route">
	<t>The NLRI route type (TBD7) for signaling from controllers to tree nodes
       is "Replication State". The NLRI has the following format:
    <figure>
	<artwork>
                +-----------------------------------+
                |Route Type - Replication State TBD7|
                +-----------------------------------+
                |     Length (1 octet)              |
                +-----------------------------------+
                |     Tree Type (1 octet)           |
                +-----------------------------------+
                |Tree Type Specific Length (1 octet)|
                +-----------------------------------+
                |     RD (8 octets)                 |
                +-----------------------------------+
                ~  Tree Identification (variable)   ~
                +-----------------------------------+
                |    Tree Node's IP Address         |
                +-----------------------------------+
                |  Originating Router's IP Address  |
                +-----------------------------------+

                      Replication State NLRI
	</artwork>
    </figure>
	</t>
	<t>Notice that Replication State is just a new route type with the same
	   format as the Leaf A-D route, except that some fields are renamed:
       <list style="symbols">
         <t>Tree Type in Replication State route matches the PMSI route type
		 in the Leaf A-D route <xref target="I-D.ietf-bess-bgp-multicast"/>.
		 In this document, it could be one of the values as follows.
       <list style="symbols">
			<t>2: P2MP Tree with Label as Identification</t>
			<t>3: IP Multicast</t>
			<t>0x43: mLDP</t>
	   </list>
          </t>
          <t>Tree Node's IP Address matches the Upstream Router's IP Address
		  of the PMSI route key in the Leaf A-D route.
          </t>
       </list>
	</t>
	<t>With this arrangement, IP multicast trees and mLDP tunnels can be
	signaled via Replication State routes from controllers, or via Leaf A-D
	routes either hop by hop or from controllers with maximum code reuse,
	while newer types of trees <!--like SR-P2MP--> can be signaled via Replication
	State routes with maximum code reuse as well.
	</t>
	<t>The Tree Node's and the Originating Router's IP addresses are IPv4 (4 octets)
	if the AFI is 1 or IPv6 (16 octets) if the AFI is 2.
	</t>
	<t>The Tree Identification field varies based on the Tree Type.
	For Tree Type 2, it is a label stack (<xref target="labeltree"/>).
	For Tree Type 0x43, it is an mLDP FEC (<xref target="RFC6388"/>).
	For Tree Type 3, it is as follows:
	<artwork>
         +-----------------------------------+
         | Multicast Source Length (1 octet) |
         +-----------------------------------+
         |  Multicast Source (variable)      |
         +-----------------------------------+
         |  Multicast Group Length (1 octet) |
         +-----------------------------------+
         |  Multicast Group   (variable)     |
         +-----------------------------------+
         |  Upstream Router's IP Address     |
         +-----------------------------------+
	</artwork>
	</t>
	<t>
   If the Multicast Source (or Group) field contains an IPv4 address,
   then the value of the Multicast Source (or Group) Length field is 32.
   If the Multicast Source (or Group) field contains an IPv6 address,
   then the value of the Multicast Source (or Group) Length field is
   128.
	</t>
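	<t>A non-normative sketch of the Tree Type 3 Tree Identification
	encoding above:

```python
import socket

def _addr(a):
    """Return (length-in-bits, packed bytes) for an IPv4/IPv6 address."""
    try:
        return 32, socket.inet_pton(socket.AF_INET, a)
    except OSError:
        return 128, socket.inet_pton(socket.AF_INET6, a)

def encode_ip_multicast_tree_id(source, group, upstream):
    """Tree Identification for Tree Type 3: one-octet Multicast Source
    Length (32 or 128), Source, one-octet Multicast Group Length,
    Group, then the Upstream Router's IP Address."""
    out = b""
    for a in (source, group):
        bits, packed = _addr(a)
        out += bytes([bits]) + packed
    return out + _addr(upstream)[1]
```
	</t>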
    </section>
    <!--section title="SR-P2MP Signaling">
	<t>An SR-P2MP policy for an SR-P2MP tree is identified by a (Root, Tree-id)
       tuple. It has a set of leaves and set of Candidate Paths (CPs).
       The policy is instantiated on the root of the tree, with corresponding
       Replication Segments - identified by (Root, Tree-id, Tree-Node-id) -
       instantiated on the tree nodes (root, leaves, and
       intermediate replication points).
	</t>
    <section title="Replication State Route for SR-P2MP">
    <t>For SR-P2MP, forwarding on tree nodes state are represented as
       Replication Segments and are signaled from controllers to tree nodes
	   via Replication State routes. A Replication State route for SR-P2MP
       has a Tree Type 1 and the Tree Identification includes (Route
	   Distinguisher, Root ID, Tree ID), where the RD implicitly identifies
	   the candidate path.
    <figure>
	<artwork>
                +- - - - - - - - - - - - - - - - - -+
                |   Route Type - Replication State  |
                +- - - - - - - - - - - - - - - - - -+
                |     Length (1 octet)              |
                +- - - - - - - - - - - - - - - - - -+
                |    Tree Type (1 - SR-P2MP)        |
                +- - - - - - - - - - - - - - - - - -+
                |Tree Type Specific Length (1 octet)|
                +- - - - - - - - - - - - - - - - - -+
                |      RD   (8 octets)              |
                +- - - - - - - - - - - - - - - - - -+
                |  Root ID (4 or 16 octets)         |
                +- - - - - - - - - - - - - - - - - -+
                |       Tree ID (4 octets)          |
                +- - - - - - - - - - - - - - - - - -+
                |     Tree Node's IP Address        |
                +- - - - - - - - - - - - - - - - - -+
                |  Originating Router's IP Address  |
                +- - - - - - - - - - - - - - - - - -+

           Replication State route for SR Replication Segment
	</artwork>
    </figure>
	</t>
    </section>
    <section title="BGP Community Container for SR-P2MP Policy">
	<t>The Replication State route for Replication Segments signaled to the root is
       also used to signal (parts of) the SR-P2MP Policy - the policy name,
       the set of
       leaves (optional, for informational purpose), preference of the CP
       and other information are all encoded in a newly defined BGP Community
       Container (BCC) <xref target="I-D.ietf-idr-wide-bgp-communities"/>
       called SR-P2MP Policy BCC.
	</t>
	<t>The SR-P2MP Policy BCC has a BGP Community Container type to be
       assigned by IANA. It is composed
       of a fixed 4-octet Candidate Path Preference value, optionally followed
       by TLVs.
    <figure>
	<artwork>

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                Candidate Path Preference                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     |                        TLVs (optional)                        |
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


             BGP Community Container for SR-P2MP Policy
	</artwork>
    </figure>
	</t>
	<t>One optional TLV is to enclose the following optional Atoms TLVs
       that are already defined in
       <xref target="I-D.ietf-idr-wide-bgp-communities"/>:
       <list style="symbols">
          <t>An IPv4 or IPv6 Prefix list - for the set of leaves
          </t>
          <t>A UTF-8 string - for the policy name
          </t>
       </list>
	</t>
	<t>If more information for the policy are needed, more Atoms TLVs or
       SR-P2MP Policy BCC specific TLVs can be defined.
	</t>
	<t>The root receives one Replication State route for each Candidate Path of the
       policy. Only one of the routes need to, though more than one MAY
       include the above listed optional Atom TLVs in the SR-P2MP Policy BCC.
	</t>
	<t>Alternatively, an additional route type can be used to carry policy
	information instead. Details/decision to be specified in a future revision.
	</t>
    </section>
    <section title="Tunnel Encapsulation Attribute" anchor="sr-p2mp">
	  <t>The TEA attached to a Replication State route for SR-P2MP encodes
	  tunnels as specified in earlier sections.
	  A tunnel could be an Any-Encapsulation tunnel with MPLS Label Stack
	  sub-TLV or Receiving MPLS Label Stack sub-TLV (in the case of SR-MPLS),
	  a Segment List tunnel, or a Load-balancing tunnel.
	  </t>
	  <t>For a Segment List tunnel in this context, the last segment in the
	  segment list represents the SID of the tree. When it is without the RPF
	  sub-TLV,
	  the previous segments in the list steer traffic to the downstream
	  node, and the segment before the last one MAY also be a binding SID
	  for another P2MP tunnel, meaning that the replication branch
	  represented by this "Segment List" is actually a P2MP tunnel to a
	  set of downstream nodes.
	  </t>
	  <!t-tobedeleted>Compared with another tunnel type that may need to use a Tree Label
	  Stack sub-TLV (<xref target="treelabel"/>) instead of a Label Stack
	  sub-TLV), the Segment List tunnel type only need to use the Segment List
	  sub-TLV alone because the first segment in the list is to be resolved
	  first (while the Label Stack sub-TLV is for outgoing labels without
	  resolution).
	  </t>
    </section>
    </section-->
    <!--section title="SR Policy Tunnel Type" anchor="srtunnel">
	<t>The Tunnel Encapsulation Attribute (TEA) attached to Replication State routes
	   encodes all replication branch information. For example, if an SR 
       explicit path is to be used to reach a particular downstream node,
       the TEA may include
       a tunnel that lists the entire label stack for that SR path, plus
       the label that identifies the tree to the downstream node.
	</t>
	<t>That SR path may have already been installed on the node as a unicast SR policy
       with a corresponding Binding SID. In stead of listing the entire label
       stack in an MPLS tunnel in the TEA, a different tunnel, SR Policy Tunnel
       <xref target="I-D.ietf-idr-sr-policy-safi"/>, can be used as
       an alternative. The tunnel MUST include a Binding SID sub-TLV and MAY
	   include a Tunnel Egress Endpoint sub-TLV that identifies the downstream node,
	   and segment list or a Tree Label Stack sub-TLV that identifies to the
	   downstream node the
       tree. When a node receives the Replication State route with the TEA that
       contains an SR Policy Tunnel without a RPF sub-TLV, the Binding SID is
       used to locate corresponding outgoing segment lists used to reach the
       downstream node; the tree-identifying segment list or the Tree Label
	   Stack is added to outgoing segment lists mapped
       from the binding SID to form the entire segment list used to send traffic
       to downstream node. 
	</t>
	<t>Note that, the SR Policy Tunnel is initially defined to instantiate
       an SR policy. For that use case it provides information associated
       with the policy, e.g., Binding SID, preference, and segment lists.
       The receiving node installs that policy and establishes the mapping
       from the Binding SID to the outgoing segments. The use of SR Policy
       Tunnel in this document is to refer to a pre-installed SR policy
       so the preference and segment lists are not used.
	</t>
	<t>If a tunnel in the TEA carries a RPF sub-TLV, it is for the upstream
	   node. The tunnel may be an MPLS tunnel in the case of
       SR MPLS, and the Receiving MPLS Label Stack sub-TLV specifies the
	   incoming label stack that identifies the tree and optionally
	   the upstream neighbor.
       Alternatively, for both SR-MPLS and SRv6 an SR Policy Tunnel with the
       RPF sub-TLV can be used, in which the Binding SID sub-TLV is the SID
       for the tree.
	</t>
	<t>If the node is the root and a Binding SID is allocated by the
       controller, the Binding SID is signaled to the root in a TEA tunnel
       with a RPF sub-TLV as above but  without a destination sub-TLV.
	</t>
	<t>When an SR Policy Tunnel does not have a RPF-sub-TLV, it may represent
	a P2MP policy, meaning that a downstream branch is actually another
	P2MP tunnel. It may be used for the following use cases:
    <list style="symbols">
	  <t>The Replication State route to which the TEA is attached to is for an
	  outer P2MP tunnel stacked over an inner P2MP tunnel represented
	  by the SR Policy Tunnel in the TEA, similar to the "mLDP over P2MP"
	  scenario described in [RFC7060].
	  </t>
	  <t>The Replication State route to which the TEA is attached to is for a
	  multicast route in an ingress VRF of an MVPN (<xref target="bgpmvpn"/>)
	  and the provider tunnel in use is represented by the SR Policy Tunnel in the TEA.
	  </t>
	</list>
	</t>
    </section-->
    <section title="Replication State Route with Label Stack for Tree Identification" anchor="labeltree">
      <t>As described in <xref target="signaling"/>, the tree label, instead of
	  the tree identification, could be encoded in the NLRI to identify the tree in
	  the control plane as well as in the forwarding plane. For that, a new
	  Tree Type of 2 is used and the Replication State route has the
	  following format:
    <figure>
	<artwork>
                +-------------------------------------+
                |    Route Type - Replication State   |
                +-------------------------------------+
                |     Length (1 octet)                |
                +-------------------------------------+
                |    Tree Type 2 (Label as Tree ID)   |
                +-------------------------------------+
                |Tree Type specific Length (1 octet)  |
                +-------------------------------------+
                |      RD   (8 octets)                |
                +-------------------------------------+
                ~      Label Stack (variable)         ~
                +-------------------------------------+
                |  Tree Node's IP Address             |
                +-------------------------------------+
                |  Originating Router's IP Address    |
                +-------------------------------------+

         Replication State route for tree identification by label stack
	</artwork>
    </figure>
    </t>
	<t>The Tree Node's and the Originating Router's IP addresses are IPv4 (4 octets)
	if the AFI is 1 or IPv6 (16 octets) if the AFI is 2.
	</t>
    <t>As discussed in <xref target="multilabel"/>, a label stack may have
	to be used to identify a tree in the data plane so a label stack is
	encoded here. The number of labels is derived from the Tree Type Specific
	Length field. Each label stack entry is encoded as follows:
    <figure>
	<artwork>
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                Label                  |0 0 0 0 0 0 0 0 0 0 0 0|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
    </figure>
	</t>
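	<t>The label stack entry format above - the 20-bit label in the
	high-order bits of a 4-octet entry with the low 12 bits zero - can be
	packed and unpacked as follows (non-normative sketch).

```python
def pack_label_entry(label):
    """Encode one label stack entry: 20-bit label in the high-order
    bits of 4 octets, low-order 12 bits zero."""
    return ((label & 0xFFFFF) << 12).to_bytes(4, "big")

def unpack_label_entry(entry):
    """Recover the 20-bit label from a 4-octet entry."""
    return int.from_bytes(entry, "big") >> 12
```
	</t>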
    </section>
    </section>
    <section title="Procedures">
      <t>This section applies to the MPLS data plane. While the concept of
	  BGP signaling applies to the SRv6 data plane as well, SRv6-related
	  specification is outside the scope of this document.
      Note that this document does
	  not assume Segment Routing is used, even though the SRGB/SRLB
	  terminology is used to describe label blocks and some scenarios
	  of Segment Routing are considered.
    </t>
    <section title="Label Space and Tree Label Allocation">
      <t>In the case of labeled trees for either (x, g) IP multicast or mLDP
	  tunnels, an operator first determines which of the following methods
	  is used to allocate tree-identifying labels, as explained in
	  <xref target="labels"/>:
    <list style="numbers">
	  <t anchor="private">A common per-tree label on all nodes of a P2MP
	  tree, or a common
	  per-&lt;tree, direction&gt; label on all nodes of a MP2MP tree,
	  allocated from the controller's own label space.
	  </t>
	  <t anchor="srgb">A common per-tree label on all nodes of a P2MP tree,
	  or a common per-&lt;tree, direction&gt; label on all nodes of a MP2MP
	  tree, allocated from a common SRGB.
	  </t>
	  <t anchor="srlb">Uncorrelated labels independently allocated from each
	  node's SRLB.
	  </t>
	</list>
	  </t>
	  <t>For option <xref target="srgb" format="counter"/> and
	  <xref target="srlb" format="counter"/>, the process through which
	  the controller learns the common SRGB or each node's SRLB is
	  outside the scope of this document.
	  </t>
	  <t>For option <xref target="private" format="counter"/>,
	  each tree node MUST advertise a label from its default label space to
	  identify the controller's label space, via the Context Label TLV in
	  BGP-LS Node Attribute (<xref target="contexttlv"/>).
	  The tree-identifying label in TEA and packets MUST be preceded by the
	  label-space-identifying label.
      </t>
	  <t>For option <xref target="private" format="counter"/> and
	  <xref target="srgb" format="counter"/>,
	  the operator also determines whether the controller allocates a new label
	  for each tree or &lt;tree, direction&gt; and resignals it to all tree nodes
	  even when only some tree nodes need to be changed. If not,
	  then another neighbor-identifying label needs to precede the
	  tree-identifying label (and follow the label-space-identifying label
	  in the case of option <xref target="private" format="counter"/>).
	  The neighbor-identifying
	  label MUST be allocated from the same label space or SRGB from which
	  the tree-identifying label is allocated.
	  </t>
	  <t>To generalize, a label stack can contain one label
	  (for option <xref target="srlb" format="counter"/>),
	  two labels (for option <xref target="srgb" format="counter"/> and
	  <xref target="private" format="counter"/>
	  if a neighbor-identifying label is not needed), or three labels
	  (for option <xref target="srgb" format="counter"/> and <xref target="private" format="counter"/>
	  if a neighbor-identifying label is needed). In the rest of the document,
	  the term tree-identifying label(-stack) is used generically.
	  </t>
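	  <t>As a non-normative sketch (the helper and option names are
	  hypothetical), the label stack assembly for the three allocation
	  options can be expressed as:
	  <figure>
	  <artwork>
```python
def build_tree_label_stack(option, tree_label, context_label=None,
                           neighbor_label=None):
    """Assemble the tree-identifying label(-stack) for the three
    allocation options described above (illustrative sketch)."""
    if option == "srlb":                 # option 3: one label only
        return [tree_label]
    if option not in ("srgb", "private"):
        raise ValueError("unknown option")
    stack = [tree_label]
    if neighbor_label is not None:       # neighbor-identifying label
        stack.insert(0, neighbor_label)  # precedes the tree label
    if option == "private":              # option 1: label-space label first
        if context_label is None:
            raise ValueError("option 1 needs a label-space-identifying label")
        stack.insert(0, context_label)
    return stack
```
	  </artwork>
	  </figure>
	  </t>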
    </section>
    <section title="Advertising Replication State Routes" anchor="advertising">
      <t>After the controller calculates a tree, it constructs one or more
	  Replication State Routes for each tree node as follows:
    <list style="symbols">
	  <t>If the tree is for the default routing instance and only one route
	  is needed, the RD MAY be set to 0:0. Otherwise, the RD is set to a value
	  to distinguish the routes for trees in different routing instances
	  but with the same tree identifier (e.g., (x, g) or mLDP FEC for a VPN),
	  or to distinguish the multiple routes needed for the same &lt;tree,
	  node&gt;.
	  </t>
	  <t>The Route Type, Length, Tree Type, Tree Type Specific Length,
	  and Tree Identification are set accordingly.
	  </t>
	  <t>The Tree Node's IP Address is set to an address of the tree node,
	  typically the loopback address.
	  </t>
	  <t>The Originating Router's IP Address is set to the controller's address.
	  </t>
	  <t>An IP Address Specific Route Target is attached, with the Global
	  Administration Field set to match the Tree Node's IP Address in
	  the NLRI, and the Local Admin Field set to 0.
	  </t>
	  <t>In the case of VPN, an Extended Community derived from the Route Target
	  for the VPN (<xref target="I-D.ietf-idr-rt-derived-community"/>) is
	  attached.
	  </t>
	  <t>Either a TEA or a PTA or both MUST be attached to encode
	  the replication information, as detailed below.
	  </t>
	</list>
	  </t>
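	  <t>The construction steps above can be sketched as follows
	  (non-normative; the field names and route representation are
	  illustrative, not a wire encoding):
	  <figure>
	  <artwork>
```python
def build_replication_state_route(tree_id, tree_type, node_ip,
                                  controller_ip, rd="0:0"):
    """Sketch of the Replication State route construction above."""
    return {
        "nlri": {
            "rd": rd,
            "tree_type": tree_type,
            "tree_id": tree_id,
            "tree_node_ip": node_ip,
            "originating_router_ip": controller_ip,
        },
        # IP Address Specific RT: Global Admin = tree node's address,
        # Local Admin = 0, so only the targeted node imports the route.
        "route_target": (node_ip, 0),
        "attributes": {},  # TEA and/or PTA are attached separately
    }
```
	  </artwork>
	  </figure>
	  </t>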
	  <t>The TEA encompasses one or more tunnels. If the route is for the root
	  node of an MPLS tree with a binding SID, a single tunnel of type
	  Any-Encapsulation MAY be included with a RPF sub-TLV and a Receiving MPLS
	  Label Stack sub-TLV to encode the binding SID. If the route is for a
	  leaf or bud node (a node that is simultaneously a leaf node and a
	  transit node),
	  a single tunnel of type Any-Encapsulation MUST be included
	  with a Tunnel Egress Endpoint sub-TLV. The address of this sub-TLV
	  MUST be set to a loopback address on the node.
	  </t>
	  <t>Additionally, for any node (root, transit, bud, or leaf), a tunnel
	  is included for each of its upstream/downstream nodes. Each tunnel MUST
	  include a Tunnel Egress
	  Endpoint sub-TLV if it is needed to derive forwarding information.
	  Otherwise, the sub-TLV MAY be included for informational purposes.
    <list style="symbols">
	  <t>For any node from which traffic may be received with
	  tree-identifying label(s) (notice that on a bidirectional tree traffic
	  may be received on
	  multiple branches), the tunnel MUST include a Receiving MPLS Label Stack
	  sub-TLV to encode the incoming tree-identifying label(-stack).
	  </t>
	  <t>For any node from which unlabeled IP multicast traffic may be
	  received for the IP multicast tree (notice that on a bidirectional tree
	  traffic may be received on multiple branches), the tunnel MUST be of
	  type Any-Encapsulation with a Tunnel Egress
	  Endpoint sub-TLV with the address set to the local address of the incoming
	  interface.
	  </t>
	  <t>If traffic is to be sent with a tree-identifying label(-stack),
	  it MUST include a Tree Label Stack sub-TLV, with an exception for
	  the Segment List tunnel (see below).
	  </t>
	  <t>If a tunnel is to be protected by some backup paths calculated
	  by the controller, a Backup Paths sub-TLV MUST be included to encode
	  the backup paths. Each backup path is encoded as a tunnel following
	  the same way as for the normal paths.
	  The P-bit in the Backup Paths sub-TLV MUST be set to 1 if the tunnel
	  is used for incoming traffic and traffic from both the primary and backup
	  paths is to be forwarded, and set to 0 otherwise.
	  </t>
	  <t>A tunnel for an upstream/downstream node MAY be one of the following
	  types:
    <list style="symbols">
	  <t>Any-Encapsulation: any encapsulation can be used to send or receive
	  traffic.
	  It MUST include a Tunnel Egress Endpoint sub-TLV, with the address set
	  to either a local address of the incoming/outgoing interface, or the
	  address of a neighbor to/from which traffic is sent/received.
	  </t>
	  <t>MPLS, MPLS in GRE, MPLS in UDP: like the Any-Encapsulation case,
	  except that specifically MPLS (native or in GRE/UDP) tunneling is used.
	  </t>
	  <t>Segment List: This is for sending traffic via an explicit SR path
	  represented by the segment list encoded in the tunnel.
      The first segment of the list MUST be a Prefix/Adjacency/Binding SID that
	  enables the node to send replicated packets towards the
	  downstream node. The Tunnel Egress Endpoint sub-TLV is optional, because
	  the segment list itself provides all the forwarding information.
	  If a tree-identifying label(-stack) is needed, as an alternative to
	  including a Tree Label Stack sub-TLV, the label(-stack) MAY be encoded
	  as the last one, two, or three segments (the use of one, two, or three
	  labels is explained in <xref target="labels"/>).
	  </t>
	  <t>Load-balancing: This is for a downstream node to be reached
	  via one of the member tunnels listed in the Load-balancing tunnel,
	  each of a type specified above. As it is for a downstream node,
	  the RPF sub-TLV MUST NOT be included in the Load-balancing tunnel
	  itself or in any of the member tunnels. The Tree Label Stack sub-TLV or
	  Receiving MPLS Label Stack sub-TLV, if needed, MUST be in the
	  Load-balancing tunnel itself, not in any member tunnel.
	  The Load-balancing tunnel itself MUST NOT include the Tunnel Egress
	  Endpoint sub-TLV, as the information should be carried in each
	  member tunnel if it is needed.
	  </t>
	</list>
	  </t>
	</list>
	  </t>
	  <t>Other tunnel types and sub-TLVs may also be used, but that is outside
	  the scope of this document.
	  <!--This specification is intentionally not stringent to favor flexibility
	  - as long as the route
	  can be syntactically parsed, no BGP protocol error will be reported.
	  However, missing/conflicting/inappropriate sub-TLVs may cause multicast
	  forwarding state not to be created, and that is a problem for multicast
	  service only. In such case, appropriate (dampened) logs MUST be generated
	  on the targeted node, and an negative acknowledgement MUST be sent back
	  to the controller.-->
      </t>
	  <t>The PTA is used to provide some replication branch information
	  by encoding a P2MP tunnel identifier in the PTA instead of explicitly
	  listing the replication branches of that P2MP tunnel in the TEA,
	  as explained in <xref target="bgpmvpn"/>.
      </t>
    </section>
    <section title="Receiving Replication State Routes" anchor="receiving">
	  <t>Each potential tree node MUST be (auto-)configured with an
	  IP Address Specific Route Target to import Replication State Routes
	  targeted at itself. The Global Administration Field MUST be
	  set to a local address known to be used by the controller to identify
	  the node, and the Local Administration Field MUST be set to 0.
	  </t>
      <t>When a BGP speaker receives a Replication State Route and the attached
	  Route Target matches its (auto-)configured Route Target to import the
	  route, it MUST NOT re-advertise the route further. Otherwise,
	  normal BGP route propagation rules apply.
	  </t>
	  <t>If an imported Replication State Route carries an Extended
	  Community derived from
	  a Route Target for a local VRF, the route is imported into that VRF.
	  Otherwise, it is imported into the default routing instance.
	  </t>
	  <t>For the same &lt;tree, node&gt;, there may be multiple routes
	  imported, from the same or different controllers. The BGP best route
	  selection process selects one of them as the active path, but all routes
	  with the same NLRI as the active path, without considering the RD field,
	  MUST be considered together to create forwarding state on the node
	  for the tree. Recall that multiple such routes may be advertised when
	  it is desired to signal a large set of replication branches via multiple
	  routes.
	  </t>
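	  <t>This grouping rule can be sketched as follows (non-normative; the
	  route representation is illustrative):
	  <figure>
	  <artwork>
```python
def routes_for_forwarding(imported_routes, active):
    """Select all imported routes whose NLRI matches the active
    path's NLRI once the RD field is masked out; these are used
    together to build the forwarding state for the tree."""
    def key_without_rd(route):
        nlri = dict(route["nlri"])
        nlri.pop("rd", None)       # the RD is excluded from the comparison
        return tuple(sorted(nlri.items()))
    target = key_without_rd(active)
    return [r for r in imported_routes if key_without_rd(r) == target]
```
	  </artwork>
	  </figure>
	  </t>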
	  <t>The forwarding state includes two parts: the "nexthop" part that
	  has the replication branches, and the "route" part (with the key being
	  a label or (x,g) tuple), just as a unicast IP route points to a
	  nexthop in a RIB/FIB.
	  </t>
	  <!--t>The error checking is intentionally lenient: as long as the route
	  can be syntactically parsed, no BGP protocol error will be reported.
	  However, missing/conflicting/inappropriate sub-TLVs may cause multicast
	  forwarding state not to be created, and that is a problem for multicast
	  service only. In such case, appropriate (dampened) logs MUST be generated,
	  and an negative acknowledgement MUST be sent back to the controller
	  (<xref target="ack"/>).
	  </t-->
	  <t>Either or both of the TEA and PTA may be present in the Replication
	  State route. They are parsed with the following validation checks:
		<list style="symbols">
		  <t>Syntactic checking based on the length field of TLVs and sub-TLVs of the TEA:
		  <list>
			<t>The length field of a tunnel TLV in the TEA MUST equal the
			total length of its sub-TLVs (including the type and length
			fields) plus the applicable non-sub-TLV fields (e.g., the Flags
			field of a Backup Paths sub-TLV).</t>
			<t>The total length of tunnel TLVs (including the type and length
			fields) MUST equal the length value of the TEA.</t>
		  </list>
		  </t>
		  <t>Syntactic and semantic checking of the PTA according to <xref target="RFC6514"/>.
		  </t>
		  <t>Semantic checking as follows.</t>
		</list>
	  </t>
	<t>If the syntactic check fails, the route MUST be treated as withdrawn.
    </t>
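	<t>The syntactic length check above can be sketched as follows
	(non-normative; a simplified tunnel TLV layout of a 2-octet type
	followed by a 2-octet length is assumed for illustration):
	<figure>
	<artwork>
```python
def validate_tea_lengths(tea_value):
    """Walk the tunnel TLVs inside a TEA value and verify that the
    TLV length fields add up exactly to the TEA length."""
    offset = 0
    total = len(tea_value)
    while offset != total:
        if total - offset >= 4:
            length = int.from_bytes(tea_value[offset + 2:offset + 4], "big")
            end = offset + 4 + length
            if end > total:
                return False   # tunnel TLV overruns the TEA
            offset = end
        else:
            return False       # truncated TLV header
    return True                # all TLVs line up with the TEA length
```
	</artwork>
	</figure>
	A route failing this check would be treated as withdrawn, as stated
	above.
	</t>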
    <section title="Compiling Replication Branches">
	  <t>
	  The receiving node goes through the tunnels in the TEAs in all the
	  relevant routes as described above, and builds the nexthop as a collection
	  of replication branches. If a PTA is present, the replication branches
	  associated with the P2MP tunnel in the PTA are also added to the
	  collection. If the TEA has a semantic error for a tunnel (i.e., some
	  expected information specified below is missing), the tunnel
	  MUST be ignored (i.e., the corresponding replication branch is not added
	  to the collection), but other tunnels MUST still be used if they are
	  semantically correct.
    </t>
	<t>For the TEA:
    <list style="symbols">
	  <t>If a tunnel has a RPF sub-TLV and the tree is unidirectional,
	  it is skipped.
	  </t>
	  <t>If a tunnel is of type Segment List, the replication branch is
	  constructed from the Segment List sub-TLV and optionally a Tree Label
	  Stack sub-TLV if that is included. Even for an (x,g) IP multicast
	  tree, the segment list may be used to identify both the tunnel to reach
	  the node and the tree-identifying label(-stack). For example,
	  an incoming IP multicast packet can be replicated out of some branches
	  as a native IP packet and out of some other branches with label stacks.
	  Those label stacks may just forward/tunnel the packets to the
	  downstream/upstream node, or may include a tree-identifying label(-stack)
	  to allow the receiving node to forward based on the incoming label(-stack)
	  instead of the (x,g) prefix.
	  </t>
	  <t>If a tunnel is of type Any-Encapsulation, it MUST have a Tunnel
	  Egress Endpoint sub-TLV.
    <list style="symbols">
	  <t>If the egress endpoint address is a local interface address,
	  <!--the tunnel SHOULD NOT have a Tree Label Stack sub-TLV (or the sub-TLV
	  would not be ignored anyway)-->
	  the interface is the replication branch. The interface could be a
	  loopback, indicating that traffic needs to be delivered locally off
	  the tree, e.g.:
    <list style="symbols">
	  <t>To an application running on the node, or,
	  </t>
	  <t>To be further routed in a VRF, e.g., when this tree is a provider
	  tunnel for MVPN.
	  </t>
	</list>
	  </t>
	  <t>Otherwise, the forwarding state for the replication branch is
	  constructed as "pushing tree-identifying labels in the Tree Label Stack
	  sub-TLV if it is present, and then pushing any encapsulation that can be
      used to reach the node as encoded in the Tunnel Egress Endpoint sub-TLV".
	  </t>
	</list>
	  </t>
	  <t>If a tunnel is of type MPLS, MPLS in GRE or MPLS in UDP, it is
	  similar to the Any-Encapsulation case. The difference is that
	  MPLS, MPLS in GRE or MPLS in UDP MUST be used to reach the downstream
	  node.
	  </t>
	  <t>If a tunnel is of type Load-Balancing, then each of the member tunnels
	  in the Load-Balancing tunnel is examined to construct the branch that
	  comprises the set of Load-Balancing members, so that a replicated copy
	  will be sent out of one of the Load-Balancing members.
	  </t>
	  <t>If a tunnel has a Backup Paths sub-TLV, the backup path information
	  is added to the branch, so that when the primary path fails traffic
	  can be fast-rerouted to the backup paths.
	  </t>
	</list>
	  </t>
	  <t>If a tunnel or sub-TLV that is not described above is encountered,
	  it is not treated as an error, for forward compatibility.
	  It is simply ignored and MAY be logged.
    </t>
	<t>
	  If the P2MP tunnel identified by the PTA is not
	  found or not active on the node, it is temporarily ignored, but its
	  replication branches MUST be added when the P2MP tunnel comes up later.
	  If the P2MP
	  tunnel identified by the PTA is removed or goes down, its corresponding
	  replication branches MUST be removed from the collection.
    </t>
    </section>
    <section title="Installing Forwarding State" anchor="route">
	  <t>The above procedures build a nexthop to be pointed to by some label
	  or (x,g) routes. The routes are determined by checking the tree
	  identification in the NLRI and tunnels in the TEA.
	  </t>
	  <!--t>If the tree is IP multicast, one and only one of the
	  tunnels MUST have a RPF sub-TLV, except when the node is the root of
	  a bidirectional tree - any of the tunnels MUST NOT have the RPF sub-TLV
	  in that case.
	  The tunnel with the RPF sub-TLV is referred to as a RPF tunnel.
	  </t>
	  <t>If the tree is P2MP/MP2MP tree, one and only one of the
	  tunnels MUST have a RPF sub-TLV except when this tree node is the root
	  and there is no binding SID assigned for it. If there is a binding SID
	  for the tree on the root node, it is signaled as a tunnel with RPF
	  sub-TLV and Receiving MPLS Label Stack sub-TLV.
	  </t-->
	  <t>If the tree is a bidirectional (*,g) IP multicast tree, a (*,g) route
	  is installed, pointing to the previously constructed nexthop.
	  </t>
	  <t>If the tree is a unidirectional (x,g) IP multicast tree, one and only
	  one of the tunnels
      MUST have the RPF sub-TLV (referred to as the RPF tunnel) with a Tunnel
      Egress Endpoint sub-TLV carrying a local interface address.
	  If it has no Receiving MPLS Label Stack sub-TLV, an (x,g) route
	  MUST be installed with the corresponding interface as the expected
	  incoming interface, and the route points to the previously constructed
	  nexthop.
	  The route MAY be installed even if there is a Receiving MPLS Label Stack
	  sub-TLV in the tunnel - this is to allow native IP multicast packets
	  to be put onto the tree at this node. If the tunnel also has a Backup
	  Paths sub-TLV, the backup tunnels are examined and the corresponding
	  interfaces are added as the backup incoming interfaces. Traffic may
	  be accepted on both the primary and backup interfaces, or only on the
	  primary interface until it goes down, depending on the P-bit in the
	  flags field of the Backup Paths sub-TLV.
	  </t>
	  <t>
      If the tree is unidirectional, only one of the tunnels MAY contain
      a Receiving MPLS Label Stack sub-TLV. If it is bidirectional,
      multiple tunnels MAY contain the Receiving MPLS Label Stack sub-TLV.
	  The sub-TLV MUST contain one, two, or three labels
	  (<xref target="labels"/>).
      For each tunnel with the Receiving MPLS Label Stack sub-TLV:
    <list style="symbols">
	  <t>If the sub-TLV includes only one label (which is allocated from
	  SRGB or the node's SRLB), a label forwarding entry for that label is
	  installed in the default label forwarding table, pointing at the
	  previously constructed nexthop.
	  </t>
	  <t>If the sub-TLV includes two labels and the first label is locally
	  allocated for a label forwarding table, a label forwarding entry for the
	  second label is installed in the label forwarding table for which the
	  first label is allocated, pointing to the previously constructed nexthop.
	  </t>
	  <t>If the sub-TLV includes two labels and the first label is for a particular
	  neighbor, a label forwarding entry for the first label is installed
	  in the default label forwarding table with the forwarding behavior
	  "pop the label and save it for later comparison", and
	  a label forwarding entry for the second label is installed in the default
	  label forwarding table, pointing to the previously constructed nexthop, with an
	  additional RPF check such that packets are forwarded only if there was a
	  popped and saved preceding label matching the first (neighbor-identifying)
	  label in the sub-TLV.
	  </t>
	  <t>If the sub-TLV includes three labels and the first label is locally
	  allocated for a label forwarding table, a label forwarding entry for the
	  second label is installed in the label forwarding table identified by the
	  first label with the forwarding behavior "pop the label and save it for
      later comparison", and a label forwarding entry for the third label is
      installed in the label forwarding table identified by the first label,
      pointing to the previously constructed nexthop, with an
	  additional RPF check such that packets are forwarded only if there was a
	  popped and saved preceding label matching the second
	  (neighbor-identifying) label in the sub-TLV.
	  </t>
	  <t>If the tunnel has
	  a Backup Paths sub-TLV, the backup tunnels in the sub-TLV are examined
	  and appropriate forwarding states are installed so that either the traffic
	  from both the primary path and backup paths is forwarded or only
	  the traffic from the primary path is forwarded until the primary path
	  goes down, depending on the P-bit in the Backup Paths sub-TLV.
	  </t>
	</list>
	  </t>
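	  <t>The per-label-count installation rules above can be sketched as
	  follows (non-normative; the table and entry representations are
	  illustrative):
	  <figure>
	  <artwork>
```python
def install_receiving_label_entries(labels, nexthop, tables,
                                    neighbor_labels):
    """Install label forwarding entries for a Receiving MPLS Label
    Stack of one, two, or three labels. 'tables' maps a context
    label to its label forwarding table (plus a "default" table);
    'neighbor_labels' holds the neighbor-identifying labels."""
    if len(labels) == 1:
        # One label from the SRGB/SRLB: install in the default table.
        tables["default"][labels[0]] = {"nexthop": nexthop}
    elif len(labels) == 2 and labels[0] in tables:
        # First label identifies a (context) label forwarding table.
        tables[labels[0]][labels[1]] = {"nexthop": nexthop}
    elif len(labels) == 2 and labels[0] in neighbor_labels:
        # First label identifies the neighbor: pop and save it, then
        # RPF-check it when the tree label is looked up.
        tables["default"][labels[0]] = {"action": "pop-and-save"}
        tables["default"][labels[1]] = {"nexthop": nexthop,
                                        "rpf_label": labels[0]}
    elif len(labels) == 3 and labels[0] in tables:
        # Context label, then neighbor label, then tree label.
        tables[labels[0]][labels[1]] = {"action": "pop-and-save"}
        tables[labels[0]][labels[2]] = {"nexthop": nexthop,
                                        "rpf_label": labels[1]}
    else:
        raise ValueError("uncovered case: send a NACK to the controller")
```
	  </artwork>
	  </figure>
	  </t>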
	  <t>If a situation arises that is not covered by the above rules, the TEA
	  is considered semantically incorrect and a negative
	  acknowledgement MUST be sent back to the controller - see
	  <xref target="ack"/>.
	  </t>
    </section>
    <section title="Acknowledgement to Controller" anchor="ack">
	  <t>After processing a received Replication State Route, the node MUST
	  send an acknowledgement back to the controller. It originates a route
	  with the same NLRI, except that the Originating Router's IP Address
	  is set to match the Tree Node's IP Address. It attaches an IP Address
	  Specific Route Target, with the Global Administration Field set to match
	  the Originating Router's IP Address in the received route, and
	  the Local Administration Field set to 0.
	  </t>
	  <t>If the processing is not successful (e.g., due to unsupported tunnels
	  or missing/conflicting/inappropriate sub-TLVs in the TEA),
	  an MCAST NACK Extended Community MUST be attached.
	  </t>
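	  <t>The acknowledgement construction can be sketched as follows
	  (non-normative; the route representation and the mcast_nack flag
	  are illustrative stand-ins for the wire encoding and the MCAST
	  NACK Extended Community):
	  <figure>
	  <artwork>
```python
def build_ack_route(received_route, success):
    """Build the acknowledgement: same NLRI, with the Originating
    Router's IP Address set to the tree node's address, and an
    IP Address Specific RT pointing back at the controller."""
    nlri = dict(received_route["nlri"])
    controller_ip = nlri["originating_router_ip"]
    nlri["originating_router_ip"] = nlri["tree_node_ip"]
    ack = {"nlri": nlri, "route_target": (controller_ip, 0)}
    if not success:
        ack["mcast_nack"] = True   # models the MCAST NACK attachment
    return ack
```
	  </artwork>
	  </figure>
	  </t>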
    </section>
    </section>
    </section>
    <section anchor="security" title="Security Considerations">
      <t>This document does not introduce new security implications beyond
	  those of typical BGP-based controller-to-node signaling of forwarding
	  state.
      </t>
    </section>
    <section title="IANA Considerations">
	  <t>IANA has assigned the following code points:
      <list style="symbols">
		<t> "Any-Encapsulation" tunnel type 78 from "BGP Tunnel Encapsulation Attribute Tunnel Types" registry
		</t>
		<t> "RPF" sub-TLV type 124 and "Tree Label Stack" sub-TLV type 125 from "BGP Tunnel Encapsulation Attribute sub-TLVs" registry
		</t>
	  </list>
	  </t>
      <t>This document makes the following additional IANA requests:
       <list style="symbols">
         <t>Assign the following tunnel types from
         the "BGP Tunnel Encapsulation Attribute Tunnel Types" registry:
		 <list style = "symbols">
		   <t>Load-balancing: TBD1</t>
		   <t>Segment List:   TBD2</t>
		 </list>
         </t>
          <t>Assign the following sub-TLV types from the
          "BGP Tunnel Encapsulation Attribute sub-TLVs" registry:
		  <list style = "symbols">
			<t>Member Tunnels: TBD3, from range 128-255</t>
			<t>Backup Paths: TBD4, from range 128-255</t>
			<t>Receiving MPLS Label Stack: TBD5</t>
		  </list>
          </t>
		  <t>Assign "Context Label TLV" type TBD6 from the "BGP-LS Node Descriptor,
		  Link Descriptor, Prefix Descriptor, and Attribute TLVs" registry.
		  </t>
          <t>Assign "Replication State" route type TBD7 from the
       "BGP MCAST-TREE Route Types" registry.
          </t>
		  <t>Create a "Tree Type Registry for MCAST-TREE Replication State Route",
		  with the following initial assignments:
		  <list style="symbols">
			<!--t>1: SR-P2MP</t-->
			<t>2: P2MP Tree with Label as Identification</t>
			<t>3: IP Multicast</t>
			<t>0x43: mLDP</t>
		  </list>
		  The registration procedure is "Standards Action".
    	  </t>
    	  <t>Assign type TBD8 from the BGP Transitive Extended Community Types
		  registry for the MCAST Extended Community.
    	  </t>
    	  <t>Create an MCAST Extended Community Sub-Type registry with the
		  following initial assignments:
            <list style="symbols">
        	  <t>0x00-0x02: Reserved
        	  </t>
        	  <t>TBD9: NACK (Negative Acknowledgement).
        	  </t>
        	</list>
			The registration procedure is Expert Review.
		  </t>
          <!--t>Assign a new BGP Community Container type
       "SR-P2MP Policy", and to create an "SR-P2MP Policy Community
       Container TLV Registry", with an initial entry for "TLV for Atoms".
          </t-->
       </list>
      </t>
    </section>
    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors thank Eric Rosen for his questions, suggestions, and help
         in finding solutions to some issues such as the neighbor-based explicit
         RPF checking. The authors also thank Lenny Giuliano, Sanoj
         Vivekanandan and IJsbrand Wijnands for their review and comments.
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119.xml'?>
      <?rfc include='reference.RFC.8174.xml'?>
      <?rfc include='reference.RFC.9012.xml'?>
      <?rfc include='reference.I-D.ietf-bess-bgp-multicast.xml'?>
      <?rfc include='reference.I-D.ietf-idr-sr-policy-safi.xml'?>
      <?rfc include='reference.I-D.ietf-idr-rt-derived-community.xml'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.5331.xml'?>
      <?rfc include='reference.RFC.7752.xml'?>
      <?rfc include='reference.RFC.6513.xml'?>
      <?rfc include='reference.RFC.6514.xml'?>
      <?rfc include='reference.RFC.6388.xml'?>
      <?rfc include='reference.RFC.7761.xml'?>
      <?rfc include='reference.RFC.7431.xml'?>
      <?rfc include='reference.RFC.8402.xml'?>
    </references>
  </back>
</rfc>
