Securing hybrid network - criteria and requirements

Internet-Draft	Securing hybrid network criteria and req	November 2024
OIWA	Expires 8 May 2025

Abstract

This document analyzes requirements for ensuring and monitoring the security status of the network used in complex network environments such as hybrid cloud or mixed cloud settings.¶

Recently, virtualized resources such as cloud computing infrastructure have been rapidly replacing traditional network/computing environments such as local servers or on-premise computer clusters. In such infrastructure, information about physical resources such as servers, local networks, and network routers is hidden from users in exchange for flexibility, service redundancy, and cost efficiency. Cryptographic protocols such as TLS and IPsec are typically used to protect communication into and out of such systems against eavesdropping and tampering.¶

However, there are many use cases where the security of a service still depends on the properties of the underlying physical resources, and encrypting the communication alone is not enough:¶

Traffic analysis of encrypted communication may reveal partial information about the payload;¶
Legal requirements (such as personal data protection) demand that specific properties (such as governing laws, geographical locations, or operators) be checked;¶
Denial-of-service and several other attacks cannot be prevented by encryption alone.¶

For such high-security applications, we need technical infrastructure for continuously checking the properties and statuses of the underlying network and intermediate nodes. In non-virtualized, self-managed settings, there are several existing technologies (e.g., NETCONF, path validation) for acquiring such statuses. However, these are not sufficient for the virtualized, multi-stakeholder settings of modern cloud infrastructure.¶

This document gives a first-stage problem analysis for ensuring and monitoring the security status of the network used in complex network environments such as hybrid cloud or mixed cloud settings. It also proposes a brief, straw-man view of an enabling architecture for possible monitoring systems.¶

1.1. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

2. Background

2.1. Multi-cloud and hybrid cloud systems

Concepts of multi-cloud and hybrid cloud are defined in ISO/IEC 5140:2024. In short, multi-cloud is a system in which a single service is implemented using two or more independently operated cloud services. A hybrid cloud system combines two or more computing environments that differ in their nature of operation, security level, or other aspects, at least one of which is typically a public cloud service. Often, subsystems on privately operated clouds, on-premise systems, or edge networks are connected to public cloud infrastructure over a network to form a single hybrid cloud system.¶

Hybrid cloud systems are, in general, constructed when the security or other provisions of public cloud systems are not sufficient for part of the information or for a subsystem component (otherwise, a plain public or multi-cloud environment would suffice). At the same time, there is often a requirement to retain some benefits of public cloud systems, such as scalability, cost, resilience, and maintainability (otherwise, a plain on-premise deployment would suffice). These mixed, seemingly conflicting requirements make it difficult to monitor the security of hybrid cloud systems.¶

2.2. Security implications of hybrid clouds

Multi-cloud and hybrid cloud systems require system-internal communication that crosses the boundaries of individual cloud systems. In the simplest case, it can be implemented using authenticated TLS or HTTPS communication over the public Internet infrastructure. For high-security systems, it is often implemented using dedicated communication channels, such as VPNs, private peering, or even dedicated optical fiber. To maintain the security of the whole system, monitoring the integrity of such dedicated channels is mandatory.¶

Furthermore, with IP-based software systems, secure communication depends on many more components. In other words, there are many more attack surfaces. For example, if a DNS record is tampered with or misconfigured, communication intended to go through a secure channel might be routed over public channels. Similarly, if routing is misconfigured, the traffic might go over public networks. Methods for enumerating such dependencies and collecting their status are currently underdeveloped.¶
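As a small illustration of checking one such dependency, the sketch below reports DNS answers that would send traffic outside the address prefixes reachable only through the secure channel. The function name `check_resolution`, the example prefixes, and the assumption that the monitor already has the resolved addresses in hand are all hypothetical; routing misconfigurations would need a separate check against the actual forwarding state.

```python
import ipaddress

def check_resolution(resolved, secure_prefixes):
    """Return the resolved addresses that lie outside every secure-channel prefix.

    resolved: address strings, e.g. the A/AAAA answers obtained for a service name.
    secure_prefixes: CIDR strings that are reachable only via the secure channel.
    """
    networks = [ipaddress.ip_network(p) for p in secure_prefixes]
    outside = []
    for addr in resolved:
        ip = ipaddress.ip_address(addr)
        # Membership between an IPv4 address and an IPv6 network (or vice versa)
        # is simply False, so mixed answers need no special handling.
        if not any(ip in net for net in networks):
            outside.append(addr)
    return outside

# A tampered or misconfigured record pointing at a public address is reported.
print(check_resolution(["10.1.2.3", "203.0.113.7"], ["10.1.0.0/16"]))  # → ['203.0.113.7']
```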

3. Problem statement

Many technologies that are useful for such purposes are already available:¶

NASR (Network Attestation For Secure Routing) activity provides the capability to record and monitor the forwarding paths of network packets.¶
SAVNET (Source Address Validation in Intra-domain and Inter-domain Networks) provides a way to validate the source addresses of incoming traffic and possibly to block rogue packets.¶
SRv6 provides control over the intended routes of individual IPv6 packets between networks.¶
RPKI provides control mechanisms and trust anchors for the security of inter-domain routing.¶

However, to ensure the security of the whole hybrid cloud infrastructure, we still have to address the following aspects, for which solutions currently seem to be lacking.¶

3.1. The nature of multiple operators/stakeholders

Hybrid cloud systems depend on many resources that are not under the control of the application system operators. Needless to say, public clouds (both IaaS and SaaS) are operated by external service providers. These providers have their own operational policies and make their own decisions about maintaining or replacing any of the hardware/software resources they provide, as long as their service-level agreements (SLAs) are met.¶

This makes it undesirable to expose information about all intermediate network nodes to the final application operators. First, detailed information on the design and implementation of cloud infrastructure is confidential and an important asset of the cloud providers. Moreover, some degree of independence between application operators (users of cloud infrastructure) and cloud service providers is critical for maintaining the cost effectiveness, maintainability, security, etc., of cloud services.¶

3.2. Determination of the "correct" states

In a small-scale, hand-crafted network, determining whether the current running state of the network is the intended one is a relatively simple question. In complex multi-cloud systems, however, it is very hard or even impossible to determine this, even if every detail of the running state of the whole global network were known. To do so, we would also have to know the design principles and hidden assumptions behind the operation of each individual network.¶

3.3. Shared infrastructure and information leakage

The infrastructure of a cloud system is deeply shared among multiple clients. Although some information on the operational status of the cloud service side is required to check the reliability of user-side applications, exposing raw operational parameters to one client may reveal security-critical information about other clients. Before the cloud-side status is exposed, it must be processed and filtered so that only information relevant to the specific client is included.¶
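A minimal sketch of such per-client filtering, assuming each raw status record is already tagged with the set of clients whose traffic traverses the element. The `StatusRecord` fields and the `filter_for_client` helper are hypothetical; deciding which elements are relevant to a client is itself a hard problem that this sketch takes as given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatusRecord:
    element: str          # identifier of a (virtual) router, link, etc.
    state: str            # e.g. "up" or "degraded"
    tenants: frozenset    # clients whose traffic traverses this element
    detail: str           # raw operational detail, possibly sensitive

def filter_for_client(records, client):
    """Return only the records relevant to `client`, with sensitive fields removed."""
    view = []
    for r in records:
        if client not in r.tenants:
            continue  # element is not on this client's paths: do not reveal it
        # The tenant list and raw detail could expose other clients, so drop them.
        view.append({"element": r.element, "state": r.state})
    return view

records = [
    StatusRecord("vrouter-1", "up", frozenset({"c1", "c2"}), "peer session with c2"),
    StatusRecord("vrouter-2", "degraded", frozenset({"c2"}), "c2 VPN endpoint flapping"),
]
print(filter_for_client(records, "c1"))  # → [{'element': 'vrouter-1', 'state': 'up'}]
```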

3.4. Virtualized infrastructure

Many cloud resources, not only computation nodes but also network routers, switches, VPN endpoints, etc., are virtualized and provided via infrastructure-as-code (IaC) systems. Unlike with physical routers and switches, identifying the virtual intermediate nodes on a traffic path does not reveal their physical locations, physical properties, or security characteristics. (Consider, for example, how one would interpret the results of traceroute or ICMP ping over a virtual private network.)¶

If a path contains virtual nodes, the physical properties of their underlying infrastructure may have to be traced and checked to ensure security and integrity. This requires cooperation from virtual resource providers or cloud providers, and integration with their infrastructure management systems.¶

3.5. Risks beyond network layers

Today, many networks are managed via complex IT systems. This means that any intrusion into the IT-side assets of those management systems poses severe risks to the network layers. These assets include (but are not limited to) software asset management, software vulnerability management, identity management, etc.¶

To correctly evaluate the risks of the whole network operation, we must also take the risks of these management systems into account.¶

4. Proposed design

4.1. General

To overcome these problems, we propose designing a distributed architecture for assuring network operation integrity for mixed and hybrid cloud applications. Such a system should:¶

Model the network infrastructure in two dimensions: one axis parallel to the network paths and forwarding directions, and the other axis for the protocol layers.¶
Have sufficient knowledge of the complex dependencies among software and protocols; not only network packet-forwarding technologies but also surrounding areas such as addressing and DNS must be covered.¶
Handle tunneling and virtualization aspects explicitly, both at the protocol level (e.g., VPNs, IPIP, IPsec) and at the infrastructure level (IaC, Network-as-a-Service, Wavelength Division Multiplexing, etc.).¶
Consolidate operational information at each operator's level and take each operator's predetermined operation principles into account when evaluating integrity.¶
Address the management-oriented risks of infrastructure management, including non-network aspects.¶
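The two-dimensional model in the first item can be pictured as a grid indexed by position along the path and by protocol layer. The sketch below illustrates that reading only: the `PathModel` class, the layer names, and the element labels are hypothetical, and an element spanning several hops (such as a tunnel) is represented here simply by placing it in several cells.

```python
LAYERS = ["physical", "link", "network", "tunnel", "application"]  # hypothetical layer set

class PathModel:
    """Grid of network elements keyed by (hop index along the path, protocol layer)."""

    def __init__(self):
        self.cells = {}

    def place(self, hop, layer, element):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.cells[(hop, layer)] = element

    def along_path(self, layer):
        """Path axis: elements at one layer, ordered in the forwarding direction."""
        return [self.cells[key] for key in sorted(self.cells) if key[1] == layer]

    def stack_at(self, hop):
        """Layer axis: elements at one hop, from the lowest layer upwards."""
        return [self.cells[(hop, l)] for l in LAYERS if (hop, l) in self.cells]

m = PathModel()
m.place(0, "network", "router-A1")
m.place(1, "network", "router-A2")
m.place(1, "physical", "fiber-7")
m.place(1, "tunnel", "ipip-1")   # a tunnel spanning hops 1 and 2 occupies both cells
m.place(2, "tunnel", "ipip-1")
print(m.along_path("network"))  # → ['router-A1', 'router-A2']
print(m.stack_at(1))            # → ['fiber-7', 'router-A2', 'ipip-1']
```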

One possible implementation of such a system is a distributed system for coordinating network security between the operators and users of cloud and network infrastructure. Compared with a "disclose all" approach, such a design might preserve both the flexibility and the efficiency of multi-cloud applications.¶

In particular, such a system will:¶

Have the ability to state network security requirements from an infrastructure user to infrastructure providers. In hybrid cloud or layered systems, this will include communication between operators of infrastructure/cloud systems.¶
Have the ability to return assertions about the current provisioned status against given requirements.¶
Provide a choice of transparency levels regarding the internals of cloud-service infrastructure.¶
Have traceability provisions for troubleshooting when network status assertion replies are opaque.¶
Give sufficient consideration to various tunneling and virtualization technologies.¶
Have a bidirectional interface to system-level security management systems, such as Continuous Diagnostics and Mitigation (CDM) dashboards.¶

5. Path Characteristics Service

A service called "Path Characteristics Service" (PCS) provides an endpoint for requesting and answering assertions about the "characteristics" of network paths. Typically, it is deployed by each network operator or connectivity provider and answers with real-time assertions of the network status for its contracted clients.¶

In complex commercial networking, network operators provide connectivity not only to the Internet but also to other providers' dedicated services, such as public and private clouds. Also, the connectivity they provide may use tunneling over networks provided by other operators. In such multi-stakeholder settings, a PCS will gather information from the PCSs of the other providers and return summary information to its clients.¶
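The cascading behaviour can be sketched as a recursive evaluation: a PCS checks the segments it operates itself, queries the PCSs of the providers it depends on, and returns a summary together with the collected sub-results. The `Pcs` class, its fields, and the rule that the summary is positive only if every local and delegated check is positive are assumptions for this sketch; a real PCS would exchange authenticated messages rather than call other objects directly, and would decide per assurance level how much of `details` to disclose.

```python
from dataclasses import dataclass, field

@dataclass
class Pcs:
    operator: str
    local_ok: bool  # result of checking the segments this operator runs itself
    upstream: list = field(default_factory=list)  # PCSs of providers it builds on

    def assert_path(self):
        """Return (summary, details); assumes the delegation graph has no cycles."""
        sub = [p.assert_path() for p in self.upstream]  # cascaded queries
        summary = self.local_ok and all(ok for ok, _ in sub)
        details = {
            "operator": self.operator,
            "local_ok": self.local_ok,
            "sub_assertions": [d for _, d in sub],
        }
        return summary, details

transit = Pcs("EU transit", True)
op_c = Pcs("C", True)
op_a = Pcs("A", True, upstream=[transit, op_c])
summary, details = op_a.assert_path()
print(summary)  # → True
```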

(TBA: figures; see the IETF 120 NASR BoF presentation.)¶

The rest of this section provides an unconsolidated list of requirements for the functionality of PCS.¶

5.1. Identification and Authentication

The PCS service will be access-controlled and confidentiality-protected.¶
Access to the service will be authenticated, e.g., by OAuth or a similar strong authentication mechanism.¶
The authenticated identity will be associated with a single connectivity or network access channel. Some examples of a single connectivity channel are:¶
- Physical (layer-1) leased line,¶
- A layer-2 plane (e.g. VLAN or VXLAN) on a single layer-1 network,¶
- A virtual private network tunnel.¶

TBD: if there are multiple connectivity channels under a single business contract, multiple IDs might be associated with a single authentication.¶

5.2. Subscriptions for assertions

The protocols used by PCS will have some kind of subscription mechanism: clients will register a query and will receive assertions for the query continuously or periodically.¶
For periodic polling, the interval between assertions has to be customizable.¶
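A minimal sketch of a registered subscription with a client-chosen polling interval. The `Subscription` class, the provider-side minimum interval, and the use of integer seconds are assumptions for illustration; the draft leaves the subscription protocol unspecified.

```python
from dataclasses import dataclass

MIN_INTERVAL_S = 10  # hypothetical provider-imposed lower bound on the interval

@dataclass
class Subscription:
    query_id: str
    interval_s: int  # polling interval chosen by the client

    def __post_init__(self):
        if self.interval_s < MIN_INTERVAL_S:
            raise ValueError(f"interval must be at least {MIN_INTERVAL_S} s")

    def delivery_times(self, start, until):
        """Times in [start, until] at which a periodic assertion would be sent."""
        return list(range(start, until + 1, self.interval_s))

sub = Subscription("q-42", interval_s=60)
print(sub.delivery_times(0, 180))  # → [0, 60, 120, 180]
```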

5.3. Queries

A query will be a set of assertion requests.¶
Each assertion request will contain a destination and a list of desired connectivity properties on the communication path to the specified destination.¶
A destination is a subset of the Internet, specified as an AS, a subset of IP addresses, or a DNS name.¶
Providers of PCS are not required to support all types of queries and may impose their own limitations on them. However, they have to return negative responses to any queries they do not support.¶
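The query structure can be sketched as plain data types. The class and field names, the string forms of the three destination kinds, and the `triage` helper are hypothetical, since no wire format is defined yet; evaluation of accepted requests is omitted. The point illustrated is that an unsupported destination kind produces an explicit negative answer rather than being silently ignored.

```python
from dataclasses import dataclass, field
from enum import Enum

class DestKind(Enum):
    AS = "as"          # an autonomous system, e.g. "AS64500"
    PREFIX = "prefix"  # a subset of IP addresses, e.g. "192.0.2.64/28"
    DNS_NAME = "dns"   # a DNS name, e.g. "db.example.com"

@dataclass
class AssertionRequest:
    kind: DestKind
    destination: str
    properties: dict = field(default_factory=dict)  # desired connectivity properties

@dataclass
class Query:
    requests: list  # a query is a set of assertion requests

def triage(query, supported):
    """Pair each request with "accepted" or an explicit negative "unsupported"."""
    return [
        (req.destination, "accepted" if req.kind in supported else "unsupported")
        for req in query.requests
    ]

q = Query([
    AssertionRequest(DestKind.PREFIX, "192.0.2.64/28", {"geo-location": "FR"}),
    AssertionRequest(DestKind.DNS_NAME, "db.example.com"),
])
print(triage(q, supported={DestKind.AS, DestKind.PREFIX}))
# → [('192.0.2.64/28', 'accepted'), ('db.example.com', 'unsupported')]
```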

5.4. Connectivity Properties

The list of desired connectivity properties declares which kinds of network elements (both nodes and edges) the communication packets are allowed to flow over.¶

5.4.1. Properties for nodes

Possible property requests for a network node will include at least:¶

operator¶
geo-location¶
supplier¶
model¶
hardware ID¶
the name and version of the running software¶
the security status of the node¶
the security status of the operator¶
required assurance level (see below)¶
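Checking a node against such a request can be sketched as matching each requested property against the facts the operator holds about that node. The property keys, the convention that a requested value may be a set of acceptable values, and the `node_satisfies` helper are assumptions; the assurance-level property is not evaluated here.

```python
def node_satisfies(node, request):
    """Match a node description against a node property request.

    node: facts known about the node, e.g. {"operator": "A", "geo-location": "FR"}.
    request: property -> required value, or a set of acceptable values.
    Returns (ok, failed_properties); a property missing from `node` counts as failed.
    """
    failed = []
    for prop, wanted in request.items():
        acceptable = wanted if isinstance(wanted, (set, frozenset)) else {wanted}
        if node.get(prop) not in acceptable:
            failed.append(prop)
    return (not failed, failed)

node = {"operator": "A", "geo-location": "FR", "model": "X-100"}
print(node_satisfies(node, {"operator": "A", "geo-location": {"FR", "DE"}}))  # → (True, [])
print(node_satisfies(node, {"operator": "A", "supplier": "S"}))  # → (False, ['supplier'])
```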

5.4.2. Properties for edges

Network edges may be categorized into:¶

A physical network edge¶
A network tunnel¶
A software-defined network¶

Possible property requests for a physical network edge will include at least:¶

operator¶
geo-location¶
the protocol type of the physical network¶
the security status of the operator¶
required assurance level (see below)¶

Possible property requests for a network tunnel will include at least:¶

operator¶
geo-location¶
(nested) path property request for the underlying network¶
the identification of the tunnel¶
the protocol type¶
the strength of the integrity/confidentiality protection¶
the security status of the tunnel¶
the name and version of the software realizing the tunnel¶
the security status of the operator¶
required assurance level (see below)¶

Possible property requests for a software-defined network will include at least:¶

operator¶
geo-location¶
(nested) path property request for the underlying network¶
the name and version of the software realizing the network¶
the security status of the network-defining software¶
the security status of the operator¶
required assurance level (see below)¶
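The nested "path property request for the underlying network" makes edge requests recursive: a tunnel or software-defined network is acceptable only if its own properties match and every edge of the underlying path it runs over is acceptable as well. A sketch under assumed data shapes (edges as dictionaries with an optional `underlay` list, requests with an optional `underlay` sub-request); these names are hypothetical, and exact-match comparison stands in for richer property matching.

```python
def edge_satisfies(edge, request):
    """Recursively check an edge (physical, tunnel, or SDN) against a request."""
    for prop, wanted in request.items():
        if prop != "underlay" and edge.get(prop) != wanted:
            return False
    nested = request.get("underlay")
    if nested is None:
        return True
    underlay = edge.get("underlay")
    if not underlay:
        return False  # a nested requirement exists but nothing is known about the underlay
    # Every edge of the underlying path must meet the nested request (which may nest again).
    return all(edge_satisfies(e, nested) for e in underlay)

tunnel = {
    "operator": "A", "protocol": "IPIP",
    "underlay": [{"geo-location": "EU"}, {"geo-location": "EU"}],
}
request = {"operator": "A", "protocol": "IPIP", "underlay": {"geo-location": "EU"}}
print(edge_satisfies(tunnel, request))  # → True
```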

5.5. Assertions and assurance levels

An assertion, which is a response to a query, will contain either evidence or a guarantee of the required network properties. There will be several assurance levels, that is, types of assertions that may be returned. Every response will be signed by the PCS and will carry the identification of the PCS software.¶
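The signing requirement can be sketched as wrapping each response in an envelope that names the PCS software and carries a tag over a canonical encoding of the whole envelope. Here an HMAC from the Python standard library stands in for a real digital signature (a deployed PCS would more plausibly use public-key signatures, so that clients need no shared secret); the envelope field names, the software identifier, and the JSON canonicalization are assumptions.

```python
import hashlib
import hmac
import json

def _canonical(envelope):
    # Sorted keys and fixed separators so signer and verifier hash identical bytes.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()

def sign_response(body, pcs_software_id, key):
    """Wrap a response body with the PCS software ID and an HMAC-SHA256 tag."""
    envelope = {"pcs_software": pcs_software_id, "body": body}
    tag = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return {**envelope, "signature": tag}

def verify_response(signed, key):
    """Recompute the tag over everything except the signature and compare safely."""
    envelope = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))

key = b"demo-shared-key"  # stand-in secret; not how a real PCS would distribute keys
signed = sign_response({"result": "positive"}, "example-pcs/0.1", key)
print(verify_response(signed, key))  # → True
```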

5.5.1. Traced present assertions

For traced assertions, the query will typically contain a requirement for specific node suppliers and types. The answer will contain a recorded trace of the path, signed by each traversed network node together with its identification. The information will ensure only that the property is satisfied at the present time.¶

This type of assertion will require dedicated support for packet tracing in every network node.¶
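Per-node signing of a traced assertion can be sketched as a hash chain: each traversed node appends its identification and a tag computed over the previous tag and its own identifier, so earlier hops cannot be dropped or reordered without invalidating later tags. HMAC with per-node keys known to the verifier stands in for real per-node signatures, and the record format is hypothetical; note that truncating the trace at its end is not detected by the chain alone.

```python
import hashlib
import hmac

def append_hop(trace, node_id, node_key):
    """Append a hop record whose tag covers the previous tag and this node's ID."""
    prev_tag = trace[-1][1] if trace else b""
    tag = hmac.new(node_key, prev_tag + node_id.encode(), hashlib.sha256).digest()
    return trace + [(node_id, tag)]

def verify_trace(trace, node_keys):
    """Recompute every tag in order; node_keys maps node ID -> key."""
    prev_tag = b""
    for node_id, tag in trace:
        key = node_keys.get(node_id)
        if key is None:
            return False  # unknown node: cannot check its record
        expected = hmac.new(key, prev_tag + node_id.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        prev_tag = tag
    return True

keys = {"r1": b"key-r1", "r2": b"key-r2"}  # hypothetical per-node keys known to the verifier
trace = append_hop([], "r1", keys["r1"])
trace = append_hop(trace, "r2", keys["r2"])
print(verify_trace(trace, keys))  # → True
```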

5.5.2. Transparent present assertions

For transparent assertions, the response will contain a list of traversed nodes and edges with their properties (as requested in the query). If the query contains requirements for networks operated by third parties (i.e., involving cascaded queries to other PCSs), the assertion will contain sub-assertions received from those third parties. The information will ensure only that the property is satisfied at the present time.¶

5.5.3. Traceable opaque present assertions

For traceable opaque assertions, the response will contain an opaque ID for the response. That ID has to correspond to trace information that operators can use to identify the relevant records for troubleshooting in the future. The information will ensure only that the property is satisfied at the present time.¶
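On the operator side, a traceable opaque assertion can be sketched as storing the full trace under a random identifier and returning only the identifier and the result. The in-memory dictionary, the `TraceStore` name, and the 128-bit random ID are assumptions; a real PCS would need durable, access-controlled storage with a retention policy.

```python
import secrets
import time

class TraceStore:
    """Operator-side store mapping opaque IDs to internal trace records."""

    def __init__(self):
        self._records = {}

    def issue(self, result, trace):
        """Record the internal trace and return the opaque client-facing response."""
        opaque_id = secrets.token_hex(16)  # 128 random bits; reveals nothing about the path
        self._records[opaque_id] = {"time": time.time(), "result": result, "trace": trace}
        return {"result": result, "trace_id": opaque_id}

    def lookup(self, opaque_id):
        """Operator-side lookup for later troubleshooting; None if the ID is unknown."""
        return self._records.get(opaque_id)

store = TraceStore()
reply = store.issue(True, ["router-A1", "ipip-1", "router-C1"])
print(sorted(reply))  # → ['result', 'trace_id']
```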

5.5.4. Opaque present assertions

For opaque assertions, the response will contain only a positive or negative answer to the query. The information will ensure only that the property is satisfied at the present time.¶

5.5.5. Traceable opaque future assertions

For traceable opaque future assertions, the response will contain an opaque ID for the response. That ID has to correspond to trace information that operators can use to identify the relevant records for troubleshooting in the future. The information will ensure that the network is controlled in such a way that the required property remains satisfied, even when dynamic routing changes.¶

5.5.6. Opaque future assertions

For opaque future assertions, the response will contain only a positive or negative answer to the query. The information will ensure that the network is controlled in such a way that the required property remains satisfied, even when dynamic routing changes.¶
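The six assertion types above vary along three axes: how much the client receives, whether an opaque trace ID is returned for later troubleshooting, and whether the guarantee covers only the present or also future routing changes. A sketch of that classification follows; the Python names and the short descriptions are hypothetical labels, not identifiers defined by this document.

```python
from enum import Enum

class Assurance(Enum):
    # (what the client receives, opaque trace ID returned?, period covered)
    TRACED_PRESENT = ("signed path trace", False, "present")
    TRANSPARENT_PRESENT = ("nodes, edges, sub-assertions", False, "present")
    TRACEABLE_OPAQUE_PRESENT = ("opaque", True, "present")
    OPAQUE_PRESENT = ("positive/negative only", False, "present")
    TRACEABLE_OPAQUE_FUTURE = ("opaque", True, "future")
    OPAQUE_FUTURE = ("positive/negative only", False, "future")

    def __init__(self, disclosure, opaque_trace_id, period):
        self.disclosure = disclosure
        self.opaque_trace_id = opaque_trace_id
        self.period = period

    @property
    def survives_routing_changes(self):
        """Future assertions promise the property is kept even if dynamic routing changes."""
        return self.period == "future"

print([a.name for a in Assurance if a.survives_routing_changes])
# → ['TRACEABLE_OPAQUE_FUTURE', 'OPAQUE_FUTURE']
```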

5.6. Things to be considered:

How to assert the security level of operators¶
Standards or de-facto standards for status sharing with security dashboards¶
Details on the specification of real-world properties such as operators, suppliers, models, and geo-locations¶
How to integrate and monitor application-level dynamic routing (e.g., DNS)¶
Possibly more detailed specification of network topology requirements¶
More detailed integration with other NASR activities¶
Possible integration with RPKI and other global-level management¶

6. Some examples

The following is an informal example of a possible query to the PCS operated by operator A.¶

                       Internet
                           |
+------+    +-----------------+    +-----------------+
| User |----| Operator A (FR) |    | Operator C (DE) |
+------+    | 192.0.2.64/28   |    | 198.51.100.4/30 |
            +-----------------+    +-----------------+
                    |                           |
                    `===== IPIP VPN tunnel ====='
                      via some network operator in EU

Note: Areas FR, DE, and EU are just chosen as examples
for nested areas.

The path to 192.0.2.64/28 (on operator A) will be composed of:¶
- Network nodes¶
  - within FR geo-locations,¶
  - operated by operator A,¶
  - assured for the future with traceable records (by operator A).¶
- Software-defined network edges¶
  - within FR geo-locations,¶
  - operated by operator A,¶
  - assured for the future with traceable records (by operator A).¶
The path to 198.51.100.4/30 (on operator C) will be composed of:¶
- Network nodes¶
  - within FR geo-locations,¶
  - operated by operator A,¶
  - assured for the future with traceable records (by operator A).¶
- Network edges¶
  - within FR geo-locations,¶
  - operated by operator A,¶
  - assured for the future with traceable records (by operator A).¶
- VPN connection¶
  - using IPIP transport,¶
  - with strong encryption (details TBD),¶
  - operated by operator A,¶
  - assured for the future with traceable records (by operator A),¶
  - going through the underlying network:¶
    - Network nodes:¶
      - within EU geo-locations,¶
      - opaquely assured for the present status (by some operator trusted by A).¶
    - Network edges:¶
      - within EU geo-locations,¶
      - opaquely assured for the present status (by some operator trusted by A).¶
- Software-defined network¶
  - within DE geo-locations,¶
  - operated by operator C,¶
  - opaquely assured for the present status (by operator C).¶
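The same informal query could be written down as nested data, for example as below, and a PCS could walk it to find out which parties it must obtain assurances from (its own records, the PCS of operator C, and an operator it trusts for the EU underlay). The key names and the `(period, kind, assuring party)` tuple are hypothetical and only mirror the prose above; this document defines no concrete syntax.

```python
# Requirement sets reused below (all key names are hypothetical).
A_FR = {"geo-location": "FR", "operator": "A", "assurance": ("future", "traceable", "A")}
EU = {"geo-location": "EU", "assurance": ("present", "opaque", "operator trusted by A")}

query = [
    {"destination": "192.0.2.64/28",
     "nodes": A_FR,
     "edges": [{"kind": "sdn", **A_FR}]},
    {"destination": "198.51.100.4/30",
     "nodes": A_FR,
     "edges": [
         {"kind": "network", **A_FR},
         {"kind": "tunnel", "protocol": "IPIP", "encryption": "strong (details TBD)",
          "operator": "A", "assurance": ("future", "traceable", "A"),
          "underlay": {"nodes": EU, "edges": [{"kind": "network", **EU}]}},
         {"kind": "sdn", "geo-location": "DE", "operator": "C",
          "assurance": ("present", "opaque", "C")},
     ]},
]

def assuring_parties(obj):
    """Collect every party named as giving an assurance anywhere in the query."""
    found = set()
    if isinstance(obj, dict):
        if "assurance" in obj:
            found.add(obj["assurance"][2])
        for value in obj.values():
            found |= assuring_parties(value)
    elif isinstance(obj, list):
        for item in obj:
            found |= assuring_parties(item)
    return found

print(sorted(assuring_parties(query)))  # → ['A', 'C', 'operator trusted by A']
```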