
Azure Networking Blog

High-Scale Kubernetes Networking with Azure CNI Powered by Cilium

ShreyaJ
Microsoft
Apr 23, 2025

Kubernetes users have diverse cluster networking needs, but paramount to them all are efficient pod networking and robust security features. Azure CNI (Container Networking Interface) powered by Cilium is a solution that addresses these needs by combining Azure CNI’s control plane with Cilium’s eBPF dataplane.

Cilium enables performant networking and security by leveraging the power of eBPF (extended Berkeley Packet Filter), a revolutionary Linux kernel technology. eBPF enables the execution of custom code within the Linux kernel, providing both flexibility and efficiency. This translates to: 

  • High-performance networking: eBPF enables efficient packet processing, reducing latency and improving throughput.
  • Enhanced security: Azure CNI (AzCNI) powered by Cilium supports DNS-based network security policies through Advanced Network Security features, making it easy to manage and secure network traffic (a sample policy is sketched after this list).
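
To illustrate, a DNS-based egress policy in Cilium's policy language might look like the minimal sketch below. The workload label and FQDN pattern are placeholders, and the first rule, which allows DNS lookups to kube-dns, is needed so the dataplane can learn which IPs a name resolves to:

    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    metadata:
      name: allow-example-fqdn          # hypothetical policy name
    spec:
      endpointSelector:
        matchLabels:
          app: my-app                   # placeholder workload label
      egress:
        # Let the pod query kube-dns so Cilium can observe DNS answers
        - toEndpoints:
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: kube-system
                k8s-app: kube-dns
          toPorts:
            - ports:
                - port: "53"
                  protocol: ANY
              rules:
                dns:
                  - matchPattern: "*.example.com"
        # Allow traffic only to IPs that resolved from the matched names
        - toFQDNs:
            - matchPattern: "*.example.com"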

Introducing CiliumEndpointSlice

A performant CNI dataplane is crucial for low-latency, high-throughput pod communication, enhancing distributed application efficiency and user experience. While Cilium’s eBPF-powered dataplane already provides high-performance networking, we sought to further enhance its scalability and performance. To do this, we enabled a new feature in the dataplane’s configuration, CiliumEndpointSlice, thereby achieving:

  • Lower traffic load on the Kubernetes control plane, leading to reduced control plane memory consumption and improved performance 
  • Faster pod start-up latencies 
  • Faster in-cluster network latencies for better application performance 

In particular, this feature improves how Azure CNI powered by Cilium manages pods. Previously, Cilium managed pods using a Custom Resource Definition (CRD) called CiliumEndpoint: each pod has an associated CiliumEndpoint object containing information about the pod’s status and properties. The Cilium Agent, a core component of the dataplane, runs on every node and watches each of these CiliumEndpoints for updates to pods. We have observed that this behavior can place significant stress and load on the control plane, leading to performance bottlenecks, especially in larger clusters.
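
You can see this one-object-per-pod relationship directly on a cluster, since CiliumEndpoints are ordinary namespaced resources:

    # One CiliumEndpoint exists per pod, so a 20,000-pod cluster carries
    # 20,000 of these objects for every node's agent to keep track of.
    kubectl get ciliumendpoints --all-namespaces
    # equivalent, using the resource's short name:
    kubectl get cep -A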

To alleviate load on the control plane, we are introducing CiliumEndpointSlice, a feature that batches CiliumEndpoints and their associated updates. This reduces the number of updates propagated to the control plane. Consequently, we greatly reduce the risk of overloading the control plane at scale, ensuring smoother operation of the cluster.
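
On a cluster where the feature is active, the batched objects can be listed much like CiliumEndpoints. (A sketch; on AKS the setting is managed by the platform, while self-managed Cilium exposes it as the enable-cilium-endpoint-slice option in the agent configuration.)

    # Each CiliumEndpointSlice groups many CiliumEndpoints into one object,
    # so far fewer watch events flow through the kube-apiserver.
    kubectl get ciliumendpointslices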

Performance Testing 

We have conducted performance testing of Azure CNI powered by Cilium with and without CiliumEndpointSlice enabled. The testing was done on a cluster with the following dimensions: 

  • 1000 nodes (Standard_D4_v3) 
  • 20,000 pods (i.e., 20 pods per node) 
  • 1 service with 4000 backends 
  • 800 services with 20 backends each 

The test involved repeating the following actions 10 times: creating deployments and services, restarting deployments, and deleting deployments and services; a simplified sketch of this loop follows. We detail the various performance metrics measured below.
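
The sketch below approximates the churn loop; the manifest file and label names are placeholders rather than the exact test harness:

    # Repeat the create / restart / delete cycle 10 times (sketch only)
    for i in $(seq 1 10); do
      kubectl apply -f workloads.yaml                   # create deployments and services
      kubectl rollout restart deployment -l test=churn  # restart the deployments
      kubectl delete -f workloads.yaml                  # delete deployments and services
    done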


Average APIServer Responsiveness 

This metric measures the average latency of the kube-apiserver’s responses to LIST requests, one of the most expensive request types for the control plane. With CiliumEndpointSlice enabled, we observed a remarkable drop in average latency from ~1.5 seconds to ~0.25 seconds, a decrease of over 80%. For cluster users, this means much faster processing of queries sent to the kube-apiserver, leading to improved performance.
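
Assuming control-plane metrics are scraped into Prometheus, the same quantity can be observed by computing the average LIST latency from the kube-apiserver's request-duration histogram:

    # Average latency of LIST requests over the last 5 minutes
    sum(rate(apiserver_request_duration_seconds_sum{verb="LIST"}[5m]))
      /
    sum(rate(apiserver_request_duration_seconds_count{verb="LIST"}[5m]))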


Pod Startup Latencies 

This metric measures the time taken for a pod to be reported as running. With CiliumEndpointSlice enabled, pod startup latency decreased by more than 60%, allowing for faster deployment and scaling of applications.
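
A crude command-line approximation is to time how long a fresh deployment takes to become fully available; the deployment name, image, and replica count below are placeholders:

    kubectl create deployment startup-test --image=nginx --replicas=50
    time kubectl rollout status deployment/startup-test   # includes scheduling and image pulls
    kubectl delete deployment startup-test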


In-Cluster Network Latency 

This is a critical metric, measuring the latency of pings from a prober pod to a server pod. With CiliumEndpointSlice enabled, latency decreased by more than 80%, which translates directly to better application performance.
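
The setup can be approximated with a simple prober pod pinging another pod's IP; the pod name, image, and target address are placeholders:

    kubectl run prober --image=busybox --restart=Never -- sleep 3600
    # replace 10.244.1.23 with the IP of a server pod in the cluster
    kubectl exec prober -- ping -c 10 10.244.1.23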


Azure CNI powered by Cilium offers a powerful eBPF-based solution for Kubernetes networking and security. With CiliumEndpointSlice enabled on Azure CNI powered by Cilium clusters running Kubernetes version 1.32 and later, we see further improvements in application and control plane performance. For more information, visit https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium.
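
For reference, a new AKS cluster with this dataplane on Kubernetes 1.32 can be created along the following lines; the resource group and cluster names are placeholders:

    az aks create \
      --resource-group myResourceGroup \
      --name myCiliumCluster \
      --kubernetes-version 1.32 \
      --network-plugin azure \
      --network-plugin-mode overlay \
      --network-dataplane cilium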
