AKS Architecture Explained: A Deep Dive Into Azure Kubernetes Service Internals
Most architects think of Azure Kubernetes Service as a black box. You click a button, wait a few minutes, and a fully working Kubernetes cluster appears. It feels straightforward - until you need to understand how the platform actually works.
Whether you are troubleshooting an application or explaining behaviour to leadership, knowing what’s inside that black box makes all the difference.
So let’s open it up and walk through what’s really happening behind the scenes in AKS.
The Hidden Complexity Behind AKS
Here’s the thing about AKS: it looks simple from the outside. The Azure portal makes it feel like you are just spinning up another resource - no different from creating a virtual machine or a storage account. But beneath that clean experience are several layers of infrastructure, networking decisions, and a clear divide between what Azure manages and what becomes your responsibility.

AKS Architecture
A better way to think about it is the difference between buying a car and buying just the engine. When you buy a full car, all the engineering is already sorted - the engine connects to the transmission, the cooling system is in place, and the wiring is done. You simply turn the key.
AKS works in a similar way. Microsoft manages the “engine” - the control plane - so you don’t have to deal with its internal complexity. But the parts you do control still matter. Think of the tyres you choose, the route you take, the way you drive and the regular servicing you schedule. Your choices determine how the car performs in real situations.
So while AKS gives you a well-built engine, the performance and reliability of the whole system still depend on the decisions you make around it.
The Control Plane: Managed and Mostly Hidden
The control plane is where everything truly happens - and it’s also where most of the understanding gap begins. In a traditional Kubernetes setup, you can see the control plane nodes, access them, and understand exactly what they’re doing. With AKS, that entire layer is abstracted away. Azure manages it for you.

AKS Control Plane
API Server
This is the front door of your cluster. Every kubectl command, every deployment tool, and every monitoring system talks to the API server first. It handles every incoming request and ensures it is legitimate and well-formed. It:
Authenticates the request - is this user who they say they are?
Authorises the request - do they have permission?
Validates the request - is this YAML even valid?
Once these checks pass, the API server writes the change to etcd. It’s also the only component that communicates with etcd directly. Think of it as the bouncer, the receptionist, and the record keeper all rolled into one.
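To make the authentication and authorisation steps concrete, here's a minimal sketch of Kubernetes RBAC - a Role that allows reading pods in one namespace, bound to a single user. The namespace and user are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a              # hypothetical namespace
rules:
  - apiGroups: [""]              # "" = the core API group (pods, services, configmaps, ...)
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
  - kind: User
    name: dev@example.com        # hypothetical user identity
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

With this in place, a kubectl get pods from that user in team-a passes the authorisation check, while a kubectl delete deployment is rejected before anything reaches etcd.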
Scheduler
When you create a pod, the scheduler decides which node should run it. It evaluates several factors before making that choice:
Resource requests - what the pod needs (e.g., 2 cores, 4 GB RAM)
Node capacity - what each node currently has available
Taints and tolerations - whether the pod is allowed to run on certain specialised nodes
Affinity rules - pods that should be placed close together
Anti-affinity rules - pods that should be spread apart
The scheduler itself never starts the pod. Instead, it updates the pod’s spec in etcd (via the API Server) with an instruction like “run this on node-3.” The kubelet on node-3 watches the API Server for assigned pods, notices the new assignment, and then performs the actual work of creating the pod.
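Here's a rough sketch of what those scheduling inputs look like in a pod spec - the image, taint key and labels are placeholders, and in practice this would usually sit inside a Deployment's pod template:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-backend
  labels:
    app: api-backend
spec:
  containers:
    - name: app
      image: myregistry.azurecr.io/api-backend:1.0   # illustrative image
      resources:
        requests:                 # what the scheduler uses to find a node with enough room
          cpu: "2"
          memory: 4Gi
  tolerations:                    # allows this pod onto nodes tainted sku=gpu:NoSchedule
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:              # prefer spreading pods with the same label across nodes
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: api-backend
            topologyKey: kubernetes.io/hostname
```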
Controller Manager
The Controller Manager runs a collection of controllers, each responsible for watching a specific Kubernetes resource and ensuring that the actual state of the cluster matches the desired state. Some key examples include:
Deployment controller - watches Deployment objects and creates or updates ReplicaSets
ReplicaSet controller - ensures the correct number of pods are running
Node controller - monitors node heartbeats and marks nodes NotReady when they fail
If you request 5 replicas but only 4 are running, the ReplicaSet controller creates the missing one. If a node stops responding, the Node controller marks it down and other controllers handle rescheduling pods elsewhere.
All of this works through a continuous reconciliation loop: observe the current state → compare it to the desired state → take corrective action if they differ → repeat.
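A minimal Deployment shows the loop in action - you declare five replicas, and the Deployment and ReplicaSet controllers keep creating or removing pods until the observed count matches the desired one (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5                     # desired state - the ReplicaSet controller enforces this count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27       # illustrative image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```

Delete one of the five pods and the ReplicaSet controller notices the drift on its next reconciliation pass and creates a replacement.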
Cloud Controller Manager
The Cloud Controller Manager is the Azure-specific bridge between Kubernetes and the underlying cloud platform. Whenever Kubernetes needs cloud infrastructure, this component handles the interaction with Azure APIs. For example:
LoadBalancer Services → it provisions and configures an Azure Load Balancer
PersistentVolumeClaims (PVCs) → when a PVC requests storage, the StorageClass triggers dynamic provisioning of an Azure Disk (in current AKS versions this is carried out by the Azure Disk CSI driver working alongside the cloud integration), and Kubernetes creates a corresponding PersistentVolume (PV)
Node lifecycle events → it applies Azure-specific labels and taints when nodes join, and cleans up the Azure VM and related resources when nodes are removed
Without the Cloud Controller Manager, Kubernetes wouldn’t know how to create or manage Azure resources. It’s the reason Kubernetes objects - like Services and PVCs - can automatically result in Azure infrastructure being created behind the scenes.
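Here's a minimal sketch of the two most common triggers - a LoadBalancer Service and a PVC using the built-in managed-csi StorageClass (names, ports and sizes are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer              # cloud integration provisions a frontend IP and rule on the Azure Load Balancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi   # built-in AKS StorageClass backed by Azure Disk
  resources:
    requests:
      storage: 32Gi
```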
etcd
etcd is the database that stores the entire state of your cluster. Every pod definition, every Service, every ConfigMap, every Secret - everything lives here. It’s a distributed, strongly consistent key-value store, which allows it to run across multiple nodes (and in AKS, across availability zones) while maintaining reliability and consistency.
All interactions with etcd go through the API Server:
kubectl get pods → the API Server reads from etcd
kubectl apply → the API Server writes to etcd
Controllers monitoring cluster state → read from etcd, make decisions, and write updates back
Whenever a pod fails, a node dies, or a Deployment changes, controllers use etcd as the source of truth to understand the current state and determine the required corrective action.
etcd is, quite literally, the authoritative record of your entire Kubernetes cluster.
Here’s the crucial bit: your worker nodes maintain a secure tunnel back to the control plane. This secure tunnel (powered by the Konnectivity agent) allows the API server to communicate with the kubelet, even though the nodes are in your private VNet. They continuously report status, receive instructions, and update Kubernetes on what’s actually running. But while the nodes are visible and fully under your control, the control plane itself is not. You can’t SSH into it, you can’t see the underlying VMs, and you won’t find it anywhere in your Azure subscription. Your visibility is limited to the API server endpoint and, if you enable them, diagnostic logs such as API server audit logs, scheduler logs, and controller manager logs streamed into Log Analytics or Event Hubs.
Behind the scenes, the control plane runs in a multi-tenant architecture managed entirely by Microsoft. In the Standard tier, control plane components are automatically distributed across multiple availability zones, giving you higher resilience and an SLA. In the Free tier, the control plane has no SLA at all - perfectly acceptable for development or experimentation, but not something you’d want to rely on for production workloads.
Microsoft also handles etcd backups automatically, but there’s an important nuance here: these backups exist solely for control plane disaster recovery. They do not help you roll back application-level changes. If someone accidentally deletes a Deployment or pushes an incorrect configuration, restoring an etcd snapshot is not an option. That’s where GitOps workflows, version-controlled manifests, and robust CI/CD practices become essential.
Node Pools: The Compute Layer You Control
Each node in your AKS cluster is a full virtual machine. It runs an operating system (typically Ubuntu or Azure Linux), and Kubernetes components are installed as system services on that machine.

AKS Worker Nodes
System vs. User Node Pools
AKS clusters start with a system node pool, which hosts essential Kubernetes components such as CoreDNS, CNI plugins, and system DaemonSets. These nodes are meant to remain stable and lightly loaded so the cluster can function reliably.
You then add user node pools for running your actual application workloads. User node pools give you flexibility in choosing VM sizes, OS types, taints, labels, and autoscaling behaviours tailored to your apps.
So what happens if you don’t add a user node pool?
Your application pods will run on the system node pool, sharing resources with critical Kubernetes services. This can lead to:
system components getting starved for CPU or memory
degraded control plane responsiveness
DNS resolution issues (CoreDNS under load)
instability during high traffic or scaling events
For production clusters, running applications on the system node pool is strongly discouraged. A clean separation - system for Kubernetes internals, user pools for workloads - keeps the cluster predictable, resilient, and easier to operate at scale.
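One common pattern - shown here as a sketch with a hypothetical pool name - is to pin application workloads to a user pool via the AKS agent-pool label; you can also taint the system pool (for example with CriticalAddonsOnly=true:NoSchedule) so that only critical add-ons land there:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      nodeSelector:
        kubernetes.azure.com/agentpool: userpool        # hypothetical user node pool name
      containers:
        - name: app
          image: myregistry.azurecr.io/orders-api:1.0   # illustrative image
```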
Let’s talk about the components that run on every node in these pools:
kubelet
The kubelet is the primary node agent - essentially the foreman of the node. The API server tells the kubelet what to run (“start this pod with these specifications”), and the kubelet handles the actual execution. It pulls container images, starts and stops containers, monitors their health, and continuously reports status back to the control plane. By default, the kubelet sends a heartbeat to the API server every 10 seconds. If too many heartbeats are missed, Kubernetes marks the node as NotReady and begins evicting pods so they can be rescheduled elsewhere.
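The health monitoring the kubelet performs is driven by the probes you declare on each container. A small sketch (image and endpoints are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: web
      image: nginx:1.27           # illustrative image
      ports:
        - containerPort: 80
      livenessProbe:              # kubelet restarts the container if this keeps failing
        httpGet:
          path: /
          port: 80
        periodSeconds: 10
      readinessProbe:             # kubelet marks the pod not ready, so Services stop routing to it
        httpGet:
          path: /
          port: 80
        periodSeconds: 5
```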
Container Runtime (containerd)
The container runtime is the engine that actually runs your containers, and in AKS that engine is containerd. When the kubelet instructs a node to start a container, containerd takes over the low-level work: pulling and unpacking image layers, setting up namespaces and cgroups, and launching the container process.
AKS moved from Docker to containerd several years ago. The shift wasn’t about removing Docker support for developers - it was about adopting a runtime that is lighter, faster, and fully aligned with Kubernetes’ Container Runtime Interface (CRI). For clusters, this means improved performance, reduced overhead, and a more consistent runtime experience across nodes.
A quick clarification: Docker is still perfectly valid for building container images. Kubernetes only removed the Docker runtime inside the cluster, not Docker tooling. Developers can continue using Dockerfiles, docker build, and Docker Desktop without any changes. Once an image is built and pushed to a registry, Kubernetes runs it using containerd. The runtime changed, not the workflow.
kube-proxy
kube-proxy is responsible for managing network rules on each node. When you create a Kubernetes Service, kube-proxy updates the node’s iptables rules (or IPVS rules in newer setups) so that traffic sent to the Service’s IP is routed to the appropriate backend pods. This is what allows a stable Service IP to “magically” reach the right container, even as pods move across different nodes.
Note: In standard configurations, kube-proxy uses iptables. However, in advanced setups using Azure CNI with Cilium, this is replaced by eBPF for higher performance.
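As a sketch, a plain ClusterIP Service is all it takes to put kube-proxy to work - traffic sent to the Service’s stable IP and port is rewritten on every node to one of the ready pod endpoints (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders                   # pods carrying this label become the backends
  ports:
    - port: 80                    # stable Service port that clients use
      targetPort: 8080            # container port on the backend pods
```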
Virtual Machine Scale Sets (VMSS)
In AKS, nodes are grouped into Virtual Machine Scale Sets - Azure’s way of managing fleets of identical virtual machines. This setup makes scaling and operations straightforward. If you need more capacity, you simply increase the VMSS instance count. When upgrading Kubernetes, AKS replaces nodes through a controlled rollout, draining and recreating them without disrupting workloads. And if your applications require different types of compute - memory-optimised VMs for data workloads, GPU VMs for machine learning, or burstable nodes for dev environments - you can create multiple node pools, each backed by its own VMSS with its own configuration.
Node Resource Group
One detail that often surprises people is the node resource group. When AKS creates your worker nodes, it places the underlying infrastructure in a separate resource group - by default named MC_<resourceGroup>_<clusterName>_<location>. This group contains the VM Scale Sets, disks, network interfaces, load balancers, and other components the cluster depends on. You can view everything in this resource group, but you shouldn’t modify anything directly. All changes should go through AKS itself; manually editing or deleting resources here can put your cluster into an inconsistent or unrecoverable state.
Closing Thoughts
By now, AKS should feel far less like a black box and far more like a well-engineered system with clearly defined layers. The control plane, even though it’s hidden and fully managed by Azure, has a predictable architecture and behaviour. The nodes, which are your responsibility, come with their own moving parts - kubelet, containerd, kube-proxy, VM Scale Sets - and each one plays a measurable role in how your applications actually run.
And this is the real story of AKS: it’s not “simple,” it’s simplified. Microsoft removes the burden of managing the control plane, but the rest of the system still requires deliberate design, thoughtful configuration, and a working knowledge of how Kubernetes behaves under the hood. Once you understand this division of responsibility, many architectural decisions become clearer - where to place workloads, how to scale, how to isolate components, and how to operate the cluster with confidence.
But we have only covered the foundation. The deeper challenges of running AKS in production live in the layers surrounding the compute and control plane - networking, storage, security boundaries, upgrades, identity, and observability. These are the areas where clusters either become robust, well-behaved platforms… or slowly drift into operational debt.
In the upcoming posts, we’ll explore those layers in detail. We’ll look at networking (where things invariably get complicated), storage (and why ephemeral volumes matter more than you think), the responsibility boundary between Azure and you, how to build for high availability and disaster recovery, how identity shapes control and access, how upgrades introduce controlled disruption, and how observability gives you the visibility to keep everything running smoothly.
Understanding AKS architecture is the first step. Building a reliable, scalable, secure Kubernetes environment in Azure comes next - and that’s where we’re headed. Stay tuned!
If you found this useful, tap Subscribe at the bottom of the page to get future updates straight to your inbox.
