Mastering Kubernetes Server-Side Sharded List and Watch: A Step-by-Step Guide

Introduction

As Kubernetes clusters scale to tens of thousands of nodes, controllers that watch high-cardinality resources like Pods face a significant performance wall. Traditional client-side sharding forces each replica of a horizontally scaled controller to receive the full event stream from the API server, consuming CPU, memory, and network bandwidth to deserialize everything, only to discard the objects it doesn't own. Scaling out the controller multiplies this cost rather than reducing per-replica overhead. Kubernetes v1.36 introduces an alpha feature (KEP-5866) called server-side sharded list and watch, which moves the filtering upstream into the API server. This effectively splits the resource collection so each controller replica receives only the slice it owns, dramatically reducing resource waste. In this guide, you'll learn step-by-step how to implement this feature in your controllers to achieve efficient, scalable monitoring.

Step-by-Step Implementation

Step 1: Enable the Feature Gate

First, ensure your Kubernetes API server is running with the ServerSideShardedListWatch=true feature gate. If you use a managed cluster, check with your provider. For self-managed clusters, add --feature-gates=ServerSideShardedListWatch=true to the API server startup flags. The feature is alpha, so verify compatibility and be aware it may change in future releases.
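For a kubeadm-managed control plane, the flag goes into the API server's static pod manifest. The sketch below assumes the standard kubeadm layout (the manifest path and surrounding structure are illustrative, not prescriptive):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm-style layout; sketch)
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=ServerSideShardedListWatch=true  # alpha gate described above
    # ... keep all existing flags unchanged
```

The kubelet restarts the API server automatically when the static pod manifest changes.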

Step 2: Understand the Shard Selector

The feature introduces a shardSelector field in ListOptions. You specify a hash range using the shardRange() function. The API server computes a deterministic 64-bit FNV-1a hash of the specified field (currently object.metadata.uid or object.metadata.namespace) and returns only objects whose hash falls within [start, end). For example:

shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')

This splits the hash space in half: replica 0 gets the lower half, replica 1 gets the upper half. You can split into any number of shards by dividing the 64-bit space equally.
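To build intuition for how the server assigns objects to shards, here is a client-side sketch of the hashing and range check in Go. The exact bytes the API server feeds into FNV-1a are an implementation detail of the alpha feature; this sketch assumes the raw field string is hashed directly, and the UID shown is hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardHash returns the 64-bit FNV-1a hash of a field value.
// Assumption: the raw string is hashed directly; the API server's exact
// input encoding is an implementation detail of the alpha feature.
func shardHash(field string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(field))
	return h.Sum64()
}

// inRange reports whether hash falls in [start, end). An end at or below
// start is treated as wrapping past the top of the 64-bit space, matching
// the wrap convention used for the last shard.
func inRange(hash, start, end uint64) bool {
	if start < end {
		return hash >= start && hash < end
	}
	// wrapped range: [start, 2^64) or [0, end)
	return hash >= start || hash < end
}

func main() {
	uid := "7f1d2c3b-0000-4a5b-8c6d-123456789abc" // hypothetical Pod UID
	h := shardHash(uid)
	fmt.Printf("hash=0x%016x lowerHalf=%v\n", h, inRange(h, 0x0, 0x8000000000000000))
}
```

Because the hash is deterministic, every replica agrees on which shard owns a given object without any coordination.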

Step 3: Implement Shard Range in Your Controller Code

When creating an informer factory, use WithTweakListOptions to inject the shard selector. For a two-replica deployment, each replica will use a different shard range. Below is an example for replica 0:

package main

import (
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Build an in-cluster client; substitute your own config loading as needed.
    config, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(config)

    resyncPeriod := 10 * time.Minute

    // Replica 0 owns the lower half of the 64-bit hash space.
    shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"
    factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
        informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
            opts.ShardSelector = shardSelector
        }),
    )
    // ... register event handlers and start informers via factory
}

For replica 1, use shardRange(object.metadata.uid, '0x8000000000000000', '0x0000000000000000'). An end of '0x0000000000000000' wraps around past the top of the hash space, so this range covers [0x8000000000000000, 2^64). In practice, you can compute each replica's range dynamically from its index and the total replica count.
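Computing ranges dynamically might look like the following sketch. shardRangeSelector is a hypothetical helper, and it assumes the replica count is a power of two (and at least 2) so the 64-bit space divides evenly:

```go
package main

import "fmt"

// shardRangeSelector builds the shardSelector expression for replica `index`
// of `total` replicas, splitting the 64-bit hash space into equal intervals.
// Sketch only: assumes total is a power of two and >= 2. The last replica's
// end deliberately overflows to 0x0, matching the wrap convention.
func shardRangeSelector(index, total int) string {
	interval := uint64(1) << 63 / uint64(total) * 2
	start := interval * uint64(index)
	end := interval * uint64(index+1) // wraps to 0 for the last replica
	return fmt.Sprintf("shardRange(object.metadata.uid, '0x%016x', '0x%016x')", start, end)
}

func main() {
	// Print the four intervals for a four-replica deployment.
	for i := 0; i < 4; i++ {
		fmt.Println(shardRangeSelector(i, 4))
	}
}
```

Each replica plugs its own selector string into WithTweakListOptions exactly as in the replica-0 example above.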

Step 4: Configure Controller Replicas

Set up your controller deployment with the number of replicas matching your sharding scheme. Each replica must know its index (e.g., via environment variable or command-line flag) to compute the correct shard range. For n replicas, divide the 64-bit space into n equal intervals. For example, with n=4, intervals are [0x0, 0x4000000000000000), [0x4000000000000000, 0x8000000000000000), [0x8000000000000000, 0xC000000000000000), and [0xC000000000000000, 0x0) (with wrap). Ensure each replica uses the same hash field (recommended: object.metadata.uid for even distribution).

Step 5: Test and Verify

Deploy your sharded controller and monitor its resource usage. Each replica's CPU and memory consumption should drop roughly in proportion to its shard size; use kubectl top pod or Prometheus metrics to compare before and after. Verify that objects are filtered correctly: each replica should list and watch only objects whose hash falls within its assigned range. Sharding on object.metadata.uid spreads load evenly, since UIDs are unique; for namespace-based sharding, make sure your workload is distributed evenly across namespaces. To debug, enable verbose logging in the controller and inspect the ShardSelector value being sent.

Conclusion

With these steps, you can implement server-side sharded list and watch in your Kubernetes controllers and scale horizontally without wasted per-replica overhead. Start by enabling the feature gate, then integrate the shard selector into your informer setup, and carefully configure each replica's range. Your controllers will thank you for the reduced resource consumption.
