Introduction

What is a Pod Disruption Budget?

A pod disruption budget (PDB) is a policy resource that keeps an application highly available by defining how many of its pods must remain running during disruptions. Given a number of replicas for an application, a PDB defines either the minimum number of pods that must stay available or the maximum number that may be unavailable.

What disruptions can cause a change to the number of pods on a node?

A disruption is an event in the cluster that can cause a pod to terminate and become unavailable.

There are two types of disruptions that can cause this:

  • Voluntary disruptions
  • Involuntary disruptions

Voluntary disruptions

These are disruptions triggered by the application owner or cluster administrator. For example:

  • Deleting a pod by mistake.
  • Deleting a deployment by mistake.
  • Draining a node for repair or upgrade.

Involuntary disruptions

These are events that you can anticipate but cannot avoid:

  • Kernel panics.
  • Hardware failure.
  • A node evicting a pod because the node is out of resources.
  • A cloud provider deleting a VM.

Fields of a PDB

  • .spec.selector specifies, by label, which pods the policy applies to.
  • .spec.minAvailable describes the minimum number of pods that must still be available after an eviction. It can be an absolute number or a percentage.
  • .spec.maxUnavailable describes the maximum number of pods that can be unavailable after an eviction. It can be an absolute number or a percentage.

You can set only one of .spec.minAvailable and .spec.maxUnavailable in a single PDB.
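As a sketch, a budget expressed as a percentage might look like this (the name is illustrative; the labels reuse the testapp example below):

```yaml
# Illustrative PDB using a percentage: with 4 replicas,
# "50%" means at least 2 pods must stay available
# (percentages for minAvailable are rounded up).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: percent-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: testapp
```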

Example of a PDB

Minimum available pods

In this manifest, we set the minimum number of available pods to 2.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: test-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: testapp
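Assuming this manifest is saved as pdb.yaml, it can be applied and inspected with kubectl against a running cluster (a usage sketch, not part of the manifest above):

```
# Create the PDB, then check how many voluntary disruptions it allows.
kubectl apply -f pdb.yaml
kubectl get poddisruptionbudget test-pdb
```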

Maximum unavailable pods

In this manifest, we set the maximum number of unavailable pods to 1.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: test-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: testapp

Example to show PDB at work

You have a cluster running 3 nodes. On the nodes, you are running an application with 3 replicas that are spread across the 3 nodes. The 3 pods have a PDB policy in place. You have another pod, called pod-other in this example, that is not covered by the PDB policy but runs on node-1.

node-1                  node-2                  node-3
pod-1: available        pod-2: available        pod-3: available
pod-other: available

Pods 1-3 are in a deployment with a PDB policy that requires 2 of the 3 pods to be available:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-min
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: testapp

Deployment definition

apiVersion: apps/v1
kind: Deployment
metadata:
  name: testapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: testapp
  template:
    metadata:
      labels:
        app: testapp
    spec:
      containers:
      - name: hello-world
        image: hello-world

Assume the cluster admin needs to perform a kernel update on the nodes. The admin will first try to drain the first node:

kubectl drain node-1

This will succeed immediately, putting pod-1 and pod-other into a terminating state.

node-1                  node-2                  node-3
pod-1: terminating      pod-2: available        pod-3: available
pod-other: terminating

The deployment will notice that only 2 of its pods are running and will create a replacement for pod-1; let's call it pod-4. The controller managing pod-other will likewise create a replacement for it, let's call it pod-y. Now the cluster state will look like this:

node-1                  node-2                  node-3
pod-1: terminating      pod-2: available        pod-3: available
pod-other: terminating  pod-4: starting         pod-y: starting

When the pods finish terminating, the cluster looks like this:

node-1                  node-2                  node-3
                        pod-2: available        pod-3: available
                        pod-4: starting         pod-y: starting
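At this moment the budget is exhausted: only 2 pods are healthy and the policy requires a minimum of 2. A PDB records this in its status; an illustrative fragment for this scenario (field values assumed, not taken from a real cluster) might read:

```yaml
# Illustrative PDB status while pod-4 and pod-y are still starting:
status:
  currentHealthy: 2      # pod-2 and pod-3
  desiredHealthy: 2      # from minAvailable: 2
  disruptionsAllowed: 0  # no further voluntary evictions permitted
  expectedPods: 3
```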

If the admin tries to drain node-2 now, it will fail. This is because the policy requires that at least 2 pods of the deployment are available, and with pod-4 still starting, evicting pod-2 would leave only one available pod. Eventually, pod-4 becomes available and the cluster looks like this:

node-1                  node-2                  node-3
                        pod-2: available        pod-3: available
                        pod-4: available        pod-y: available

The admin will try again to drain node-2. If we assume the pods are evicted in the order pod-2 and then pod-4, the drain will succeed in terminating pod-2 but will be blocked from terminating pod-4. This is because the policy requires a minimum of 2 available pods, and once pod-2 is gone only pod-3 and pod-4 remain available. The replacement for pod-2, pod-5, stays pending because there is no schedulable node with capacity for it.

node-1                  node-2                  node-3                  no-node
                        pod-2: terminating      pod-3: available        pod-5: pending
                        pod-4: available        pod-y: available

At this point, the admin needs to add another node to the cluster to continue the upgrade.
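While waiting, the admin can watch the budget to see when another eviction becomes possible (a sketch; requires a running cluster with the pdb-min policy applied):

```
# ALLOWED DISRUPTIONS returns to a non-zero value once enough
# pods are healthy again.
kubectl get pdb pdb-min
```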

Conclusion

A PDB enables your application to remain highly available by ensuring that enough pods are always running during cluster maintenance.
