Understanding Taints, Tolerations, and Node Affinity in Kubernetes: Day 14/15 of 40daysofkubernetes

Introduction

In Kubernetes, efficient resource management and optimal scheduling of Pods are crucial for maintaining a well-functioning cluster. However, simply relying on default scheduling policies is often insufficient for more complex workloads and environments. This is where Taints, Tolerations, and Node Affinity come into play. Let's explore these concepts in detail to understand how they can be effectively leveraged in a Kubernetes cluster.

Taints and Tolerations

Taints and Tolerations work together to ensure that Pods are not scheduled onto inappropriate nodes. Taints are applied to nodes, and tolerations are applied to Pods. This mechanism provides a way to repel Pods from nodes unless they explicitly tolerate the taints.

Taints

A taint is a key-value pair with an effect that is applied to a node. The key-value pair can represent any condition or attribute, and the effect determines what happens to Pods that do not tolerate the taint. There are three possible effects:

  • NoSchedule: The Pod will not be scheduled on the node unless it tolerates the taint.

  • PreferNoSchedule: The system will try to avoid placing a Pod that does not tolerate the taint on the node, but it is not a hard requirement.

  • NoExecute: The Pod will be evicted if it is already running on the node and does not tolerate the taint.

Example of Applying a Taint

To apply a taint to a node, you use the kubectl taint command. For example, to taint a node named node1 with the key key1, value value1, and effect NoSchedule:

kubectl taint node node1 key1=value1:NoSchedule

In our example cluster, we have three nodes: one control plane node and two worker nodes. We taint both worker nodes with the key-value pair gpu=true. When we then create a pod, it stays in the Pending state.
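For reference, assuming a kind cluster whose worker nodes are named cka-cluster-worker and cka-cluster-worker2 (the second name is an assumption; check your own node names with kubectl get nodes), the steps look roughly like this:

kubectl taint node cka-cluster-worker gpu=true:NoSchedule
kubectl taint node cka-cluster-worker2 gpu=true:NoSchedule
kubectl run redis --image=redis
kubectl get pods   # redis shows STATUS Pending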

When we describe our pod using the command below:

kubectl describe pod/<pod-name>

The reason the pod is not scheduled is that the control plane node carries the taint node-role.kubernetes.io/control-plane:NoSchedule, so only control plane component pods (which tolerate it) are scheduled there. Our worker nodes carry the gpu=true taint we just applied, so they only accept pods that have a matching gpu=true toleration.
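You can verify the taints on each node directly. Assuming the same node names as above (a kind control plane node is typically named cka-cluster-control-plane), something like this works:

kubectl describe node cka-cluster-control-plane | grep Taints
kubectl describe node cka-cluster-worker | grep Taints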

Tolerations

A toleration is applied to a Pod to indicate that it can tolerate specific taints. This is done by adding a toleration section to the Pod's specification.

Example of Applying a Toleration

Let's apply toleration to our pod. Create a file pod1.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: redis
  name: redis
spec:
  containers:
    - image: redis
      name: redis
  tolerations:
    - key: "gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

In this YAML file, we have added a toleration to our redis pod for the key-value pair gpu=true. When we apply this file, the pod can be scheduled either on an untainted node or on a node tainted with gpu=true, and it will reach the Running status.
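As a rough sequence to try this out (delete the earlier Pending redis pod first if it is still around), the -o wide flag adds a NODE column so you can see where the pod landed:

kubectl delete pod redis --ignore-not-found
kubectl apply -f pod1.yaml
kubectl get pod redis -o wide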

Important Points to Remember about Taints and Tolerations
  • Taints are set on Nodes.

  • Tolerations are set on Pods.

  • Tainted nodes will only accept pods that have a matching toleration set.

  • A pod (with or without a particular toleration value) may be scheduled on an untainted node.

In essence, taints on nodes will repel the pods away if the toleration doesn’t match the taint. However, nodes that do not have any taints will accept any pod (with or without toleration set on them).
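If you later want to remove a taint, repeat the kubectl taint command with a trailing hyphen. For the gpu taint used above (node name as assumed earlier):

kubectl taint node cka-cluster-worker gpu=true:NoSchedule-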

NodeSelector

nodeSelector is the simplest form of node selection constraint in Kubernetes. It is used to specify a key-value pair that must match the labels on a node for a Pod to be scheduled on that node.

Characteristics:

  • Simple and straightforward to use.

  • Only supports equality-based requirements.

  • It is a hard constraint, meaning if no node matches the specified labels, the Pod will remain unscheduled.

Example

Create a YAML file named pod2.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: redis
  name: redis-new
spec:
  containers:
    - image: redis
      name: redis-new
  nodeSelector:
    disktype: "ssd"

Apply this YAML file and you will see that the pod stays in the Pending state, because the scheduler is looking for a node carrying the label disktype=ssd and no node has it yet.

When you describe the pod using kubectl describe pod <pod-name>, the Events section will show a FailedScheduling event explaining that no nodes match the Pod's node selector.
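You can also confirm directly that no node carries the disktype label yet, for example with either of these commands (the -L flag adds a column for the given label key):

kubectl get nodes --show-labels
kubectl get nodes -L disktype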

Now, label one of the worker nodes with disktype=ssd:

kubectl label node <node-name> disktype=ssd

You will see that the pod will start running on the node with the disktype=ssd label.
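To confirm, check the NODE column of the pod; it should show the node you just labeled:

kubectl get pod redis-new -o wide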

Node Affinity

Node Affinity is a feature in Kubernetes that allows you to constrain which nodes your Pods are eligible to be scheduled on based on node labels. It provides more flexible and expressive ways to influence Pod placement compared to nodeSelector.

Characteristics:
  • More expressive and flexible than nodeSelector.

  • Supports a broader range of operators (e.g., In, NotIn, Exists, DoesNotExist).

  • Can define both hard and soft constraints.

Types of Node Affinity:

  1. requiredDuringSchedulingIgnoredDuringExecution: This type is a hard requirement, similar to nodeSelector. The Pod will only be scheduled on nodes that match the specified criteria.

  2. preferredDuringSchedulingIgnoredDuringExecution: This type is a soft preference. The scheduler will try to place the Pod on nodes that match the criteria, but it is not mandatory.

Example

First, unlabel the nodes using the following command:

kubectl label node <node-name> <label-name>-

In our case:

kubectl label node cka-cluster-worker disktype-

  1. Using requiredDuringSchedulingIgnoredDuringExecution

    Create a file named affinity.yaml and paste the following content in it:

     apiVersion: v1
     kind: Pod
     metadata:
       labels:
         run: redis
       name: redis1
     spec:
       containers:
         - image: redis
           name: redis1
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
               - matchExpressions:
                   - key: disktype
                     operator: In
                     values:
                       - ssd
    

    In the above YAML file, we add an affinity section that matches on the label key disktype with the operator In, which can accept multiple values. For this example, we use only ssd.

    Now, when you apply this file, you will see that your pod is in a pending state because no node has the disktype label.

    But when you label any node with disktype=ssd, you will see that the pod starts running on that node (see the verification sketch after this list).

    So, in requiredDuringSchedulingIgnoredDuringExecution, the Pod will only be scheduled on nodes that match the specified criteria.

  2. Using preferredDuringSchedulingIgnoredDuringExecution

    Create a file affinity2.yaml and paste the following content in it:

     apiVersion: v1
     kind: Pod
     metadata:
       labels:
         run: redis
       name: redis2
     spec:
       containers:
         - image: redis
           name: redis2
       affinity:
         nodeAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
             - weight: 1
               preference:
                 matchExpressions:
                   - key: disktype
                     operator: In
                     values:
                       - hdd
    

    In the above YAML file, we use the label disktype with the value hdd. In this case, the scheduler first prefers a node that carries this label; if no such node exists, the pod is scheduled on any available node anyway.

    Apply this file:

     kubectl apply -f affinity2.yaml
    

    Since no node carries the disktype=hdd label, you will see that our redis2 pod is nevertheless running on one of the unlabeled worker nodes.

    So, in preferredDuringSchedulingIgnoredDuringExecution, the scheduler will try to place the Pod on nodes that match the criteria, but it is not mandatory.
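To see both behaviours side by side, you can label one worker and then check where each pod landed. Assuming the node names used earlier, redis1 should move from Pending to Running on the labeled node, while redis2 stays on whichever node the scheduler originally picked:

kubectl label node cka-cluster-worker disktype=ssd
kubectl get pods redis1 redis2 -o wide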

Both variants share the IgnoredDuringExecution suffix. This means that if you remove the label from a node, already-running pods are not affected and remain in the Running state; only the scheduling of new pods is influenced.
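A quick way to convince yourself of this (node name as assumed earlier) is to remove the label again and check that the already-scheduled pod keeps running:

kubectl label node cka-cluster-worker disktype-
kubectl get pod redis1 -o wide   # still Running on the same node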

Important points to remember about Node Affinity

  • Nodes are labeled.

  • Affinity is a property on a pod specified in the pod specification/manifest file.

  • Pods that have an affinity specified will be scheduled on the nodes that are labeled with the same value.

  • A pod that does not have affinity specified might get scheduled on any nodes irrespective of whether the nodes are labeled.

In essence, node affinity is a property on a pod that attracts it to a labeled node with the same value. However, pods that do not have any affinity specified might get scheduled on any nodes irrespective of whether the nodes are labeled.

Conclusion

Often, either Taints and Tolerations or Node Affinity alone is enough to schedule pods on the nodes of your choice. But if your requirement is more complex, consider combining both concepts.
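As a minimal sketch of what combining them could look like, reusing the gpu taint and the disktype label from the examples above (the pod name redis-combined is just illustrative), a pod can both tolerate the taint and require the label:

apiVersion: v1
kind: Pod
metadata:
  name: redis-combined
spec:
  containers:
    - image: redis
      name: redis-combined
  tolerations:
    - key: "gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd

With this spec, the pod is only eligible for nodes labeled disktype=ssd, and among those it is also allowed onto nodes that carry the gpu=true:NoSchedule taint.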

Thank you for reading my blog! If you have any questions, please comment, and I will make sure to answer them.

Resources I used

Video

Article: https://www.linkedin.com/pulse/taints-tolerations-node-affinity-kubernetes-rammohan-pratapa/

Documentation