There are many details shown when we describe a node or a pod. One such pair of details is taints and tolerations. Taints apply to nodes; if we describe a node we can see its taints,
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint
Taints:
Similarly, if we describe a pod, we can see its tolerations,
[root@k8s-master ~]# kubectl describe pod task-pod | grep -i tolerations
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
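The "for 300s" suffix corresponds to a tolerationSeconds field. In the pod spec, the default toleration that Kubernetes injects looks roughly like the following sketch (the exact set of injected tolerations depends on the admission controllers enabled on the cluster):
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300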
In this section we will see what taints and tolerations are.
Taints : Kubernetes has a node affinity concept that lets one schedule a pod onto particular nodes. Let's say I want to run a Tomcat server pod only on nodes whose label is set to web; I can configure the node affinity element in the Tomcat server pod configuration to match the web label. All Tomcat server pods that have this node affinity will then be deployed and run only on nodes carrying the web label.
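As a quick illustration, a node affinity rule for that scenario might look like the following sketch in the pod spec (the label key role and value web are placeholder names, not labels defined earlier in this article):
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - web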
Now what if we want to repel pods from a particular node, i.e. we don't want a node to run certain pods? Taints allow a Kubernetes node to repel a set of pods. If we want to deploy pods everywhere except on some specific nodes, we can taint those nodes.
A node can be tainted with one of 3 effects: NoSchedule, NoExecute and PreferNoSchedule,
NoSchedule : no pod will be able to schedule onto the node unless that pod has a matching toleration
NoExecute : pods already running on the node are evicted if they do not tolerate the taint, and new intolerant pods will not be scheduled onto it
PreferNoSchedule : this tells the scheduler to prefer not to schedule intolerant pods on the tainted node, but it is not guaranteed
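In general the command syntax is kubectl taint nodes <node-name> <key>=<value>:<effect>. For example, the PreferNoSchedule effect (which is not demonstrated below) could be set as follows, where dedicated=experimental is just a placeholder key-value pair:
kubectl taint nodes k8s-work-node1 dedicated=experimental:PreferNoSchedule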
The taint can be set to a node as,
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint
Taints:
[root@k8s-master ~]# kubectl taint nodes k8s-work-node1 experimental=true:NoExecute
node/k8s-work-node1 tainted
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint
Taints: experimental=true:NoExecute
A taint consists of a key, an optional value for the key, and an effect. The key and value can be anything. A workload matches the taint if it specifies a toleration with the same key, value and effect.
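The taint is stored on the Node object itself, under spec.taints, and can be inspected directly with, for example,
kubectl get node k8s-work-node1 -o jsonpath='{.spec.taints}'
which prints the list of taint objects, each with its key, value and effect fields.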
A taint can be removed as,
[root@k8s-master ~]# kubectl taint node k8s-work-node1 experimental:NoExecute-
node/k8s-work-node1 untainted
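Note that the trailing minus sign is what removes the taint. Based on the kubectl taint syntax, it should also be possible to remove all taints sharing a key, regardless of effect, by giving just the key with a trailing minus:
kubectl taint node k8s-work-node1 experimental-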
Tolerations : As we already discussed, taints are for nodes and tolerations are for pods. In order to be scheduled onto a tainted node, a pod must have a matching toleration. Let's check the k8s-master for its taint,
[root@k8s-master ~]# kubectl describe node k8s-master | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
[root@k8s-master ~]# kubectl describe po etcd-k8s-master -n kube-system | grep -i tolerations
Tolerations: :NoExecute
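The empty key shown for the etcd pod corresponds to a toleration using the Exists operator with no key, which matches every taint carrying the NoExecute effect regardless of its key; in YAML that would look roughly like this sketch:
tolerations:
- operator: Exists
  effect: NoExecute
Two simpler toleration forms look like,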
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
The above two examples show tolerations that "match" a taint: the first matches a taint with key "key", value "value" and the NoSchedule effect (Equal), while the second matches any taint with key "key" and the NoSchedule effect regardless of its value (Exists). A pod carrying such a toleration can be scheduled onto a node that has the matching taint.
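For context, tolerations sit at the pod spec level, alongside containers. A minimal complete pod manifest using the first toleration above might look like this sketch (the pod name tolerant-pod is a placeholder):
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  containers:
  - name: main
    image: alpine
    args: ["sleep", "999"]
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"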
An example:
Let's check what our nodes have as taints,
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint
Taints:
[root@k8s-master ~]# kubectl describe node k8s-work-node2 | grep -i taint
Taints:
Let's taint node1 with the “machine=test:NoSchedule” key-value pair and effect,
[root@k8s-master ~]# kubectl taint node k8s-work-node1 machine=test:NoSchedule
node/k8s-work-node1 tainted
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint
Taints: machine=test:NoSchedule
Let's run a simple deployment with two replicas and see where its pods are created,
[root@k8s-master ~]# kubectl run test --image alpine --replicas 2 -- sleep 999
deployment.apps/test created
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
test-686fbdb967-4sg7f 1/1 Running 0 38s 10.38.0.2 k8s-work-node2
test-686fbdb967-5t8rs 1/1 Running 0 38s 10.38.0.1 k8s-work-node2
Looking at the above output, we can see that both pods were scheduled onto k8s-work-node2. No pod was created on k8s-work-node1, because that node has been tainted. Now let's create a deployment that defines a matching toleration, as below,
[root@k8s-master ~]# cat pod-with-tolerations.yml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: testing
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: testing
    spec:
      containers:
      - args:
        - sleep
        - "999"
        image: alpine
        name: main
      tolerations:
      - key: machine
        operator: Equal
        value: test
        effect: NoSchedule
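The create step is not shown in the original transcript; assuming the manifest is applied in the usual way, it would be:
kubectl create -f pod-with-tolerations.yml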
Now let's check where these pods are being created,
[root@k8s-master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
test-686fbdb967-4sg7f 1/1 Running 0 3m 10.38.0.2 k8s-work-node2
test-686fbdb967-5t8rs 1/1 Running 0 3m 10.38.0.1 k8s-work-node2
testing-75c94846dd-dc78x 1/1 Running 0 21s 10.40.0.8 k8s-work-node1
testing-75c94846dd-s7bhw 1/1 Running 0 21s 10.38.0.5 k8s-work-node2
Looking at the above output, one of the pods carrying the “machine=test:NoSchedule” toleration was started on node1. Since we tainted node1 with “machine=test:NoSchedule”, these pods are able to tolerate the taint. Note that a toleration only allows scheduling onto the tainted node, it does not require it, which is why the second testing pod still landed on node2. Since we did not define any taint on node2, pods are scheduled there normally.
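To restore the cluster to its original state, the taint can be removed the same way as before, by appending a minus sign to the effect:
kubectl taint node k8s-work-node1 machine=test:NoSchedule-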