
Thursday, December 13, 2018

Taints and Tolerations

There are many details that we see when we describe a node or a pod. One such detail is taints and tolerations. Taints are set on nodes; if we describe a node we can see its taints,
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint 
Taints:             <none>

Similarly, if we describe a pod, we can see its tolerations,
[root@k8s-master ~]# kubectl describe pod task-pod | grep -i tolerations 
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s 

In this section we will see what taints and tolerations are.
Taints :  Kubernetes has a node affinity concept that lets one schedule a pod on a particular node. Let's say I want to run a tomcat server pod only on nodes whose label is set to web; I can configure the node affinity element in the tomcat server pod configuration to web. All tomcat server pods that have the node affinity set to web will then be deployed and run only on nodes carrying the web label, as sketched below.
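As a minimal sketch (the tier=web label and the pod name here are illustrative assumptions, not values from the cluster used later in this post), such a pod spec could look like,
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-web
spec:
  containers:
  - name: tomcat
    image: tomcat
  # Require nodes that carry the (assumed) label tier=web
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: tier
            operator: In
            values:
            - web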

Now what if we want to repel pods from a particular node, that is, we don't want a node to run certain pods? Taints allow a kubernetes node to repel a set of pods. If we want to deploy pods everywhere except some specific nodes, we can taint those nodes.

A taint can be set on a node with one of 3 effects: NoSchedule, PreferNoSchedule and NoExecute.
NoSchedule : no pod will be able to schedule onto the node unless it has a matching toleration.

NoExecute : pods already running on the node are evicted if they do not tolerate the taint, and new pods will not be scheduled onto it.

PreferNoSchedule : this tells the scheduler to prefer not to schedule intolerant pods on the tainted node, but it is not guaranteed.

A taint can be set on a node as follows (first confirming the node has no taints yet),
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint 
Taints:             <none>

[root@k8s-master ~]# kubectl taint nodes k8s-work-node1 experimental=true:NoExecute 
node/k8s-work-node1 tainted

[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint 
Taints:             experimental=true:NoExecute 

A taint consists of a key, a value for the key, and an effect. The key and value can be anything; a workload tolerates the taint if it specifies a matching key and value.
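Besides grepping the describe output, the taint can also be read as structured data straight from the node object, using kubectl's jsonpath output,
[root@k8s-master ~]# kubectl get node k8s-work-node1 -o jsonpath='{.spec.taints}'
This prints the taint objects (key, value and effect) exactly as stored in the node spec.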
 A taint can be removed as, 
[root@k8s-master ~]# kubectl taint node k8s-work-node1 experimental:NoExecute- 
node/k8s-work-node1 untainted 
 
Tolerations : As we already discussed, taints are for nodes and tolerations are for pods. In order to schedule a pod on a tainted node, the pod must have a matching toleration. Let's check the k8s-master for its taint,
[root@k8s-master ~]# kubectl describe node k8s-master | grep -i taint 
Taints:             node-role.kubernetes.io/master:NoSchedule 
Now let's take a pod running on the k8s-master and see what its tolerations are,
[root@k8s-master ~]# kubectl describe   po etcd-k8s-master -n kube-system | grep -i tolerations 
Tolerations:       :NoExecute 
Here an empty key combined with the Exists operator tolerates every taint that carries the NoExecute effect. The general syntax for a toleration is,
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"

The above two examples show that a pod carrying such a toleration can be scheduled onto a node tainted with key=value (first form) or with just the key, whatever its value (second form); without the toleration the pod would be repelled from that node.
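For NoExecute taints a toleration can additionally carry a tolerationSeconds field; this is what produced the "for 300s" in the pod describe output at the top of this post. A sketch of the default not-ready toleration,
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
With this, the pod stays bound to the node for 300 seconds after the taint appears and is evicted afterwards.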

An Example,
Let's check what taints our nodes currently have,
[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint 
Taints:             <none>

[root@k8s-master ~]# kubectl describe node k8s-work-node2 | grep -i taint 
Taints:             <none>
Let's taint node1 with “machine=test:NoSchedule” (a key=value pair plus the NoSchedule effect),
[root@k8s-master ~]# kubectl taint node k8s-work-node1 machine=test:NoSchedule 
node/k8s-work-node1 tainted

[root@k8s-master ~]# kubectl describe node k8s-work-node1 | grep -i taint 
Taints:             machine=test:NoSchedule

Let's run a couple of simple pods and see where they get created,
[root@k8s-master ~]# kubectl run test --image alpine --replicas 2 -- sleep 999 
deployment.apps/test created 
[root@k8s-master ~]# kubectl get pod -o wide 
NAME                    READY   STATUS    RESTARTS   AGE   IP          NODE
test-686fbdb967-4sg7f   1/1     Running   0          38s   10.38.0.2   k8s-work-node2
test-686fbdb967-5t8rs   1/1     Running   0          38s   10.38.0.1   k8s-work-node2

If we look at the above output, both pods we created are running on k8s-work-node2. No pod was created on k8s-work-node1, because that node now carries the taint. Now let's create pods that define a toleration, as below,
[root@k8s-master ~]# cat pod-with-tolerations.yml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: testing
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: testing
    spec:
      containers:
      - name: main
        image: alpine
        args:
        - sleep
        - "999"
      # This toleration matches the machine=test:NoSchedule taint on node1,
      # so these pods are allowed (but not forced) to land there.
      tolerations:
      - key: machine
        operator: Equal
        value: test
        effect: NoSchedule
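To create the deployment from this file, something along these lines would be run,
[root@k8s-master ~]# kubectl apply -f pod-with-tolerations.yml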

Now let's check where these pods were created,
[root@k8s-master ~]# kubectl get pods -o wide 
NAME                       READY   STATUS    RESTARTS   AGE   IP          NODE
test-686fbdb967-4sg7f      1/1     Running   0          3m    10.38.0.2   k8s-work-node2
test-686fbdb967-5t8rs      1/1     Running   0          3m    10.38.0.1   k8s-work-node2
testing-75c94846dd-dc78x   1/1     Running   0          21s   10.40.0.8   k8s-work-node1
testing-75c94846dd-s7bhw   1/1     Running   0          21s   10.38.0.5   k8s-work-node2

If we look at the above output, one of the pods with the toleration for “machine=test:NoSchedule” was started on node1: since we tainted node1 with “machine=test:NoSchedule”, the pod tolerates that taint and may be scheduled there. Note that a toleration only allows a pod onto a tainted node, it does not force it there, which is why the second testing pod still landed on node2. Since we did not define any taint on node2, pods are scheduled on it normally.
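One last note: if the goal is to make pods run only on the tainted node rather than merely allowing them there, the toleration can be combined with node affinity or a nodeSelector. A minimal sketch, assuming node1 has additionally been given a machine=test label (a label is separate from the taint and was not set anywhere above),
    spec:
      nodeSelector:
        machine: test        # assumed node label, distinct from the taint
      tolerations:
      - key: machine
        operator: Equal
        value: test
        effect: NoSchedule
The taint keeps other pods off node1, while the nodeSelector keeps these pods off every other node.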