
Sunday, August 5, 2018

Container Networking - 6

Pod to Pod Communication - Multiple Hosts
K8s does not dictate how pods talk to each other; the networking model is left to the implementation. Implementations can differ, but all of them must follow 3 basic rules,
  1. All containers can communicate with each other without NAT, regardless of which nodes they are on 
  2. All nodes can communicate with all containers (and vice versa) without NAT
  3. A container sees its own IP address the same way others see it
To get the route details, use
[root@manja17-I14021 ~]# netstat -r -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.131.36.1     0.0.0.0         UG        0 0          0 eth0

UG says the route is Up and goes via a Gateway; the entry carrying the UG flag is the gateway route. A gateway is a network node that connects 2 networks that use different protocols, whereas a bridge joins 2 networks of the same type. The most common gateway is a router that connects a home or enterprise network to the internet.
A K8s cluster consists of one or more nodes. A node is a machine, physical or virtual, that runs the container runtime ( docker ) and the other K8s components. All nodes are connected to a network that allows them to reach each other in the cluster.

The default gateway, or router, is 10.131.36.1. The left node has eth0 available
and a docker0 bridge with address 172.17.0.1. The pod created there has the IP
172.17.0.2, which is held by its pause container. Because of the local routing
rules created when the bridge is set up, any packet arriving at eth0 with the
destination address 172.17.0.2 will be forwarded to the bridge, which then
sends it on to veth0.
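
As a quick sanity check on a node (a sketch using the illustrative addresses above; interface names can vary by setup),

ip addr show docker0      # the bridge holds 172.17.0.1/16
ip route | grep docker0   # expect: 172.17.0.0/16 dev docker0 ... src 172.17.0.1
brctl show docker0        # lists the veth interfaces attached to the bridge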

The other host also has an eth0 and a docker0 bridge (172.17.0.1). The docker0
address is the same as on host1, since we left these allocations to docker.
This is where the overlay network comes into effect. K8s assigns an overall
address space to the bridges and gives each host's bridge an address from that
allocated space. It also adds routing rules at the gateway that describe how
packets should be passed from one bridge to the other. The overlay network makes this happen by using a combination of virtual network interfaces, bridges and routing rules.
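
As a minimal sketch of what those routing rules amount to (illustrative values only: assumed bridge subnets 10.38.0.0/24 on host1 and 10.40.0.0/24 on host2, and assumed node IPs 10.131.36.11 and 10.131.36.12; a real overlay automates this and typically encapsulates traffic rather than relying on plain routes),

# on host1: packets for host2's bridge subnet go via host2's node IP
ip route add 10.40.0.0/24 via 10.131.36.12 dev eth0
# on host2: packets for host1's bridge subnet go via host1's node IP
ip route add 10.38.0.0/24 via 10.131.36.11 dev eth0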

This works the same way as overlay networking between 2 containers talking to
each other while running on 2 different nodes. Lets create 2 pods on 2 different nodes,
[root@manja17-I13330 kubenetes-config]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
testing-service-9r9kt   1/1     Running   0          6s
testing-service-pxsj9   1/1     Running   0          6s

[root@manja17-I13330 kubenetes-config]# kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP          NODE
testing-service-9r9kt   1/1     Running   0          44m   10.38.0.1   manja17-i14021
testing-service-pxsj9   1/1     Running   0          44m   10.40.0.3   manja17-i14022

We can see both pods were created on 2 different nodes and each pod has a
different IP address assigned to it. If we log in to one of the pods,
[root@manja17-I13330 kubenetes-config]# kubectl exec testing-service-9r9kt -it -- bash
root@testing-service-9r9kt:/usr/src/app# hostname -I
10.38.0.1

root@testing-service-9r9kt:/usr/src/app# ping -c 2 10.40.0.3
PING 10.40.0.3 (10.40.0.3): 56 data bytes
64 bytes from 10.40.0.3: icmp_seq=0 ttl=64 time=0.458 ms
64 bytes from 10.40.0.3: icmp_seq=1 ttl=64 time=0.458 ms
--- 10.40.0.3 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.458/0.458/0.458/0.000 ms

We are in pod 1 with IP “10.38.0.1” and pinged the other pod's IP “10.40.0.3”,
which lives on the other node. Though both belong to different private address ranges,
how can they talk to each other? Do we have any network interface available with
these address ranges that were set for the pods? Are we using the same docker0
bridge for creating pods on different nodes?

Weave enables inter-pod communication between different hosts by providing
a software-defined overlay network ( SDN ). Weave gives each host a different
IP subnet range, and the docker daemon then assigns IPs to the containers from this range. The containers then talk to each other using their unique IP addresses by means of packet encapsulation.

Imagine that you have two containers, container A and container B. Container A is placed on host machine A, and container B on host machine B. When container A wants to talk to container B, it uses container B’s IP address as the destination address of its packet. This packet is then encapsulated in an outer UDP packet exchanged between host machine A and host machine B; the outer packet is sent by host machine A and carries host machine B’s IP address as its destination. Once the packet arrives at host machine B, the encapsulation is removed and the packet is routed to the container using the inner IP address. The container/host machine mapping needed for this is stored in etcd, and the routing itself is done by weave.
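
One way to watch this encapsulation happen is to run tcpdump on the host interface while the pods ping each other (the port here is an assumption based on Weave's default data-plane port, 6784/UDP; it may differ in your setup),

# on either host: the inner pod-to-pod traffic shows up as UDP packets
# exchanged between the two node IPs
tcpdump -ni eth0 udp port 6784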

Pod - Service Communication
A pod can be created or deleted at any time, so it is not feasible to let customers access their application using a pod IP. How can we keep the application reachable while pods are created and deleted?

Service is the answer. A K8s service is an abstraction that defines a set of pods by label selectors. Once the pods are labeled and a service is created whose selector matches those labels, we can access the applications running inside the pods through the service. But how does the networking from pod to service happen? When we access the service, what makes the request hit one of the pods at the back?
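
The manifests used below are not reproduced in this post, so as a hedged sketch only, a ClusterIP service of this kind boils down to a label selector plus a port mapping (the name, label and ports here are illustrative guesses, not the exact contents of the files used next),

cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: simpleservice
spec:
  type: ClusterIP
  selector:
    app: testing-service    # must match the labels on the pods
  ports:
  - port: 80                # port the service listens on
    targetPort: 9876        # port the container listens on
EOF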

iptables is the answer. All the magic of sending a request from the service to a pod is taken care of by iptables. Lets create a pod and service and see what happens,
[root@manja17-I13330 kubenetes-config]# kubectl create -f testing-service-pod.yml
replicationcontroller "testing-service" created


[root@manja17-I13330 kubenetes-config]# kubectl create -f testingService-service.yml
service "simpleservice" created

[root@manja17-I13330 kubenetes-config]# kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP          NODE
testing-service-588bm   1/1     Running   0          19s   10.40.0.5   manja17-i14022
testing-service-cm2vx   1/1     Running   0          19s   10.38.0.2   manja17-i14021


[root@manja17-I13330 kubenetes-config]# kubectl get svc
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   10.96.0.1      <none>        443/TCP   8d
simpleservice   ClusterIP   10.97.208.63   <none>        80/TCP    9s

Now access the service,

[root@manja17-I13330 kubenetes-config]# curl 10.97.208.63:80/info
{"host": "10.97.208.63", "version": "0.5.0", "from": "10.32.0.1"}

Whenever we create a service in front of multiple pods, iptables adds multiple
rules to its tables that define how to send requests from the service to the backend pods.
After creating the above pods and service, check the iptables rules using the
“iptables -t nat -nL” command. We can see some of the chains below being created,


Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-SVC-EZC6WLOVQADP4IAW  tcp -- 0.0.0.0/0   10.97.208.63 /* default/simpleservice: cluster IP */ tcp dpt:80

Chain KUBE-SVC-EZC6WLOVQADP4IAW (1 references)
target     prot opt source               destination
KUBE-SEP-VQICGOOWNWBZ3CBM  all -- 0.0.0.0/0   0.0.0.0/0 /* default/simpleservice: */ statistic mode random probability 0.50000000000
KUBE-SEP-IVOYE2QLMRQWJMK6  all -- 0.0.0.0/0   0.0.0.0/0 /* default/simpleservice: */

Chain KUBE-SEP-IVOYE2QLMRQWJMK6 (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all -- 10.40.0.5            0.0.0.0/0 /* default/simpleservice: */
DNAT       tcp -- 0.0.0.0/0            0.0.0.0/0 /* default/simpleservice: */ tcp to:10.40.0.5:9876

Chain KUBE-SEP-VQICGOOWNWBZ3CBM (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all -- 10.38.0.2            0.0.0.0/0 /* default/simpleservice: */
DNAT       tcp -- 0.0.0.0/0            0.0.0.0/0 /* default/simpleservice: */ tcp to:10.38.0.2:9876


The main thing to note here is that traffic for the service's cluster IP is
handed to the target KUBE-SVC-EZC6WLOVQADP4IAW. This chain has 2 custom endpoint chains,
KUBE-SEP-VQICGOOWNWBZ3CBM and KUBE-SEP-IVOYE2QLMRQWJMK6.
The first one is guarded by a random probability of 0.5, which means iptables
generates a random number per connection and sends about half of them to the first
endpoint; the rest fall through to the second rule.
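
For intuition, a hand-rolled pair of rules with the same effect would look like this (purely illustrative; kube-proxy manages the real chains, and the addresses and ports are the ones from this example),

# ~50% of new connections to the service IP are DNATed to the first pod...
iptables -t nat -A PREROUTING -p tcp -d 10.97.208.63 --dport 80 \
  -m statistic --mode random --probability 0.5 \
  -j DNAT --to-destination 10.38.0.2:9876
# ...and everything that falls through goes to the second pod
iptables -t nat -A PREROUTING -p tcp -d 10.97.208.63 --dport 80 \
  -j DNAT --to-destination 10.40.0.5:9876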


The 2 custom chains each have a DNAT target set to the corresponding pod IP. The
DNAT target is responsible for changing the packet's destination IP address. So
when traffic comes to the service, iptables randomly picks one of the pods,
modifies the destination from the service IP to the real pod IP (and port) and forwards the packet to that pod.


This is how pod-to-service communication happens.
Hope I have given enough details on how container networking works.




