Troubleshooting a network connectivity failure between Flannel and Docker
Checking the routing table configuration
Routing table on the master node
[[email protected] ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.44.1    0.0.0.0         UG    100    0        0 enp0s3
10.1.0.0        0.0.0.0         255.255.0.0     U     0      0        0 flannel0
10.1.19.0       0.0.0.0         255.255.255.0   U     0      0        0 docker0
192.168.44.0    0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
10.1.0.0/16 is the flannel0 network.
Pods started on this machine all get addresses in the 10.1.19.0/24 subnet.
Routing table on the worker node (node1)
[[email protected] flannel]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.44.1    0.0.0.0         UG    100    0        0 enp0s3
10.1.0.0        0.0.0.0         255.255.0.0     U     0      0        0 flannel0
10.1.28.0       0.0.0.0         255.255.255.0   U     0      0        0 docker0
192.168.44.0    0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
10.1.0.0/16 is the flannel0 network.
Pods started on this machine all get addresses in the 10.1.28.0/24 subnet.
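The two routing tables can be read as a longest-prefix match: the kernel picks the most specific route that contains the destination. The following is a minimal Python sketch of that rule, not the kernel's actual implementation. The route list is copied from the master's table above with netmasks rewritten as prefix lengths; the function name pick_interface is made up for illustration.

```python
import ipaddress

# Routes from the master's `route -n` output, netmasks written as prefix lengths.
MASTER_ROUTES = [
    ("0.0.0.0/0", "enp0s3"),
    ("10.1.0.0/16", "flannel0"),
    ("10.1.19.0/24", "docker0"),
    ("192.168.44.0/24", "enp0s3"),
    ("192.168.122.0/24", "virbr0"),
]


def pick_interface(dest_ip, routes):
    """Return the interface of the most specific route containing dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    best = None  # (network, interface) of the best match so far
    for cidr, iface in routes:
        network = ipaddress.ip_network(cidr)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, iface)
    return best[1] if best else None


print(pick_interface("10.1.19.7", MASTER_ROUTES))  # → docker0 (local pod subnet)
print(pick_interface("10.1.28.3", MASTER_ROUTES))  # → flannel0 (pod on node1)
```

On the master, a pod address on node1 therefore leaves through flannel0, while a local pod address stays on docker0.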
IP addresses of all pods
[[email protected] ~]# kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP          NODE
helloworld-service-2437162702-r9v2q       2/2     Running   9          9d    10.1.28.3   node1
helloworld-service-v2-2637126738-s284c    2/2     Running   10         9d    10.1.28.4   node1
istio-egress-2869428605-2ftgl             1/1     Running   6          13d   10.1.28.6   node1
istio-ingress-1286550044-6g3vj            1/1     Running   6          13d   10.1.28.5   node1
istio-mixer-765485573-23wc6               1/1     Running   6          13d   10.1.28.7   node1
istio-pilot-1495912787-g5r9s              2/2     Running   11         13d   10.1.28.9   node1
tool-185907110-ms991                      2/2     Running   4          8d    10.1.28.8   node1
Under normal conditions, pinging a pod's IP address should succeed:
[[email protected] ~]# ping 10.1.28.3
PING 10.1.28.3 (10.1.28.3) 56(84) bytes of data.
64 bytes from 10.1.28.3: icmp_seq=1 ttl=61 time=0.967 ms
64 bytes from 10.1.28.3: icmp_seq=2 ttl=61 time=1.88 ms
64 bytes from 10.1.28.3: icmp_seq=3 ttl=61 time=0.867 ms
64 bytes from 10.1.28.3: icmp_seq=4 ttl=61 time=2.23 ms
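How the communication path works, and tracing packets along it
A simplified diagram of the whole path is shown below.
A more detailed one is shown in the following diagram.
- After data leaves the source container, the host's docker0 virtual interface forwards it to the flannel0 virtual interface. flannel0 is a point-to-point virtual interface, and the flanneld service listens on its other end.
- Flannel maintains an inter-node routing table through the etcd service.
- The flanneld service on the source host encapsulates the original data in UDP and delivers it, according to its own routing table, to the flanneld service on the destination node. There the data is unpacked and goes straight into the destination node's flannel0 virtual interface, is forwarded to the destination host's docker0 virtual interface, and finally reaches the target container via docker0, just like local container-to-container traffic.
So to locate a connectivity failure, you have to check hop by hop where along this path the packets stop being forwarded.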
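The hop-by-hop approach can be written down as a small procedure: list the capture points in path order, record whether the ping packets were seen at each, and report the first transition where they disappear. The sketch below is only an illustration of that reasoning; the function name and the observation values are hypothetical, not measurements from this article.

```python
def first_broken_forward(observations):
    """Given ordered (capture_point, packets_seen) pairs, return the
    (last_good_point, first_bad_point) transition, or None if all hops pass."""
    previous = None
    for point, seen in observations:
        if not seen:
            return (previous, point)
        previous = point
    return None


# Hypothetical observations, ordered along the path described above.
observations = [
    ("master flannel0", True),
    ("master enp0s3", True),
    ("node1 enp0s3", True),
    ("node1 flannel0", True),
    ("node1 docker0", False),
]
print(first_broken_forward(observations))  # → ('node1 flannel0', 'node1 docker0')
```

Each capture point corresponds to one tcpdump run on the named host and interface.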
Source-side network
First, check the address of flannel0 on the sending side.
[[email protected] ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.1.19.1  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:3a:a6:1d:bb  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.44.108  netmask 255.255.255.0  broadcast 192.168.44.255
        inet6 fe80::a00:27ff:fee2:ae0a  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:e2:ae:0a  txqueuelen 1000  (Ethernet)
        RX packets 20866  bytes 2478600 (2.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21990  bytes 13812121 (13.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1472
        inet 10.1.19.0  netmask 255.255.0.0  destination 10.1.19.0
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 14  bytes 1176 (1.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 1176 (1.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Then run the following command to capture every packet leaving through flannel0:
tcpdump -i flannel0 -nn host 10.1.19.0
At the same time, ping the pod from another window (ping 10.1.28.3). The capture then shows:
[[email protected] ~]# tcpdump -i flannel0 -nn host 10.1.19.0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel0, link-type RAW (Raw IP), capture size 65535 bytes
16:28:43.961488 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 1, length 64
16:28:43.963340 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4520, seq 1, length 64
16:28:44.962567 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 2, length 64
16:28:44.963339 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4520, seq 2, length 64
16:28:45.966388 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 3, length 64
16:28:45.966962 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4520, seq 3, length 64
16:28:46.967629 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 4, length 64
16:28:46.968486 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4520, seq 4, length 64
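When a capture is long, it can help to pair echo requests with replies mechanically instead of by eye. The sketch below is one possible way to do that in Python. It relies only on the "ICMP echo request/reply, id N, seq N" wording visible in the capture above; the function name is made up, and the sample text is the first three capture lines with the reply to seq 2 deliberately left out.

```python
import re

# Matches the ICMP part of a tcpdump line as printed in the capture above.
ICMP_LINE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def unanswered_seqs(capture_text):
    """Return the sorted seq numbers of echo requests that have no matching reply."""
    requests, replies = set(), set()
    for kind, ident, seq in ICMP_LINE.findall(capture_text):
        target = requests if kind == "request" else replies
        target.add((int(ident), int(seq)))
    return sorted(seq for _ident, seq in requests - replies)


SAMPLE = """\
16:28:43.961488 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 1, length 64
16:28:43.963340 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4520, seq 1, length 64
16:28:44.962567 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4520, seq 2, length 64
"""
print(unanswered_seqs(SAMPLE))  # → [2]
```

An empty result means every request seen on the interface also had its reply pass through it.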
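The flannel0 capture shows that the packets have been sent. Next, look at the sender's physical interface enp0s3: keep the ping running and check whether the packets are forwarded to the physical interface.
Because this is the master node, there are many packets on ports 8080 and 443. These can be ignored, and a real environment would have comparatively fewer. The lines that matter are the ones ending in "UDP, length 84": that is the size of the ping packet after encapsulation (the 64-byte ICMP message plus its 20-byte IP header, carried as the UDP payload).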
[[email protected] ~]# tcpdump -i enp0s3 -nn host 192.168.44.109
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s3, link-type EN10MB (Ethernet), capture size 65535 bytes
16:46:59.146611 IP 192.168.44.108.8080 > 192.168.44.109.50060: Flags [P.], seq 1518764712:1518765120, ack 650646529, win 327, options [nop,nop,TS val 4304586 ecr 7794005], length 408
16:46:59.146863 IP 192.168.44.108.443 > 192.168.44.109.51564: Flags [P.], seq 474973224:474973663, ack 3595606551, win 248, options [nop,nop,TS val 4304586 ecr 7794006], length 439
16:46:59.147013 IP 192.168.44.109.50060 > 192.168.44.108.8080: Flags [.], ack 408, win 1424, options [nop,nop,TS val 7794610 ecr 4304586], length 0
16:46:59.147301 IP 192.168.44.109.51564 > 192.168.44.108.443: Flags [.], ack 439, win 1407, options [nop,nop,TS val 7794610 ecr 4304586], length 0
16:46:59.224901 IP 192.168.44.109.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) PTR node1.local. (109)
16:46:59.259598 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [P.], seq 3602262654:3602262700, ack 901869271, win 1407, options [nop,nop,TS val 7794724 ecr 4297197], length 46
16:46:59.267671 IP 192.168.44.108.8285 > 192.168.44.109.8285: UDP, length 84
16:46:59.269133 IP 192.168.44.109.8285 > 192.168.44.108.8285: UDP, length 84
16:46:59.270082 IP 192.168.44.108.443 > 192.168.44.109.34266: Flags [P.], seq 1:66, ack 46, win 1432, options [nop,nop,TS val 4304709 ecr 7794724], length 65
16:46:59.270419 IP 192.168.44.108.443 > 192.168.44.109.34266: Flags [P.], seq 66:639, ack 46, win 1432, options [nop,nop,TS val 4304709 ecr 7794724], length 573
16:46:59.270734 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [.], ack 66, win 1407, options [nop,nop,TS val 7794735 ecr 4304709], length 0
16:46:59.271040 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [.], ack 639, win 1407, options [nop,nop,TS val 7794735 ecr 4304709], length 0
16:46:59.272370 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [P.], seq 46:94, ack 639, win 1407, options [nop,nop,TS val 7794736 ecr 4304709], length 48
16:46:59.272522 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [P.], seq 94:667, ack 639, win 1407, options [nop,nop,TS val 7794736 ecr 4304709], length 573
16:46:59.272743 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [P.], seq 667:705, ack 639, win 1407, options [nop,nop,TS val 7794736 ecr 4304709], length 38
16:46:59.278885 IP 192.168.44.108.443 > 192.168.44.109.34266: Flags [.], ack 705, win 1432, options [nop,nop,TS val 4304718 ecr 7794736], length 0
16:46:59.283084 IP 192.168.44.108.443 > 192.168.44.109.34266: Flags [P.], seq 639:681, ack 705, win 1432, options [nop,nop,TS val 4304722 ecr 7794736], length 42
16:46:59.283224 IP 192.168.44.108.443 > 192.168.44.109.34266: Flags [P.], seq 681:723, ack 705, win 1432, options [nop,nop,TS val 4304722 ecr 7794736], length 42
16:46:59.284143 IP 192.168.44.109.34266 > 192.168.44.108.443: Flags [.], ack 723, win 1407, options [nop,nop,TS val 7794748 ecr 4304722], length 0
16:46:59.287279 IP 192.168.44.108.8080 > 192.168.44.109.50060: Flags [P.], seq 408:824, ack 1, win 327, options [nop,nop,TS val 4304726 ecr 7794610], length 416
16:46:59.287584 IP 192.168.44.109.50060 > 192.168.44.108.8080: Flags [.], ack 824, win 1424, options [nop,nop,TS val 7794751 ecr 4304726], length 0
This confirms that the ping packets are being sent to 192.168.44.109.
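The sizes in these captures fit together arithmetically. The sketch below works through them in Python, assuming standard header sizes (8-byte ICMP header, 20-byte IPv4 header without options, 8-byte UDP header) and assuming that flannel's UDP encapsulation adds no header of its own, which is consistent with the captured "UDP, length 84". The function names are illustrative.

```python
ICMP_HEADER = 8   # bytes
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def inner_packet_size(ping_payload):
    """Size of the whole inner IP packet that flanneld carries as UDP payload."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def tunnel_mtu(physical_mtu):
    """MTU left for the tunnel interface after the outer IP and UDP headers."""
    return physical_mtu - IPV4_HEADER - UDP_HEADER


print(56 + ICMP_HEADER)       # → 64, the ICMP length seen on flannel0
print(inner_packet_size(56))  # → 84, the UDP length seen on enp0s3
print(tunnel_mtu(1500))       # → 1472, the flannel0 mtu reported by ifconfig
```

The same 56 and 84 appear in ping's own banner, "56(84) bytes of data".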
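Destination-side network
Now go to node1, the destination side, and watch what its physical interface receives while the source keeps pinging.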
[[email protected] flannel]# tcpdump -i enp0s3 -nn host 192.168.44.108
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s3, link-type EN10MB (Ethernet), capture size 65535 bytes
16:49:04.022476 IP 192.168.44.108.8285 > 192.168.44.109.8285: UDP, length 84
16:49:04.022827 IP 192.168.44.109.8285 > 192.168.44.108.8285: UDP, length 84
16:49:05.022980 IP 192.168.44.108.8285 > 192.168.44.109.8285: UDP, length 84
16:49:05.023425 IP 192.168.44.109.8285 > 192.168.44.108.8285: UDP, length 84
16:49:05.273652 IP 192.168.44.108.8080 > 192.168.44.109.50060: Flags [P.], seq 1518824053:1518824479, ack 650646776, win 336, options [nop,nop,TS val 4430711 ecr 7919368], length 426
16:49:05.273754 IP 192.168.44.109.50060 > 192.168.44.108.8080: Flags [.], ack 426, win 1424, options [nop,nop,TS val 7920736 ecr 4430711], length 0
16:49:05.273951 IP 192.168.44.108.443 > 192.168.44.109.51564: Flags [P.], seq 475036916:475037373, ack 3595607190, win 248, options [nop,nop,TS val 4430711 ecr 7919369], length 457
16:49:05.274091 IP 192.168.44.109.51564 > 192.168.44.108.443: Flags [.], ack 457, win 1407, options [nop,nop,TS val 7920737 ecr 4430711], length 0
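Packets from the source are arriving, so this hop is fine.
Next, capture on flannel0 on the destination node node1. 10.1.19.0 is the source's flannel0 address. This hop is fine too.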
[[email protected] flannel]# tcpdump -i flannel0 -nn host 10.1.19.0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel0, link-type RAW (Raw IP), capture size 65535 bytes
16:51:49.795788 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4797, seq 1, length 64
16:51:49.795911 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4797, seq 1, length 64
16:51:50.797484 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4797, seq 2, length 64
16:51:50.797566 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4797, seq 2, length 64
16:51:51.796934 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4797, seq 3, length 64
16:51:51.797024 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4797, seq 3, length 64
16:51:52.800567 IP 10.1.19.0 > 10.1.28.3: ICMP echo request, id 4797, seq 4, length 64
16:51:52.800641 IP 10.1.28.3 > 10.1.19.0: ICMP echo reply, id 4797, seq 4, length 64
Finally, check whether any packets show up on the destination's docker0. 10.1.28.3 is the target pod's address.
tcpdump -i docker0 -nn host 10.1.28.3
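Locating the problem
The problem we hit was that packets arrived on the destination's flannel0, but nothing at all appeared on docker0.
So the fault lies in forwarding from flannel0 to docker0 on the destination side.
Inspecting the current iptables rules with iptables -nvL showed that the policy of the FORWARD chain was DROP. Change it with the following command: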
iptables -P FORWARD ACCEPT
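To check the chain policies on several nodes without reading the full rule listing, the "Chain NAME (policy X ...)" header lines of iptables -nvL output can be extracted with a regular expression. The following is a small Python sketch of that idea. The sample text is an illustrative listing in the usual iptables -nvL layout, not output captured for this article, and chain_policies is a made-up helper name.

```python
import re

# Built-in chains print a header such as "Chain FORWARD (policy DROP 0 packets, 0 bytes)".
POLICY_LINE = re.compile(r"^Chain (\S+) \(policy (\w+)", re.MULTILINE)


def chain_policies(iptables_output):
    """Map each built-in chain name to its default policy."""
    return dict(POLICY_LINE.findall(iptables_output))


# Illustrative sample only.
SAMPLE = """\
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
"""
policies = chain_policies(SAMPLE)
print(policies["FORWARD"])  # → DROP
```

User-defined chains print "(N references)" instead of a policy, so they are skipped by the pattern.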
Also check that IP forwarding is enabled on the host (net.ipv4.ip_forward should be 1):
sysctl -a | grep ip_forward
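The grep can match more than one key on some kernels, so it is worth reading the exact net.ipv4.ip_forward line. A minimal Python sketch of that check is shown below. It assumes the usual "key = value" format printed by sysctl; the helper name and the sample text are illustrative.

```python
def ip_forward_enabled(sysctl_output):
    """Return True/False from the net.ipv4.ip_forward line, or None if it is absent."""
    for line in sysctl_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "net.ipv4.ip_forward":
            return value.strip() == "1"
    return None


# Illustrative sample of `sysctl -a | grep ip_forward` output.
SAMPLE = "net.ipv4.ip_forward = 0\nnet.ipv4.ip_forward_use_pmtu = 0\n"
print(ip_forward_enabled(SAMPLE))  # → False
```

A result of False means the kernel will not forward packets between flannel0 and docker0, regardless of the iptables policy.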
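How the source finds the destination's address
This relies entirely on flannel: it looks up the data stored in etcd and routes accordingly.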
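Conceptually, that lookup maps a pod subnet to the host that owns it. The Python sketch below models it with an in-memory dictionary standing in for the data flannel keeps in etcd; the real storage format is not shown in this article, so the structure and the names SUBNET_TO_HOST and next_hop_host are assumptions. The subnets and host addresses are the ones from the captures above.

```python
import ipaddress

# Stand-in for flannel's subnet records in etcd (assumed shape).
SUBNET_TO_HOST = {
    "10.1.19.0/24": "192.168.44.108",  # master
    "10.1.28.0/24": "192.168.44.109",  # node1
}


def next_hop_host(pod_ip, leases):
    """Return the host address that owns the subnet containing pod_ip."""
    addr = ipaddress.ip_address(pod_ip)
    for subnet, host in leases.items():
        if addr in ipaddress.ip_network(subnet):
            return host
    return None  # no node has registered a subnet for this address


print(next_hop_host("10.1.28.3", SUBNET_TO_HOST))  # → 192.168.44.109
```

This is why a subnet that was never registered cannot be reached: the lookup has nothing to return for it.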
====================================================================================
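Service startup order
The order in which these services start matters a great deal for Kubernetes.
First, the flannel service
When the flannel service starts, it mainly does the following:
- Fetches the network configuration from etcd
- Allocates a subnet and registers it in etcd
- Records the subnet information in /run/flannel/subnet.env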
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.0.0.0/16
FLANNEL_SUBNET=10.0.53.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
- A script then converts subnet.env into a Docker environment variable file, /run/flannel/docker
cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=10.0.53.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=10.0.53.1/24 --ip-masq=true --mtu=1450 "
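The conversion between the two files can be sketched in a few lines of Python. This is not the script flannel ships, only a reconstruction that reproduces the two sample files above; in particular, the rule that Docker's --ip-masq is the inverse of FLANNEL_IPMASQ is inferred from those samples, and the function names are made up.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def docker_options(env):
    """Build the Docker option variables from flannel's subnet.env values."""
    # Assumption inferred from the samples: Docker masquerades only when flannel does not.
    ip_masq = "false" if env.get("FLANNEL_IPMASQ") == "true" else "true"
    opts = {
        "DOCKER_OPT_BIP": f"--bip={env['FLANNEL_SUBNET']}",
        "DOCKER_OPT_IPMASQ": f"--ip-masq={ip_masq}",
        "DOCKER_OPT_MTU": f"--mtu={env['FLANNEL_MTU']}",
    }
    opts["DOCKER_NETWORK_OPTIONS"] = " " + " ".join(opts.values()) + " "
    return opts


SUBNET_ENV = """\
FLANNEL_NETWORK=10.0.0.0/16
FLANNEL_SUBNET=10.0.53.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
"""
for name, value in docker_options(parse_env(SUBNET_ENV)).items():
    print(f'{name}="{value}"')
```

The important part is --bip: it pins docker0 to the subnet that flannel registered in etcd.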
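Then the Docker service
The Docker service takes the subnet that flannel obtained and starts pods inside it, so that when Kubernetes addresses a pod it can find the corresponding host and communicate with it.
If the Docker service is not tied to the flannel service in this way, Docker may well start first with its original IP range. That range was never written to etcd, so addressing fails.
This was the root cause in another troubleshooting session.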
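A quick sanity check for this failure mode is to compare docker0's address with FLANNEL_SUBNET from subnet.env. The Python sketch below does that with the standard ipaddress module. The function name is illustrative, and 172.17.0.1/16 is used as the example of Docker's default bridge range.

```python
import ipaddress


def docker_uses_flannel_subnet(docker0_cidr, flannel_subnet):
    """True if docker0's network is exactly the subnet flannel allocated."""
    bridge = ipaddress.ip_interface(docker0_cidr).network
    lease = ipaddress.ip_interface(flannel_subnet).network
    return bridge == lease


# Docker started after flannel and picked up --bip from /run/flannel/docker.
print(docker_uses_flannel_subnet("10.0.53.1/24", "10.0.53.1/24"))   # → True
# Docker started first and kept its default bridge range.
print(docker_uses_flannel_subnet("172.17.0.1/16", "10.0.53.1/24"))  # → False
```

A False result points to the startup-order problem described above rather than to a forwarding or firewall issue.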
==============================================================
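Verifying that the firewall port is open
On host A, run the following, where 10.93.0.131 is host B:
nc -u 10.93.0.131 8472
Type a few characters.
Then, on host B, run tcpdump -i eth0 -nn host hostA (with hostA replaced by host A's address) to verify that the packets are received.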
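Where nc is not installed, the same probe can be sent from Python's standard socket module. This is a minimal sketch, assuming only that a UDP datagram to the given host and port is what needs to be observed on the other side; the function name and payload are illustrative.

```python
import socket


def send_udp_probe(host, port, payload=b"flannel-probe"):
    """Send one UDP datagram and return the number of bytes handed to the kernel."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example, run on host A while tcpdump is capturing on host B:
#   send_udp_probe("10.93.0.131", 8472)
```

UDP is connectionless, so a successful send says nothing about delivery. The tcpdump capture on host B remains the actual evidence that the port is open.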