Troubleshooting Calico networking on k8s

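I set up k8s on three local nodes. Pods on the first two nodes could reach each other, but the third node could not communicate with pods on the other two.

Checking the routes showed that the third node had never created the route needed for this traffic.

The routing table on hadoop002 is shown below. The entry of interest is the tunl0 route to hadoop001 at the end; hadoop003 had no such route.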

[root@hadoop002 beh]# route
Kernel IP routing table
Destination      Gateway    Genmask          Flags Metric Ref Use Iface
default          gateway    0.0.0.0          UG    100    0   0   ens192
172.16.31.0      0.0.0.0    255.255.255.0    U     100    0   0   ens192
172.17.0.0       0.0.0.0    255.255.0.0      U     0      0   0   docker0
172.18.0.0       0.0.0.0    255.255.0.0      U     0      0   0   br-f33940ad6bcc
192.168.72.192   0.0.0.0    255.255.255.192  U     0      0   0   *
192.168.72.241   0.0.0.0    255.255.255.255  UH    0      0   0   cali835b424b828
192.168.72.243   0.0.0.0    255.255.255.255  UH    0      0   0   calid14de0a1fe6
192.168.72.244   0.0.0.0    255.255.255.255  UH    0      0   0   calibae9713a5c9
192.168.72.245   0.0.0.0    255.255.255.255  UH    0      0   0   calif15216f38d6
192.168.72.247   0.0.0.0    255.255.255.255  UH    0      0   0   cali07b42699ca8
192.168.72.253   0.0.0.0    255.255.255.255  UH    0      0   0   calied45b975889
192.168.135.128  hadoop001  255.255.255.192  UG    0      0   0   tunl0

[root@hadoop002 beh]# ip route
default via 172.16.31.254 dev ens192 proto static metric 100
172.16.31.0/24 dev ens192 proto kernel scope link src 172.16.31.122 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-f33940ad6bcc proto kernel scope link src 172.18.0.1
blackhole 192.168.72.192/26 proto bird
192.168.72.241 dev cali835b424b828 scope link
192.168.72.243 dev calid14de0a1fe6 scope link
192.168.72.244 dev calibae9713a5c9 scope link
192.168.72.245 dev calif15216f38d6 scope link
192.168.72.247 dev cali07b42699ca8 scope link
192.168.72.253 dev calied45b975889 scope link
192.168.135.128/26 via 172.16.31.121 dev tunl0 proto bird onlink
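To make this check repeatable across nodes, the `ip route` output can be filtered for the routes that BIRD installed over the tunnel device. The following is only a minimal sketch: the function name `bird_tunnel_routes` and the `SAMPLE` text (a shortened copy of the output above) are my own, and it assumes the output format shown here.

```python
def bird_tunnel_routes(ip_route_output, tunnel_dev="tunl0"):
    """Return (destination, next_hop) pairs for BIRD routes that go via the tunnel device."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Skip blank lines, the local blackhole entry and anything not learned from BIRD.
        if not fields or fields[0] == "blackhole" or "bird" not in fields:
            continue
        if "via" not in fields or "dev" not in fields:
            continue
        if fields[fields.index("dev") + 1] == tunnel_dev:
            routes.append((fields[0], fields[fields.index("via") + 1]))
    return routes


# Shortened copy of the hadoop002 output shown above.
SAMPLE = """\
default via 172.16.31.254 dev ens192 proto static metric 100
blackhole 192.168.72.192/26 proto bird
192.168.72.241 dev cali835b424b828 scope link
192.168.135.128/26 via 172.16.31.121 dev tunl0 proto bird onlink
"""

print(bird_tunnel_routes(SAMPLE))
```

On hadoop002 this yields a single route, to hadoop001's pod block. With three nodes one would expect a route toward each of the other two nodes, so the missing entry for hadoop003 is consistent with the symptom.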

I tried to add the following two routes by hand; neither attempt succeeded.

ip route add 172.16.31.121/23 dev tunl0

route add -net 192.168.135.128 gw hadoop001 metric 0 netmask 255.255.255.192 dev tunl0
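The post does not record the error messages, so the following is a guess at one contributing cause rather than a diagnosis: `172.16.31.121/23` is a host address with a prefix length attached, not a network prefix, and Linux normally rejects route destinations that have host bits set. Python's standard `ipaddress` module shows the difference; `check_route_prefix` is a helper name made up for this sketch.

```python
import ipaddress


def check_route_prefix(prefix):
    """Return (is_valid_network, normalized_network) for a CIDR string."""
    # strict=False masks off the host bits and gives the enclosing network.
    normalized = str(ipaddress.ip_network(prefix, strict=False))
    try:
        # strict=True raises ValueError when host bits are set.
        ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        return False, normalized
    return True, normalized


print(check_route_prefix("172.16.31.121/23"))    # destination used in the first command
print(check_route_prefix("192.168.135.128/26"))  # pod block seen in the routing table
```

Either way, a hand-added route would only mask the real problem, since Calico is supposed to install these routes itself once BGP peering works.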

After I deleted Calico's etcd data and reset k8s, all of the routing information disappeared.

The calico-node log showed this error:

bird: BGP: Unexpected connect from unknown address

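After several more resets, none of the nodes could reach each other any more, so with no other option left I turned to calicoctl.

I compared the hadoop001 cluster with the dlw1 cluster, which was healthy, and found an anomaly: on hadoop001 the peers showed up with odd addresses such as 172.18.0.1 instead of the hosts' real IPs. The calico-node logs gave further clues.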

[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE    | INFO    |
+--------------+-------------------+-------+----------+---------+
| 172.18.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
| 172.19.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
+--------------+-------------------+-------+----------+---------+

IPv6 BGP status
No IPv6 peers found.

----------------------------- divider --------------------------------

[root@dlw1 tbc]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE      | INFO        |
+--------------+-------------------+-------+------------+-------------+
| 172.16.40.2  | node-to-node mesh | up    | 2018-11-03 | Established |
| 172.16.40.3  | node-to-node mesh | up    | 2018-11-03 | Established |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
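The comparison between the two clusters comes down to the INFO column: `Connect` on the broken cluster, `Established` on the healthy one. As a rough sketch, the peer table can be parsed and the peers that never reached `Established` listed. `parse_bgp_peers` and `unestablished` are illustrative names, and the parser assumes the five-column table layout shown above.

```python
def parse_bgp_peers(status_output):
    """Parse the IPv4 peer table printed by `calicoctl node status` into dicts."""
    peers = []
    for line in status_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, headings and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "PEER ADDRESS":
            continue  # skip the header row
        address, peer_type, state, since, info = cells
        peers.append({"address": address, "type": peer_type,
                      "state": state, "since": since, "info": info})
    return peers


def unestablished(peers):
    """Return the addresses of peers whose BGP session is not Established."""
    return [p["address"] for p in peers if p["info"] != "Established"]


BROKEN = """\
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
| 172.18.0.1 | node-to-node mesh | start | 07:16:12 | Connect |
| 172.19.0.1 | node-to-node mesh | start | 07:16:12 | Connect |
"""
HEALTHY = """\
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
| 172.16.40.2 | node-to-node mesh | up | 2018-11-03 | Established |
| 172.16.40.3 | node-to-node mesh | up | 2018-11-03 | Established |
"""

print(unestablished(parse_bgp_peers(BROKEN)))
print(unestablished(parse_bgp_peers(HEALTHY)))
```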

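The hadoop002 log showed the same thing. Note that 172.18.0.1 is the address of the br-f33940ad6bcc bridge in hadoop002's routing table above, not the address of ens192: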

2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop002" ipv4_addr:"172.18.0.1"

2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop003" ipv4_addr:"172.19.0.1"

2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop001" ipv4_addr:"172.16.31.121"

The dlw2 log, by contrast, shows the real host IPs:

2018-11-03 02:51:33.907 [INFO][197] syncer.go 473: Started receiving snapshot snapshotIndex=0x19a8

2018-11-03 02:51:33.908 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw1" ipv4_addr:"172.16.40.1"

2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw2" ipv4_addr:"172.16.40.2"

2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw3" ipv4_addr:"172.16.40.3"
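The same check can be run against the log itself: pull each hostname and its registered `ipv4_addr` out of the `HostMetadataUpdate` lines and flag anything outside the host subnet. This is a sketch under assumptions. The regular expression matches only the log format quoted above, the helper names are made up, and 172.16.31.0/24 is taken from the ens192 route on hadoop002.

```python
import ipaddress
import re

# Matches the hostname/ipv4_addr pair in a HostMetadataUpdate log line.
HOST_UPDATE = re.compile(r'hostname:"(?P<host>[^"]+)"\s+ipv4_addr:"(?P<ip>[^"]+)"')


def advertised_addresses(log_text):
    """Map each hostname to the IPv4 address recorded for it in the log."""
    return {m.group("host"): m.group("ip") for m in HOST_UPDATE.finditer(log_text)}


def outside_subnet(addresses, subnet):
    """Return the hosts whose recorded address is not inside the given subnet."""
    network = ipaddress.ip_network(subnet)
    return {host: ip for host, ip in addresses.items()
            if ipaddress.ip_address(ip) not in network}


LOG = '''\
... msg=hostname:"hadoop002" ipv4_addr:"172.18.0.1"
... msg=hostname:"hadoop003" ipv4_addr:"172.19.0.1"
... msg=hostname:"hadoop001" ipv4_addr:"172.16.31.121"
'''

print(outside_subnet(advertised_addresses(LOG), "172.16.31.0/24"))
```

For the hadoop cluster this flags hadoop002 and hadoop003, the two nodes whose BGP sessions were stuck in `Connect`.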

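Following a reference article, I configured the IP detection policy in the calico-node YAML file: autodetection, pinned to a specific network interface. After restarting the nodes, the network came up.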

- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens192"
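Why pinning the interface helps can be shown with a toy model. This is not Calico's detection code. It only assumes that the value after `interface=` is treated as a pattern for interface names, and that without a pin some interface other than ens192 may be picked first. The addresses come from hadoop002's routing table; the ordering of the list and the function `pick_address` are hypothetical.

```python
import re


def pick_address(interfaces, name_regex):
    """Return the address of the first interface whose name matches name_regex.

    Toy model only: the full-match semantics and the ordering are assumptions.
    """
    pattern = re.compile(name_regex)
    for name, address in interfaces:
        if pattern.fullmatch(name):
            return address
    return None


# Interfaces and addresses from hadoop002's routing table; the order is hypothetical.
HADOOP002_INTERFACES = [
    ("br-f33940ad6bcc", "172.18.0.1"),
    ("docker0", "172.17.0.1"),
    ("ens192", "172.16.31.122"),
]

print(pick_address(HADOOP002_INTERFACES, r"ens192"))  # pinned to the host NIC
print(pick_address(HADOOP002_INTERFACES, r".*"))      # unpinned: first match wins
```

With the pin, the node registers 172.16.31.122, which its peers can actually reach. Without it, the model picks the bridge address 172.18.0.1, the same address hadoop002 had registered in the logs.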

[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  | PEER TYPE         | STATE | SINCE    | INFO        |
+---------------+-------------------+-------+----------+-------------+
| 172.16.31.122 | node-to-node mesh | up    | 09:51:28 | Established |
| 172.16.31.123 | node-to-node mesh | up    | 09:51:28 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.