
From CNI to OVN


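CNI implementations such as Calico and Flannel sacrifice some features to cut network complexity dramatically, an approach I strongly endorse. In the cloud-native era, where applications no longer care about the underlying infrastructure, this is a wise move, and it makes network debugging far easier.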

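Comparing OpenStack with Kubernetes is not very meaningful. OpenStack still focuses on infrastructure: the interfaces it exposes upward are machines, networks, storage and so on, and its emphasis is resource abstraction.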

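Kubernetes, however, needs not only resource abstraction but also application management. Its container-based design has already changed the traditional three-layer cloud computing architecture; it behaves more like a cloud kernel that no longer exposes infrastructure interfaces upward. All it has to do is manage users' applications well.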

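Compare this with early operating systems and history looks strikingly similar: from early layered operating systems to modern monolithic kernels and microkernels, system design became more cohesive. I suspect cloud operating systems will follow the same path (there are too many OpenStack fans, so I won't say outright that I'd like to see OpenStack die).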

But!

Some of the technologies underneath OpenStack are still well worth learning and using: QEMU, KVM, OVS, OVN, Ceph, DPDK and so on.

This article focuses on networking: how OVN and OVS can work together with Kubernetes.


A Brief Overview of CNI

CNI is not the focus of this article, so here is only a brief introduction; see the CNI specification for more details.

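CNI is simple. In essence you implement a command-line tool. When kubelet (or the container runtime) sets up a pod's network, it invokes this tool, passing parameters as environment variables and the network configuration as JSON on stdin, and the tool configures the network accordingly.

Once configuration is done, the tool prints a JSON result in the format defined by CNI to stdout, telling Kubernetes the IP address and so on.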

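There are three main commands (a fourth, VERSION, only reports the supported spec versions):

  • ADD: set up the network
  • DEL: tear down the network
  • CHECK: check the network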

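Here is a closer look at ADD. The constants below, from the CNI project's cnitool utility, name the environment variables and commands involved: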

const (
    EnvCNIPath        = "CNI_PATH"
    EnvNetDir         = "NETCONFPATH"
    EnvCapabilityArgs = "CAP_ARGS"
    EnvCNIArgs        = "CNI_ARGS"
    EnvCNIIfname      = "CNI_IFNAME" // network interface name

    DefaultNetDir = "/etc/cni/net.d"

    CmdAdd   = "add"
    CmdCheck = "check"
    CmdDel   = "del"
)

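Inputs:

  • container ID
  • path to the container's network namespace
  • network configuration: a JSON document describing the network the container can join
  • interface name inside the container
  • extra arguments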

The output on stdout is a JSON document like this:

{
  "cniVersion": "0.4.0",
  "interfaces": [                                            (this key omitted by IPAM plugins)
      {
          "name": "<name>",
          "mac": "<MAC address>",                            (required if L2 addresses are meaningful)
          "sandbox": "<netns path or hypervisor identifier>" (required for container/hypervisor interfaces, empty/omitted for host interfaces)
      }
  ],
  "ips": [
      {
          "version": "<4-or-6>",
          "address": "<ip-and-prefix-in-CIDR>",
          "gateway": "<ip-address-of-the-gateway>",          (optional)
          "interface": <numeric index into 'interfaces' list>
      }
...

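What if you need some of the pod's metadata? A typical case is a pod YAML that specifies which subnet the pod belongs to. Sorry, CNI won't hand that to you: kubelet only passes identifiers such as the pod's namespace and name (in CNI_ARGS, as K8S_POD_NAMESPACE and K8S_POD_NAME), so you have to query the apiserver yourself. This is a real pain, which is why today's OVN-based CNI plugins all include a "CNI server" that talks to the apiserver.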

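If I were implementing it, I would consider writing this information into the container's labels, so the CNI tool could read it directly from the container metadata and there would be one fewer server to run.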
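Here is a small Go sketch of parsing CNI_ARGS. Under Kubernetes it is a semicolon-separated list of KEY=VALUE pairs that typically includes K8S_POD_NAMESPACE and K8S_POD_NAME; those two values are what a plugin or its CNI server would use to look the pod up in the apiserver. parseCNIArgs is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCNIArgs splits CNI_ARGS ("K1=V1;K2=V2") into a map.
func parseCNIArgs(s string) (map[string]string, error) {
	out := map[string]string{}
	if s == "" {
		return out, nil
	}
	for _, pair := range strings.Split(s, ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid CNI_ARGS pair %q", pair)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

func main() {
	args, err := parseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-0")
	if err != nil {
		panic(err)
	}
	// With namespace and name, the plugin (or its CNI server) can GET
	// /api/v1/namespaces/{namespace}/pods/{name} and read the pod's annotations.
	fmt.Println(args["K8S_POD_NAMESPACE"] + "/" + args["K8S_POD_NAME"])
}
```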

Installing and Configuring OVS and OVN

Building from Source

(A quick rant: the OVN documentation is crap.)

I recommend installing from source:

wget https://www.openvswitch.org/releases/openvswitch-2.11.1.tar.gz
tar zxvf openvswitch-2.11.1.tar.gz
cd openvswitch-2.11.1
./boot.sh && ./configure && make && make install

There is also an OVN sandbox you can build with make sandbox SANDBOXFLAGS="--ovn", but it's too much of a toy for us.

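To build and install the kernel module (configure must have been run with --with-linux=/lib/modules/$(uname -r)/build):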

$ make modules_install
$ config_file="/etc/depmod.d/openvswitch.conf"
$ for module in datapath/linux/*.ko; do
  modname="$(basename ${module})"
  echo "override ${modname%.ko} * extra" >> "$config_file"
  echo "override ${modname%.ko} * weak-updates" >> "$config_file"
  done
$ depmod -a
$ /sbin/modprobe openvswitch
$ /sbin/lsmod | grep openvswitch

Start OVS

$ export PATH=$PATH:/usr/local/share/openvswitch/scripts
$ ovs-ctl start --system-id="random"
$ ovs-appctl -t ovsdb-server ovsdb-server/add-remote ptcp:6640:IP_ADDRESS # allow remote connections to the OVS database

IP_ADDRESS is the controller node's management network address.

Verify OVS

$ ovs-vsctl add-br br0
$ ovs-vsctl add-port br0 eth0
$ ovs-vsctl add-port br0 vif1.0
$ ovs-vsctl show

Start OVN

$ ovn-ctl start_northd      # start ovn-northd (by default this also starts the NB and SB databases)
$ ovn-ctl start_controller  # start ovn-controller
$ ovn-sbctl show            # verify
$ ovn-nbctl show            # verify

Let the OVN databases accept remote connections

$ ovn-nbctl set-connection ptcp:6641:0.0.0.0 -- \
            set connection . inactivity_probe=60000
$ ovn-sbctl set-connection ptcp:6642:0.0.0.0 -- \
            set connection . inactivity_probe=60000
# if using the VTEP functionality:
#   ovs-appctl -t ovsdb-server ovsdb-server/add-remote ptcp:6640:0.0.0.0

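These commands make the NB (6641) and SB (6642) ovsdb-servers listen on TCP. By default ovsdb-server only allows local access, but OVN components on other nodes need remote access.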

Configure OVS

On each node that runs ovn-controller, point the local OVS at the OVN southbound database and set the tunnel parameters:

ovs-vsctl set open . external-ids:ovn-remote=tcp:IP_ADDRESS:6642      # OVN southbound database
ovs-vsctl set open . external-ids:ovn-encap-type=geneve,vxlan         # encapsulation types; geneve is the better choice
ovs-vsctl set open . external-ids:ovn-encap-ip=IP_ADDRESS             # overlay tunnel endpoint address

OVS and Containers

Single-host OVS connectivity

Create the containers. --net=none keeps the default docker0 bridge from interfering with the connectivity test:

docker run -itd --name con6 --net=none ubuntu:14.04 /bin/bash
docker run -itd --name con7 --net=none ubuntu:14.04 /bin/bash
docker run -itd --name con8 --net=none ubuntu:14.04 /bin/bash

Create a bridge:

ovs-vsctl add-br ovs0

Use ovs-docker to add a NIC to each container and attach it to the ovs0 bridge:

ovs-docker add-port ovs0 eth0 con6 --ipaddress=192.168.1.2/24
ovs-docker add-port ovs0 eth0 con7 --ipaddress=192.168.1.3/24
ovs-docker add-port ovs0 eth0 con8 --ipaddress=192.168.1.4/24

Inspect the bridge:

[root@controller /]# ovs-vsctl show
21e4d4c5-cadd-4dac-b025-c20b8108ad09
    Bridge "ovs0"
        Port "b167e3dcf8db4_l"
            Interface "b167e3dcf8db4_l"
        Port "f1c0a9d0994d4_l"
            Interface "f1c0a9d0994d4_l"
        Port "121c6b2f221c4_l"
            Interface "121c6b2f221c4_l"
        Port "ovs0"
            Interface "ovs0"
                type: internal
    ovs_version: "2.8.2"

Test connectivity:

[root@controller /]# docker exec -it con8 sh
# ping 192.168.1.2      
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.886 ms
^C
--- 192.168.1.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.886/0.886/0.886/0.000 ms
# 
# ping 192.168.1.3  
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.712 ms
^C
--- 192.168.1.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.712/0.712/0.712/0.000 ms
# 

Setting VLAN tags

Inspect the bridge:

[root@controller /]# ovs-vsctl show
21e4d4c5-cadd-4dac-b025-c20b8108ad09
    Bridge "ovs0"
        Port "b167e3dcf8db4_l"
            Interface "b167e3dcf8db4_l"
        Port "f1c0a9d0994d4_l"
            Interface "f1c0a9d0994d4_l"
        Port "121c6b2f221c4_l"
            Interface "121c6b2f221c4_l"
        Port "ovs0"
            Interface "ovs0"
                type: internal
    ovs_version: "2.8.2"

Interface is one of the core concepts of Open vSwitch; it models the network device plugged into a switch port. A Port usually has exactly one Interface, but it can have several (a bond).

Interface types

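  • system (e.g. eth0): attach a host NIC to the bridge
  • internal: a simulated network device; when it has the same name as the bridge it is called the local interface
  • tap: a TUN/TAP device
  • patch: a pair of virtual devices that simulates a patch cable between two bridges
  • geneve: Ethernet over a Geneve tunnel
  • gre (RFC 2890), ipsec_gre (GRE over an IPsec tunnel)
  • vxlan: Ethernet tunneled over the UDP-based VXLAN protocol
  • lisp: a layer-3 tunnel, still experimental
  • stt: Stateless Transport Tunneling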

Inspect an interface:

[root@controller /]# ovs-vsctl list interface f1c0a9d0994d4_l
_uuid               : cf400e7c-d2d6-4e0a-ad02-663dd63d1751
admin_state         : up
duplex              : full
error               : []
external_ids        : {container_id="con6", container_iface="eth0"}
ifindex             : 239
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 1
link_speed          : 10000000000
link_state          : up
mac_in_use          : "96:91:0a:c9:02:d6"
mtu                 : 1500
mtu_request         : []
name                : "f1c0a9d0994d4_l"
ofport              : 3
other_config        : {}
statistics          : {collisions=0, rx_bytes=1328, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=18, tx_bytes=3032, tx_dropped=0, tx_errors=0, tx_packets=40}
status              : {driver_name=veth, driver_version="1.0", firmware_version=""}
type                : ""

Set the VLAN tags:

ovs-vsctl set port f1c0a9d0994d4_l tag=100  # con6
ovs-vsctl set port b167e3dcf8db4_l tag=100  # con8
ovs-vsctl set port 121c6b2f221c4_l tag=200  # con7

Test connectivity: con8 (tag 100) still reaches con6 (tag 100), but not con7 (tag 200):

[root@controller /]# docker exec -it con8 sh
# 
# ping 192.168.1.2 -c 3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.413 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.057 ms
--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2044ms
rtt min/avg/max/mdev = 0.057/0.177/0.413/0.166 ms
# 
# ping 192.168.1.3 -c 3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
From 192.168.1.4 icmp_seq=1 Destination Host Unreachable
From 192.168.1.4 icmp_seq=2 Destination Host Unreachable
--- 192.168.1.3 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2068ms
pipe 3
# 

Cross-host connectivity

Environment

host1 172.29.101.123
Bridge: ovs0

Containers:
con6  192.168.1.2
con7  192.168.1.3
con8  192.168.1.4

(created as above)

host2 172.29.101.82
Bridge: ovs1

Container: con11

Prepare the environment

Create the bridge:
ovs-vsctl add-br ovs1

Create the container:
docker run -itd --name con11 --net=none ubuntu:14.04 /bin/bash

Attach it to the ovs1 bridge:
ovs-docker add-port ovs1 eth0 con11 --ipaddress=192.168.1.6/24

Inspect bridge ovs1:

[root@compute82 /]# ovs-vsctl show
380ce027-8edf-4844-8e89-a6b9c1adaff3
    Bridge "ovs1"
        Port "0384251973e64_l"
            Interface "0384251973e64_l"
        Port "ovs1"
            Interface "ovs1"
                type: internal
    ovs_version: "2.8.2"

Set up VXLAN

On host1:

[root@controller /]# ovs-vsctl add-port ovs0 vxlan1 -- set interface vxlan1 type=vxlan options:remote_ip=172.29.101.82 options:key=flow
[root@controller /]# 
[root@controller /]# ovs-vsctl show
21e4d4c5-cadd-4dac-b025-c20b8108ad09
    Bridge "ovs0"
        Port "b167e3dcf8db4_l"
            tag: 100
            Interface "b167e3dcf8db4_l"
        Port "f1c0a9d0994d4_l"
            tag: 100
            Interface "f1c0a9d0994d4_l"
        Port "121c6b2f221c4_l"
            tag: 200
            Interface "121c6b2f221c4_l"
        Port "ovs0"
            Interface "ovs0"
                type: internal
        Port "vxlan1"
            Interface "vxlan1"
                type: vxlan
                options: {key=flow, remote_ip="172.29.101.82"}
    ovs_version: "2.8.2"

On host2:

[root@compute82 /]# ovs-vsctl add-port ovs1 vxlan1 -- set interface vxlan1 type=vxlan options:remote_ip=172.29.101.123 options:key=flow
[root@compute82 /]# 
[root@compute82 /]# ovs-vsctl show
380ce027-8edf-4844-8e89-a6b9c1adaff3
    Bridge "ovs1"
        Port "0384251973e64_l"
            Interface "0384251973e64_l"
        Port "vxlan1"
            Interface "vxlan1"
                type: vxlan
                options: {key=flow, remote_ip="172.29.101.123"}
        Port "ovs1"
            Interface "ovs1"
                type: internal
    ovs_version: "2.8.2"

Set the VLAN tag on con11's port:

ovs-vsctl set port 0384251973e64_l tag=100

Connectivity test:

[root@compute82 /]# docker exec -ti con11 bash
root@c82da61bf925:/# ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.161 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.206 ms
^C
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
root@c82da61bf925:/# 
root@c82da61bf925:/# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
^C
--- 192.168.1.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2027ms
root@c82da61bf925:/# 
root@c82da61bf925:/# exit

Conclusion

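Through VXLAN, containers on the two hosts' OVS bridges can only reach each other when they are in the same subnet and the same VLAN: con7 (tag 200) stays unreachable from con11 (tag 100). Containers in different subnets cannot be connected this way, so next we try to solve that with OVS flow tables.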

OpenFlow

Flow tables

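An OpenFlow switch may contain multiple flow tables. Each flow table holds multiple rules, and each rule consists of match conditions and actions. Every rule has a priority: higher-priority rules are tried first, and when a rule matches its actions are executed; if it doesn't match, the next rule in descending priority order is tried. If nothing matches, the table falls back to its table-miss behavior, usually dropping the packet or passing it on to the next table.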
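The matching rule just described can be sketched in Go as a single-table lookup. This is a toy model under stated assumptions: flows match only on in_port (0 means wildcard), higher priority wins, and a table miss falls back to drop. Real OVS matches many more fields, supports multiple tables, and does not define which of two overlapping flows with equal priority wins.

```go
package main

import (
	"fmt"
	"sort"
)

// Flow is a toy OpenFlow rule that matches only on in_port (0 = wildcard).
type Flow struct {
	Priority int
	InPort   int
	Action   string
}

// lookup returns the action of the highest-priority matching flow,
// or "drop" as the table-miss behavior when nothing matches.
func lookup(table []Flow, inPort int) string {
	sorted := append([]Flow(nil), table...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	for _, f := range sorted {
		if f.InPort == 0 || f.InPort == inPort {
			return f.Action
		}
	}
	return "drop"
}

func main() {
	// The two forwarding rules between con6 (ofport 3) and con7 (ofport 4).
	table := []Flow{
		{Priority: 1, InPort: 3, Action: "output:4"},
		{Priority: 1, InPort: 4, Action: "output:3"},
	}
	fmt.Println(lookup(table, 4)) // output:3
	table = append(table, Flow{Priority: 2, InPort: 4, Action: "drop"})
	fmt.Println(lookup(table, 4)) // drop: the priority-2 rule wins
	fmt.Println(lookup(table, 7)) // drop: table miss
}
```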

Hands-on

Environment

host1 172.29.101.123

Bridge: ovs0

Containers:
con6   192.168.1.2   tag=100
con7   192.168.1.3   tag=100

host2 172.29.101.82

Bridge: ovs1

Containers:
con9   192.168.2.2   tag=100
con10  192.168.2.3   tag=100
con11  192.168.1.5   tag=100

Viewing the default flows

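On host1, dump the default flows. The single priority-0 rule with actions=NORMAL makes the bridge behave like an ordinary MAC-learning switch: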

[root@controller msxu]# ovs-ofctl dump-flows ovs0
 cookie=0x0, duration=27858.050s, table=0, n_packets=5253660876, n_bytes=371729202788, priority=0 actions=NORMAL

From con6, ping con7: the network works.

[root@controller /]# docker exec -ti con6 bash
root@9ccc5c5664f9:/# 
root@9ccc5c5664f9:/# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.613 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.066 ms
--- 192.168.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1058ms
rtt min/avg/max/mdev = 0.066/0.339/0.613/0.274 ms
root@9ccc5c5664f9:/# 

Delete the default flows:

[root@controller /]# ovs-ofctl del-flows ovs0
[root@controller /]# 
[root@controller /]# ovs-ofctl dump-flows ovs0
[root@controller /]# 

Test connectivity again: it's gone, since with no flows left every packet misses the table and is dropped.

[root@controller /]# docker exec -ti con6 bash
root@9ccc5c5664f9:/# 
root@9ccc5c5664f9:/# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
^C
--- 192.168.1.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1025ms
root@9ccc5c5664f9:/# 

Adding flows

For con6 and con7 to communicate, we need rules that make OVS forward their traffic.

Find the OVS ports of con6 and con7 and their OpenFlow port numbers (ofport):

[root@controller /]# ovs-vsctl show
21e4d4c5-cadd-4dac-b025-c20b8108ad09
    Bridge "ovs0"
        Port "f1c0a9d0994d4_l"
            tag: 100
            Interface "f1c0a9d0994d4_l"
        Port "121c6b2f221c4_l"
            tag: 100
            Interface "121c6b2f221c4_l"
        Port "ovs0"
            Interface "ovs0"
                type: internal
        Port "vxlan1"
            Interface "vxlan1"
                type: vxlan
                options: {key=flow, remote_ip="172.29.101.82"}
    ovs_version: "2.8.2"
[root@controller /]# ovs-vsctl list interface f1c0a9d0994d4_l |grep ofport
ofport              : 3
ofport_request      : []
[root@controller /]# 
[root@controller /]# ovs-vsctl list interface 121c6b2f221c4_l |grep ofport
ofport              : 4
ofport_request      : []

Add the rules:

[root@controller /]# ovs-ofctl add-flow ovs0 "priority=1,in_port=3,actions=output:4"
[root@controller /]# ovs-ofctl add-flow ovs0 "priority=1,in_port=4,actions=output:3"
[root@controller /]# ovs-ofctl dump-flows ovs0
 cookie=0x0, duration=60.440s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="f1c0a9d0994d4_l" actions=output:"121c6b2f221c4_l"
 cookie=0x0, duration=50.791s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="121c6b2f221c4_l" actions=output:"f1c0a9d0994d4_l"
[root@controller /]#
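Every bidirectional link needs a mirrored pair of flows like the two above, so a tiny generator avoids typos. patchFlows is a hypothetical helper: it only builds the flow strings that ovs-ofctl add-flow accepts, and running them (for example with exec.Command("ovs-ofctl", "add-flow", "ovs0", f)) is left to the caller.

```go
package main

import "fmt"

// patchFlows returns the two flow strings that forward traffic both ways
// between OpenFlow ports a and b at the given priority.
func patchFlows(prio, a, b int) [2]string {
	return [2]string{
		fmt.Sprintf("priority=%d,in_port=%d,actions=output:%d", prio, a, b),
		fmt.Sprintf("priority=%d,in_port=%d,actions=output:%d", prio, b, a),
	}
}

func main() {
	// con6 is ofport 3 and con7 is ofport 4 on ovs0 in the example above.
	for _, f := range patchFlows(1, 3, 4) {
		fmt.Printf("ovs-ofctl add-flow ovs0 %q\n", f)
	}
}
```

Passing each flow as a separate argv element (rather than through a shell) also sidesteps the quoting the shell version needs.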

Test connectivity: con6 and con7 can talk again.

[root@controller msxu]# docker exec -ti con6 bash
root@9ccc5c5664f9:/# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.924 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.058 ms
^C
--- 192.168.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1057ms
rtt min/avg/max/mdev = 0.058/0.491/0.924/0.433 ms
root@9ccc5c5664f9:/# 

Add a higher-priority rule that drops everything arriving on port 4 (con7):

[root@controller /]# ovs-ofctl add-flow ovs0 "priority=2,in_port=4,actions=drop"
[root@controller /]# 
[root@controller /]# docker exec -ti con6 bash
root@9ccc5c5664f9:/# 
root@9ccc5c5664f9:/# ping  192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
^C
--- 192.168.1.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2087ms
root@9ccc5c5664f9:/# 
root@9ccc5c5664f9:/# 

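Rules in a flow table have priorities: the larger the priority value, the higher the priority. Higher-priority rules are matched first and their actions executed; if a rule doesn't match, matching continues with the next lower-priority rule. Here the priority-2 drop rule on con7's port shadows the priority-1 forwarding rule, so con7's replies never reach con6.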

Cross-subnet connectivity

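In the previous VXLAN exercise, VXLAN connected the OVS bridges on two hosts, but as noted, the containers on the two hosts had to be in the same subnet to communicate.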

From con9 (192.168.2.2), ping con6 (192.168.1.2) on the other host:

[root@compute82 /]# docker exec -ti con9 bash
root@b55602aad0ac:/# 
root@b55602aad0ac:/# ping 192.168.1.2
connect: Network is unreachable
root@b55602aad0ac:/# 

Add flow rules.

On host1, where vxlan1 is ofport 6 and con6 is ofport 3:

[root@controller /]# ovs-ofctl add-flow ovs0 "priority=4,in_port=6,actions=output:3"
[root@controller /]# 
[root@controller /]# ovs-ofctl add-flow ovs0 "priority=4,in_port=3,actions=output:6"
[root@controller /]# ovs-ofctl dump-flows ovs0
 cookie=0x0, duration=3228.737s, table=0, n_packets=7, n_bytes=490, priority=1,in_port="f1c0a9d0994d4_l" actions=output:"121c6b2f221c4_l"
 cookie=0x0, duration=3215.544s, table=0, n_packets=0, n_bytes=0, priority=1,in_port="121c6b2f221c4_l" actions=output:"f1c0a9d0994d4_l"
 cookie=0x0, duration=3168.297s, table=0, n_packets=9, n_bytes=546, priority=2,in_port="121c6b2f221c4_l" actions=drop
 cookie=0x0, duration=12.024s, table=0, n_packets=0, n_bytes=0, priority=4,in_port=vxlan1 actions=output:"f1c0a9d0994d4_l"
 cookie=0x0, duration=3.168s, table=0, n_packets=0, n_bytes=0, priority=4,in_port="f1c0a9d0994d4_l" actions=output:vxlan1

On host2:

[root@compute82 /]# ovs-ofctl add-flow ovs1 "priority=1,in_port=1,actions=output:6"
[root@compute82 /]# 
[root@compute82 /]# ovs-ofctl add-flow ovs1 "priority=1,in_port=6,actions=output:1"
[root@compute82 /]# ovs-ofctl dump-flows ovs1
 cookie=0x0, duration=1076.522s, table=0, n_packets=27, n_bytes=1134, priority=1,in_port="0384251973e64_l" actions=output:vxlan1
 cookie=0x0, duration=936.403s, table=0, n_packets=0, n_bytes=0, priority=1,in_port=vxlan1 actions=output:"0384251973e64_l"
 cookie=0x0, duration=70205.443s, table=0, n_packets=7325, n_bytes=740137, priority=0 actions=NORMAL

Test connectivity

From con9 on host2, ping 192.168.1.2:

[root@compute82 /]# docker exec -ti con9 bash
root@b55602aad0ac:/# 
root@b55602aad0ac:/# ping 192.168.1.2
connect: Network is unreachable
root@b55602aad0ac:/# 

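It still fails. The routing table is the problem: con9 only has a route for its own subnet 192.168.2.0/24, so the kernel refuses to send to 192.168.1.2. Add a default route. Note that you must enter the container with --privileged to change its routes: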

[root@compute82 /]# docker exec --privileged -ti con9 bash
root@b55602aad0ac:/# 
root@b55602aad0ac:/# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
root@b55602aad0ac:/# route add default dev eth0
root@b55602aad0ac:/# 
root@b55602aad0ac:/# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         0.0.0.0         0.0.0.0         U     0      0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
root@b55602aad0ac:/# 

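After adding the same default route in the containers on both host1 and host2, test again. Because the route has no gateway, each container treats the remote address as on-link and ARPs for it directly on eth0, and the flows above carry those frames across the VXLAN tunnel: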

[root@compute82 /]# docker exec --privileged -ti con9 bash
root@b55602aad0ac:/# 
root@b55602aad0ac:/# ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.16 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.314 ms
^C
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.314/0.739/1.165/0.426 ms

We have connected containers in different subnets on two hosts using OVS flows and VXLAN.

OVN in Practice

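With the OVS exercises done, we have enough groundwork to dig into OVN. The key to OVN is understanding the logical switch. OVN lives at the management/control layer: every machine runs an OVS switch (a software switch, which we can call the physical switch as opposed to the logical one). If VMs on different machines should share one subnet, we create a logical switch and logically associate their interfaces with it, and OVN then pushes down flows that put them on one subnet.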

Basic usage

Logical plane (control plane)

Create two logical switches, each with one port:

$ ovn-nbctl ls-add sw0
$ ovn-nbctl lsp-add sw0 sw0-port1
$ ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"

$ ovn-nbctl ls-add sw1
$ ovn-nbctl lsp-add sw1 sw1-port1
$ ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"

Create a logical router and connect both switches to it:

$ ovn-nbctl lr-add lr0
$ ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
$ ovn-nbctl lsp-add sw0 lrp0-attachment
$ ovn-nbctl lsp-set-type lrp0-attachment router
$ ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
$ ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
$ ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24

$ ovn-nbctl lsp-add sw1 lrp1-attachment
$ ovn-nbctl lsp-set-type lrp1-attachment router
$ ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
$ ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

View the logical configuration:

$ ovn-nbctl show
    switch 1396cf55-d176-4082-9a55-1c06cef626e4 (sw1)
        port lrp1-attachment
            addresses: ["00:00:00:00:ff:02"]
        port sw1-port1
            addresses: ["50:54:00:00:00:03 11.0.0.2"]
    switch 2c9d6d03-09fc-4e32-8da6-305f129b0d53 (sw0)
        port lrp0-attachment
            addresses: ["00:00:00:00:ff:01"]
        port sw0-port1
            addresses: ["50:54:00:00:00:01 192.168.0.2"]
    router f8377e8c-f75e-4fc8-8751-f3ea03c6dd98 (lr0)
        port lrp0
            mac: "00:00:00:00:ff:01"
            networks: ["192.168.0.1/24"]
        port lrp1
            mac: "00:00:00:00:ff:02"
            networks: ["11.0.0.1/24"]

Trace a packet with ovn-trace:

$ ovn-trace --minimal sw0 'inport == "sw0-port1" \
> && eth.src == 50:54:00:00:00:01 && ip4.src == 192.168.0.2 \
> && eth.dst == 00:00:00:00:ff:01 && ip4.dst == 11.0.0.2 \
> && ip.ttl == 64'

# ip,reg14=0x1,vlan_tci=0x0000,dl_src=50:54:00:00:00:01,dl_dst=00:00:00:00:ff:01,nw_src=192.168.0.2,nw_dst=11.0.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=64
ip.ttl--;
eth.src = 00:00:00:00:ff:02;
eth.dst = 50:54:00:00:00:03;
output("sw1-port1");

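Here we specify the source port (inport), the source MAC and IP, the destination MAC (router port lrp0) and the destination IP. The output shows the router decrementing the TTL and rewriting the Ethernet source and destination, and tells us which switch port the packet finally leaves from (sw1-port1).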

Key point: attaching containers to a logical switch

The ovs-docker script contains these lines:

ip link add "${PORTNAME}_l" type veth peer name "${PORTNAME}_c"

# Add one end of veth to OVS bridge.
if ovs_vsctl --may-exist add-port "$BRIDGE" "${PORTNAME}_l" \
       -- set interface "${PORTNAME}_l" \

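It first creates a veth pair and then adds one end to OVS as an interface, which links the container to OVS. All that remains is to associate this OVS port with an OVN logical switch. A concrete example: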

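After starting the container, first attach one end of its veth pair to the physical (OVS) switch, then associate it with the logical switch port by setting iface-id: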

ovs-vsctl --may-exist add-port br-int port0                  # attach the container's veth end to OVS (br-int is the bridge ovn-controller manages)
ovs-vsctl set Interface port0 external_ids:iface-id=lport0   # bind it to logical port lport0 (created with ovn-nbctl lsp-add)

The author's helper scripts wrap these basic operations; the relevant functions are listed below.

Some concrete walkthroughs follow.

Logical subnets

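Here we create four ports, all attached to the OVS bridge br-int but belonging to two different logical switches. Even though all four addresses are in 192.168.33.0/24, ports on different logical switches cannot reach each other unless the switches are connected through a router, while ports on the same logical switch can: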

ls-create sw0
ls-add-port sw0 sw0-port1 00:00:00:00:00:01 192.168.33.10/24
ls-add-port sw0 sw0-port2 00:00:00:00:00:02 192.168.33.20/24

ls-create sw1
ls-add-port sw1 sw1-port1 00:00:00:00:00:03 192.168.33.30/24
ls-add-port sw1 sw1-port2 00:00:00:00:00:04 192.168.33.40/24

ovs-add-port br-int lport1 sw0-port1 192.168.33.1
ovs-add-port br-int lport2 sw0-port2 192.168.33.1
ovs-add-port br-int lport3 sw1-port1 192.168.33.1
ovs-add-port br-int lport4 sw1-port2 192.168.33.1

ip netns exec lport1-ns ip addr
ip netns exec lport2-ns ip addr
ip netns exec lport3-ns ip addr
ip netns exec lport4-ns ip addr

ip netns exec lport1-ns ping -c3 192.168.33.20
ofport=$(ovs-vsctl list interface lport1 | awk '/ofport /{print $3}')
ovs-appctl ofproto/trace br-int in_port=$ofport,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02 -generate

ip netns exec lport1-ns ping -c3 192.168.33.30
ovs-appctl ofproto/trace br-int in_port=$ofport,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:03 -generate

ls-create, ls-add-port and ovs-add-port are thin wrappers around a few commands:

ls-create and ls-add-port:

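ls-create() {
    local switch=$1
    ovn-nbctl --may-exist ls-add "$switch"
}

ls-add-port() {
    local switch=$1
    local port=$2
    local mac=$3
    local cidr=$4

    # create a logical port on the logical switch
    ovn-nbctl --may-exist lsp-add "$switch" "$port"

    # set the logical port's MAC address
    ovn-nbctl lsp-set-addresses "$port" "$mac"

    # port security: only allow this MAC as source/destination.
    # MAC and CIDR are stored as two separate entries so that
    # ovs-add-port can read them back later.
    ovn-nbctl lsp-set-port-security "$port" "$mac" "$cidr"
}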

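ovs-add-port creates a network namespace, adds an internal interface to OVS, binds it to the logical port, moves it into the namespace and configures its MAC and IP: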

ovs-add-port() {
    local bridge=$1
    local port=$2
    local lport=$3
    local gateway=$4

    # create a network namespace
    ip netns add "$port-ns"
    # "set interface ... type=internal" matters: without it there is only a port
    # and no kernel interface, so nothing could be moved into the namespace
    ovs-vsctl --may-exist add-port "$bridge" "$port" -- set interface "$port" type=internal
    if [ -n "$lport" ]; then
        # bind the OVS interface to the logical port
        ovs-vsctl set Interface "$port" external_ids:iface-id="$lport"
    fi

    # read back the MAC and CIDR stored by ls-add-port
    pscount=$(ovn-nbctl lsp-get-port-security "$lport" | wc -l)
    if [ "$pscount" -eq 2 ]; then
        mac=$(ovn-nbctl lsp-get-port-security "$lport" | head -n 1)
        cidr=$(ovn-nbctl lsp-get-port-security "$lport" | tail -n 1)
        ip link set "$port" netns "$port-ns"
        # ip netns exec $port-ns ip link set dev $port name eth0
        ip netns exec "$port-ns" ip link set "$port" address "$mac"
        ip netns exec "$port-ns" ip addr add "$cidr" dev "$port"
        ip netns exec "$port-ns" ip link set "$port" up
        if [ -n "$gateway" ]; then
            ip netns exec "$port-ns" ip route add default via "$gateway"
        fi
    fi
}

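So to implement private networks (VPCs), you only need to create separate logical switches: as long as they are not connected through a router, the private networks are isolated from one another.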
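The isolation rule can be captured in a toy Go model. The assumptions are spelled out: a port reaches another port only if both are on the same logical switch or their switches attach to the same logical router; ACLs, port security and IP addressing are ignored, and each switch attaches to at most one router. Note that in the four-port example above all addresses share 192.168.33.0/24, so in practice the switches would also need distinct subnets before a router could connect them. The Topology type is hypothetical.

```go
package main

import "fmt"

// Topology is a toy model of OVN logical reachability.
type Topology struct {
	portSwitch   map[string]string // logical port -> logical switch
	switchRouter map[string]string // logical switch -> attached router (absent if none)
}

// Reachable reports whether logical ports a and b can talk in this model.
func (t *Topology) Reachable(a, b string) bool {
	sa, okA := t.portSwitch[a]
	sb, okB := t.portSwitch[b]
	if !okA || !okB {
		return false
	}
	if sa == sb {
		return true // same logical switch: same L2 segment
	}
	ra, rb := t.switchRouter[sa], t.switchRouter[sb]
	return ra != "" && ra == rb // different switches need a shared router
}

func main() {
	topo := &Topology{
		portSwitch: map[string]string{
			"sw0-port1": "sw0", "sw0-port2": "sw0",
			"sw1-port1": "sw1", "sw1-port2": "sw1",
		},
		switchRouter: map[string]string{},
	}
	fmt.Println(topo.Reachable("sw0-port1", "sw0-port2")) // true
	fmt.Println(topo.Reachable("sw0-port1", "sw1-port1")) // false: no router
	topo.switchRouter["sw0"], topo.switchRouter["sw1"] = "lr0", "lr0"
	fmt.Println(topo.Reachable("sw0-port1", "sw1-port1")) // true once both attach to lr0
}
```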

IP management (DHCP)

Static IP configuration

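Here we give an OVN logical port a static IP; OVN then answers the port's DHCP requests itself (through ovn-controller's built-in DHCP responder) to complete address configuration.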

ovn-nbctl lr-add user1
ovn-nbctl ls-add vpc1

# create the router port facing vpc1, with MAC 02:ac:10:ff:34:01 and IP 172.66.1.10
ovn-nbctl lrp-add user1 user1-vpc1 02:ac:10:ff:34:01 172.66.1.10/24

ovn-nbctl lsp-add vpc1 vpc1-user1
ovn-nbctl lsp-set-type vpc1-user1 router
ovn-nbctl lsp-set-addresses vpc1-user1 02:ac:10:ff:34:01
ovn-nbctl lsp-set-options vpc1-user1 router-port=user1-vpc1

# create the router port facing vpc2, with MAC 02:ac:10:ff:34:02 and IP 172.77.1.10
ovn-nbctl lrp-add user1 user1-vpc2 02:ac:10:ff:34:02 172.77.1.10/24

ovn-nbctl lsp-add vpc1 vpc1-vm1
# assign a MAC and static IP to the logical port
ovn-nbctl lsp-set-addresses vpc1-vm1 "02:ac:10:ff:01:30 172.66.1.107" 
# ovn-nbctl lsp-set-port-security vpc1-vm1 "02:ac:10:ff:01:30 172.66.1.101"

options=$(ovn-nbctl create DHCP_Options cidr=172.66.1.0/24 \
options="\"server_id\"=\"172.66.1.10\" \"server_mac\"=\"02:ac:10:ff:34:01\" \
\"lease_time\"=\"3600\" \"router\"=\"172.66.1.10\"")

echo "DHCP options is: " $options
ovn-nbctl lsp-set-dhcpv4-options vpc1-vm1 $options
ovn-nbctl lsp-get-dhcpv4-options vpc1-vm1

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 address 02:ac:10:ff:01:30
ip link set vm1 netns vm1
ovs-vsctl set Interface vm1 external_ids:iface-id=vpc1-vm1
# obtain the address via DHCP
ip netns exec vm1 dhclient vm1
ip netns exec vm1 ip addr show vm1
ip netns exec vm1 ip route show

clean() {
    ip netns del vm1
    ovn-nbctl ls-del vpc1
    ovs-vsctl del-port br-int vm1
}
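Building the DHCP_Options command from a program avoids the nested shell escaping above. In this hypothetical Go helper each option is emitted as "key"="value", which is what the shell-quoted version ends up passing to ovn-nbctl; an ordered slice (rather than a map) keeps the output deterministic.

```go
package main

import (
	"fmt"
	"strings"
)

// dhcpOptionsArgs builds the argv for "ovn-nbctl create DHCP_Options ...".
// No shell is involved, so the quotes are part of the argument itself.
func dhcpOptionsArgs(cidr string, opts [][2]string) []string {
	pairs := make([]string, 0, len(opts))
	for _, kv := range opts {
		pairs = append(pairs, fmt.Sprintf("%q=%q", kv[0], kv[1]))
	}
	return []string{"ovn-nbctl", "create", "DHCP_Options",
		"cidr=" + cidr, "options=" + strings.Join(pairs, " ")}
}

func main() {
	args := dhcpOptionsArgs("172.66.1.0/24", [][2]string{
		{"server_id", "172.66.1.10"},
		{"server_mac", "02:ac:10:ff:34:01"},
		{"lease_time", "3600"},
		{"router", "172.66.1.10"},
	})
	// exec.Command(args[0], args[1:]...).Output() would print the new row's
	// UUID, which is what the shell script stores in $options.
	fmt.Println(strings.Join(args, "\n"))
}
```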

Dynamic IP allocation

OVN can also manage IP addresses for you: specify a subnet and it assigns an unused IP to the interface.

Most steps are the same as for static IPs; note the key points marked in the comments below:

ovn-nbctl lr-add user1
ovn-nbctl ls-add vpc1
# [Key] other_config:subnet is required; without it no address is allocated
ovn-nbctl set Logical_Switch vpc1 other_config:subnet=172.66.1.0/24

# create the router port facing vpc1, with MAC 02:ac:10:ff:34:01 and IP 172.66.1.10
ovn-nbctl lrp-add user1 user1-vpc1 02:ac:10:ff:34:01 172.66.1.10/24

ovn-nbctl lsp-add vpc1 vpc1-user1
ovn-nbctl lsp-set-type vpc1-user1 router
ovn-nbctl lsp-set-addresses vpc1-user1 02:ac:10:ff:34:01
ovn-nbctl lsp-set-options vpc1-user1 router-port=user1-vpc1

ovn-nbctl lsp-add vpc1 vpc1-vm1
# [Key] don't give a concrete IP here; use "dynamic" instead
ovn-nbctl lsp-set-addresses vpc1-vm1 "02:ac:10:ff:01:30 dynamic"
# ovn-nbctl lsp-set-addresses vpc1-vm1 "dynamic"
# ovn-nbctl lsp-set-port-security vpc1-vm1 "02:ac:10:ff:01:30 172.66.1.106"

options=$(ovn-nbctl create DHCP_Options cidr=172.66.1.0/24 \
options="\"server_id\"=\"172.66.1.10\" \"server_mac\"=\"02:ac:10:ff:34:01\" \
\"lease_time\"=\"3600\" \"router\"=\"172.66.1.10\"")

echo "DHCP options is: " $options
ovn-nbctl lsp-set-dhcpv4-options vpc1-vm1 $options
ovn-nbctl lsp-get-dhcpv4-options vpc1-vm1
# the allocated address appears in the dynamic_addresses column
ovn-nbctl list logical_switch_port

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 address 02:ac:10:ff:01:30
ip link set vm1 netns vm1
ovs-vsctl set Interface vm1 external_ids:iface-id=vpc1-vm1
# dhclient now obtains the allocated address
ip netns exec vm1 dhclient vm1
ip netns exec vm1 ip addr show vm1
ip netns exec vm1 ip route show
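What dynamic allocation has to do can be sketched as a first-free allocator over the subnet. This is a hypothetical policy for illustration: it skips the network and broadcast addresses and anything already in use, but OVN's own IPAM may pick addresses differently, and the address it actually assigns is the one shown in dynamic_addresses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// allocate returns the lowest host address in cidr that is not in used,
// skipping the network and broadcast addresses.
func allocate(cidr string, used map[string]bool) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	base := ipnet.IP.To4()
	if base == nil {
		return nil, fmt.Errorf("%s is not an IPv4 subnet", cidr)
	}
	ones, bits := ipnet.Mask.Size()
	size := uint32(1) << uint(bits-ones) // number of addresses in the subnet
	start := binary.BigEndian.Uint32(base)
	for off := uint32(1); off+1 < size; off++ {
		ip := make(net.IP, 4)
		binary.BigEndian.PutUint32(ip, start+off)
		if !used[ip.String()] {
			return ip, nil
		}
	}
	return nil, fmt.Errorf("subnet %s exhausted", cidr)
}

func main() {
	used := map[string]bool{"172.66.1.10": true} // router port user1-vpc1
	ip, err := allocate("172.66.1.0/24", used)
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // 172.66.1.1
}
```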

Classic network implementation

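ovn-nbctl --may-exist ls-add out  # the logical switch "out" is assumed; create it if it doesn't exist yet
ovn-nbctl lsp-add out outs-wan
ovn-nbctl lsp-set-addresses outs-wan unknown
ovn-nbctl lsp-set-type outs-wan localnet  # port type that connects to the physical network
ovn-nbctl lsp-set-options outs-wan network_name=wanNet  # must match the name used in ovn-bridge-mappings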
ovs-vsctl add-br br-eth
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=wanNet:br-eth
ovs-vsctl add-port br-eth eth0

# configure the bridge's IP
ip link set br-eth up
ip addr add 192.168.66.111/23 dev br-eth

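Bind the VM to a logical port on this switch (the command assumes a logical port named vm1 on switch out). The physical NIC and the VM are then on the same L2 network: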

ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
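ovn-bridge-mappings is a comma-separated list of network_name:bridge pairs, and each network_name must match the options:network_name of a localnet port; ovn-controller uses the mapping to patch br-int into the named bridge. A small hypothetical Go parser for that value, handy for validating a node's configuration:

```go
package main

import (
	"fmt"
	"strings"
)

// parseBridgeMappings parses an ovn-bridge-mappings value such as
// "wanNet:br-eth,lan:br-lan" into network_name -> bridge.
func parseBridgeMappings(s string) (map[string]string, error) {
	m := map[string]string{}
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		parts := strings.SplitN(item, ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("bad mapping %q", item)
		}
		m[parts[0]] = parts[1]
	}
	return m, nil
}

func main() {
	m, err := parseBridgeMappings("wanNet:br-eth")
	if err != nil {
		panic(err)
	}
	// The localnet port outs-wan (network_name=wanNet) ends up on this bridge.
	fmt.Println(m["wanNet"]) // br-eth
}
```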

Floating IP (FIP) implementation

For the other setup steps, refer to the sections above; here we focus on the router-side operations.

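For a VM to reach the outside network, its source address must be translated (SNAT), just as when we visit Google: our machines' 192.168.x.x addresses get translated at the router.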

# SNAT for vpc1
ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=172.66.1.0/24 \
external_ip=192.168.66.45 -- add logical_router gateway_route nat @nat
# prints the UUID of the new NAT row:
56ad6c5b-8417-4314-95c4-a0d780b5ef0b

Here 192.168.66.45 is an address on the physical network we're attached to; it tells the router which address to translate to.

A FIP is simply SNAT and DNAT together (type dnat_and_snat):

# bind external IP 192.168.66.46 to vm3 (172.66.1.103)
ovn-nbctl -- --id=@nat create nat type="dnat_and_snat" logical_ip=172.66.1.103 \
external_ip=192.168.66.46 -- add logical_router gateway_route nat @nat
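The combined effect of the two NAT rows can be modeled in Go as below. It is a toy under explicit assumptions: for outbound traffic the most specific logical_ip wins (so a FIP's /32 beats the /24 SNAT), only dnat_and_snat admits new inbound connections, and connection tracking for replies is not modeled. NATRule and both functions are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// NATRule is a toy version of an OVN NAT row.
type NATRule struct {
	Type       string // "snat" or "dnat_and_snat"
	LogicalIP  string // IP or CIDR inside the VPC
	ExternalIP string
}

// logicalNet turns a logical_ip (bare IP or CIDR) into an IPNet.
func logicalNet(s string) *net.IPNet {
	if _, n, err := net.ParseCIDR(s); err == nil {
		return n
	}
	return &net.IPNet{IP: net.ParseIP(s).To4(), Mask: net.CIDRMask(32, 32)}
}

// outboundSource returns the external source IP for traffic from src,
// choosing the most specific matching rule, or "" if none applies.
func outboundSource(rules []NATRule, src string) string {
	ip := net.ParseIP(src)
	best, bestLen := "", -1
	for _, r := range rules {
		n := logicalNet(r.LogicalIP)
		if ones, _ := n.Mask.Size(); n.Contains(ip) && ones > bestLen {
			best, bestLen = r.ExternalIP, ones
		}
	}
	return best
}

// inboundDest maps a new inbound connection's destination back to a
// logical IP; only dnat_and_snat rules do this.
func inboundDest(rules []NATRule, dst string) string {
	for _, r := range rules {
		if r.Type == "dnat_and_snat" && r.ExternalIP == dst {
			return r.LogicalIP
		}
	}
	return ""
}

func main() {
	rules := []NATRule{
		{"snat", "172.66.1.0/24", "192.168.66.45"},
		{"dnat_and_snat", "172.66.1.103", "192.168.66.46"},
	}
	fmt.Println(outboundSource(rules, "172.66.1.20"))      // 192.168.66.45 (shared SNAT)
	fmt.Println(outboundSource(rules, "172.66.1.103"))     // 192.168.66.46 (its FIP)
	fmt.Println(inboundDest(rules, "192.168.66.46"))       // 172.66.1.103
	fmt.Println(inboundDest(rules, "192.168.66.45") == "") // true
}
```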

Connecting OVN/OVS to CNI

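Connecting OVN/OVS to CNI has two parts. The CNI plugin only needs to attach one end of the container's veth pair to the OVS bridge, configure the address and map it to the logical port; that is the physical-plane work. The logical control-plane objects can be created through CRDs. So the key is a solid grasp of OVN, OVS and CNI themselves.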
