Building a Two-Region, Three-Center Disaster Recovery Solution for Applications with ACK One
Authors: 宇匯, 壯懷, 先河
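Overview
"Two-region, three-center" means deploying three business processing centers across two cities: a production center, a same-city disaster recovery center, and a remote disaster recovery center. Two environments deployed in one city form a same-city dual-center pair; both process business traffic at the same time, synchronize data over a high-speed link, and can take over from each other. One environment deployed in another city serves as the remote disaster recovery center and holds data backups; if both same-city centers fail at once, it can take over business processing. This architecture greatly improves business continuity.
The multi-cluster application distribution feature of ACK One helps enterprises manage three K8s clusters in a unified way: applications can be deployed and upgraded quickly across the three clusters, with differentiated configuration for each cluster. Combined with GTM (Global Traffic Manager), business traffic can switch automatically among the three clusters when a failure occurs. Data replication at the RDS data layer is not covered in this practice; see DTS (Data Transmission Service).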
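Solution Architecture
Prerequisites
Enable the multi-cluster management master instance [1].
Add three K8s clusters to the master instance by managing associated clusters [2] to build the two-region, three-center setup. As an example, this practice deploys two K8s clusters in Beijing (cluster1-beijing and cluster2-beijing) and one K8s cluster in Hangzhou (cluster1-hangzhou).
Create a GTM instance [3].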
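Application Deployment
Use the application distribution feature [4] of the ACK One master instance to distribute the application to the three K8s clusters. Compared with traditional script-based deployment, application distribution in ACK One offers the following benefits.
In this practice, the sample application is a web application consisting of K8s Deployment, Service, Ingress, and ConfigMap resources. The Service and Ingress expose the application externally, and the Deployment reads configuration parameters from the ConfigMap. By creating application distribution rules, the application is distributed to three K8s clusters (two in Beijing and one in Hangzhou) to form the two-region, three-center setup. During distribution, the Deployment and ConfigMap are configured differently per cluster to suit clusters in different locations, and a manual-approval gate provides canary-style control over the rollout to limit the blast radius of errors.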
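The gated rollout described above can be pictured with a small Python model. This is only an illustrative sketch: the `Step` and `Rollout` classes are hypothetical and are not part of ACK One or any of its APIs. It assumes the behavior this article describes, namely that clusters are deployed in order and that a step marked as non-automatic pauses the rollout until someone approves it.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One deploy step: a target cluster plus whether it runs without approval."""
    cluster: str
    auto: bool = True


@dataclass
class Rollout:
    """Hypothetical model of a sequential, gated rollout (not the ACK One API)."""
    steps: list
    deployed: list = field(default_factory=list)
    _next: int = 0
    _approved: bool = False

    def run(self) -> str:
        """Deploy steps in order; stop before any step that awaits approval."""
        while self._next < len(self.steps):
            step = self.steps[self._next]
            if not step.auto and not self._approved:
                return "workflowSuspending"
            self.deployed.append(step.cluster)
            self._approved = False  # one approval covers exactly one gated step
            self._next += 1
        return "running"

    def resume(self) -> str:
        """Record a manual approval and continue from the suspended step."""
        self._approved = True
        return self.run()


rollout = Rollout([
    Step("cluster1-beijing"),
    Step("cluster2-beijing", auto=False),  # requires manual approval
    Step("cluster1-hangzhou"),
])
print(rollout.run(), rollout.deployed)
print(rollout.resume(), rollout.deployed)
```

In this model the first call stops after cluster1-beijing, so a bad release reaches only one cluster before a human looks at it; the approval then lets the remaining two clusters proceed.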
1. Run the following command to create the namespace demo.
kubectl create namespace demo
2. Create a file named app-meta.yaml with the following content.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web-demo
  name: web-demo
  namespace: demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - image: acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/web-demo:0.4.0
        name: web-demo
        env:
        - name: ENV_NAME
          value: cluster1-beijing
        volumeMounts:
        - name: config-file
          mountPath: "/config-file"
          readOnly: true
      volumes:
      - name: config-file
        configMap:
          items:
          - key: config.json
            path: config.json
          name: web-demo
---
apiVersion: v1
kind: Service
metadata:
  name: web-demo
  namespace: demo
  labels:
    app: web-demo
spec:
  selector:
    app: web-demo
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-demo
  namespace: demo
  labels:
    app: web-demo
spec:
  rules:
  - host: web-demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-demo
            port:
              number: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-demo
  namespace: demo
  labels:
    app: web-demo
data:
  config.json: |
    {
      database-host: "beijing-db.pg.aliyun.com"
    }
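3. Run the following command to deploy the application web-demo on the master instance. Note: kube resources created on the master instance are not delivered to the member clusters. They serve as source definitions that the Application created later (step 4b) references.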
kubectl apply -f app-meta.yaml
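4. Create the application distribution rules.
a. Run the following command to view the associated clusters managed by the master instance and determine the distribution targets.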
kubectl amc get managedcluster
Expected output:
Name                  Alias               HubAccepted
managedcluster-cxxx   cluster1-hangzhou   true
managedcluster-cxxx   cluster2-beijing    true
managedcluster-cxxx   cluster1-beijing    true
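b. Create the application distribution rule file app.yaml with the following content. Replace each <managedcluster-cxxx> in the example with the actual name of the target cluster. Best practices for defining distribution rules are explained in the comments.
app.yaml contains the following resource types: Policy (type: topology) for distribution targets, Policy (type: override) for differentiated rules, Workflow, and Application. For details, see Application replication and distribution [5], Differentiated configuration for application distribution [6], and Canary distribution of applications across clusters [7].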
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: cluster1-beijing
  namespace: demo
type: topology
properties:
  clusters: ["<managedcluster-cxxx>"] # distribution target 1: cluster1-beijing
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: cluster2-beijing
  namespace: demo
type: topology
properties:
  clusters: ["<managedcluster-cxxx>"] # distribution target 2: cluster2-beijing
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: cluster1-hangzhou
  namespace: demo
type: topology
properties:
  clusters: ["<managedcluster-cxxx>"] # distribution target 3: cluster1-hangzhou
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: override-env-cluster2-beijing
  namespace: demo
type: override
properties:
  components:
  - name: "deployment"
    traits:
    - type: env
      properties:
        containerName: web-demo
        env:
          ENV_NAME: cluster2-beijing # override the Deployment environment variable for cluster2-beijing
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: override-env-cluster1-hangzhou
  namespace: demo
type: override
properties:
  components:
  - name: "deployment"
    traits:
    - type: env
      properties:
        containerName: web-demo
        env:
          ENV_NAME: cluster1-hangzhou # override the Deployment environment variable for cluster1-hangzhou
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: override-replic-cluster1-hangzhou
  namespace: demo
type: override
properties:
  components:
  - name: "deployment"
    traits:
    - type: scaler
      properties:
        replicas: 1 # override the Deployment replica count for cluster1-hangzhou
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: override-configmap-cluster1-hangzhou
  namespace: demo
type: override
properties:
  components:
  - name: "configmap"
    traits:
    - type: json-merge-patch # override the ConfigMap content for cluster1-hangzhou
      properties:
        data:
          config.json: |
            {
              database-address: "hangzhou-db.pg.aliyun.com"
            }
---
apiVersion: core.oam.dev/v1alpha1
kind: Workflow
metadata:
  name: deploy-demo
  namespace: demo
steps: # deploy in order: cluster1-beijing, cluster2-beijing, cluster1-hangzhou
  - type: deploy
    name: deploy-cluster1-beijing
    properties:
      policies: ["cluster1-beijing"]
  - type: deploy
    name: deploy-cluster2-beijing
    properties:
      auto: false # manual approval is required before deploying to cluster2-beijing
      policies: ["override-env-cluster2-beijing", "cluster2-beijing"] # apply the environment variable override when deploying to cluster2-beijing
  - type: deploy
    name: deploy-cluster1-hangzhou
    properties:
      policies: ["override-env-cluster1-hangzhou", "override-replic-cluster1-hangzhou", "override-configmap-cluster1-hangzhou", "cluster1-hangzhou"] # apply the environment variable, replica count, and ConfigMap overrides when deploying to cluster1-hangzhou
---
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  annotations:
    app.oam.dev/publishVersion: version8
  name: web-demo
  namespace: demo
spec:
  components:
    - name: deployment # reference the Deployment on its own so it can be overridden
      type: ref-objects
      properties:
        objects:
          - apiVersion: apps/v1
            kind: Deployment
            name: web-demo
    - name: configmap # reference the ConfigMap on its own so it can be overridden
      type: ref-objects
      properties:
        objects:
          - apiVersion: v1
            kind: ConfigMap
            name: web-demo
    - name: same-resource # no differentiated configuration
      type: ref-objects
      properties:
        objects:
          - apiVersion: v1
            kind: Service
            name: web-demo
          - apiVersion: networking.k8s.io/v1
            kind: Ingress
            name: web-demo
  workflow:
    ref: deploy-demo
5. Run the following command to deploy the distribution rules in app.yaml on the master instance.
kubectl apply -f app.yaml
6. Check the deployment status of the application.
kubectl get app web-demo -n demo
Expected output; workflowSuspending indicates that the deployment is paused:
NAME       COMPONENT    TYPE          PHASE                HEALTHY   STATUS   AGE
web-demo   deployment   ref-objects   workflowSuspending   true               47h
7. Check the running status of the application on each cluster.
kubectl amc get deployment web-demo -n demo -m all
Expected output:
Run on ManagedCluster managedcluster-cxxx (cluster1-hangzhou)
No resources found in demo namespace    # first deployment of the application; the workflow has not yet started deploying to cluster1-hangzhou
Run on ManagedCluster managedcluster-cxxx (cluster2-beijing)
No resources found in demo namespace    # first deployment of the application; the workflow has not yet started deploying to cluster2-beijing and is waiting for manual approval
Run on ManagedCluster managedcluster-cxxx (cluster1-beijing)
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
web-demo   5/5     5            5           47h    # the Deployment is running normally on cluster1-beijing
8. After manual approval, deploy to clusters cluster2-beijing and cluster1-hangzhou.
kubectl amc workflow resume web-demo -n demo
Successfully resume workflow: web-demo
9. Check the deployment status of the application.
kubectl get app web-demo -n demo
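Expected output; running indicates that the application is running normally:
NAME       COMPONENT    TYPE          PHASE     HEALTHY   STATUS   AGE
web-demo   deployment   ref-objects   running   true               47h
10. Check the running status of the application on each cluster.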
kubectl amc get deployment web-demo -n demo -m all
Expected output:
Run on ManagedCluster managedcluster-cxxx (cluster1-hangzhou)
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
web-demo   1/1     1            1           47h
Run on ManagedCluster managedcluster-cxxx (cluster2-beijing)
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
web-demo   5/5     5            5           2d
Run on ManagedCluster managedcluster-cxxx (cluster1-beijing)
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
web-demo   5/5     5            5           47h
11. Check the Ingress status of the application on each cluster.
kubectl amc get ingress -n demo -m all
Expected result: the Ingress on each cluster is running normally and a public IP address has been assigned.
Run on ManagedCluster managedcluster-cxxx (cluster1-hangzhou)
NAME       CLASS   HOSTS                  ADDRESS           PORTS   AGE
web-demo   nginx   web-demo.example.com   47.xxx.xxx.xxx    80      47h
Run on ManagedCluster managedcluster-cxxx (cluster2-beijing)
NAME       CLASS   HOSTS                  ADDRESS           PORTS   AGE
web-demo   nginx   web-demo.example.com   123.xxx.xxx.xxx   80      2d
Run on ManagedCluster managedcluster-cxxx (cluster1-beijing)
NAME       CLASS   HOSTS                  ADDRESS           PORTS   AGE
web-demo   nginx   web-demo.example.com   182.xxx.xxx.xxx   80      2d
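To see how the ConfigMap override for cluster1-hangzhou takes effect, it can help to look at merge-patch semantics in isolation. The sketch below is a minimal Python implementation of JSON Merge Patch as defined in RFC 7396. It assumes that the json-merge-patch trait used in app.yaml follows those semantics, as its name suggests; the function `json_merge_patch` and the dictionaries are illustrative stand-ins, not code from ACK One.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch to target and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Simplified stand-in for the ConfigMap defined in app-meta.yaml.
base_configmap = {
    "metadata": {"name": "web-demo", "namespace": "demo"},
    "data": {"config.json": '{ database-host: "beijing-db.pg.aliyun.com" }'},
}
# Simplified stand-in for the patch in override-configmap-cluster1-hangzhou.
hangzhou_patch = {
    "data": {"config.json": '{ database-address: "hangzhou-db.pg.aliyun.com" }'},
}

patched = json_merge_patch(base_configmap, hangzhou_patch)
print(patched["metadata"]["name"])
print(patched["data"]["config.json"])
```

Because config.json is a single string value inside data, the patch replaces it as a whole rather than merging its contents. Under these assumptions, that is why the Hangzhou response shows only the database-address entry, while metadata and any other keys stay untouched.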
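Traffic Management
By configuring GTM, you can automatically monitor the running status of the application and, when an exception occurs, automatically switch traffic to a healthy cluster.
1. Configure the GTM instance. web-demo.example.com is the domain name of the sample application; replace it with the domain name of your application and point its DNS resolution to the CNAME access domain name of GTM.
2. In the GTM instance you created, create two address pools:
a. pool-beijing: contains the Ingress IP addresses of the two Beijing clusters. The load-balancing policy is to return all addresses, which balances load across the two Beijing clusters. You can obtain the Ingress IP addresses by running "kubectl amc get ingress -n demo -m all" on the master instance.
b. pool-hangzhou: contains the Ingress IP address of the Hangzhou cluster.
3. Enable health checks on the address pools. Addresses that fail the check are removed from the pool and no longer receive traffic.
4. Configure the access policy: set the Beijing pool as the primary address pool and the Hangzhou pool as the backup address pool. Normally all traffic is handled by the applications in the Beijing clusters; when the applications in all Beijing clusters are unavailable, traffic switches automatically to the application in the Hangzhou cluster.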
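The primary/backup behavior configured in the steps above can be sketched in a few lines of Python. This is a simplified model for illustration only, not GTM's implementation or API: the function `resolve` and the placeholder address strings are hypothetical, and the model assumes that a pool is used as long as at least one of its addresses passes the health check.

```python
def resolve(pools, healthy):
    """Pick the first pool with a healthy address and return its live addresses.

    pools: ordered list of (pool_name, [addresses]), primary pool first.
    healthy: set of addresses that currently pass the health check.
    """
    for name, addresses in pools:
        alive = [addr for addr in addresses if addr in healthy]
        if alive:
            return name, alive  # "return all addresses" within the chosen pool
    return None, []


# Placeholder names stand in for the real Ingress IP addresses.
pools = [
    ("pool-beijing", ["ingress-cluster1-beijing", "ingress-cluster2-beijing"]),
    ("pool-hangzhou", ["ingress-cluster1-hangzhou"]),
]

all_up = {"ingress-cluster1-beijing", "ingress-cluster2-beijing", "ingress-cluster1-hangzhou"}
print(resolve(pools, all_up))
print(resolve(pools, all_up - {"ingress-cluster1-beijing"}))
print(resolve(pools, {"ingress-cluster1-hangzhou"}))
```

The three calls correspond to the three scenarios checked in the verification section: both Beijing clusters serving, only cluster2-beijing serving, and failover to Hangzhou.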
Deployment Verification
1. Under normal conditions, all traffic is handled by the applications on the two Beijing clusters, and each cluster handles about 50% of the traffic.
for i in {1..50}; do curl web-demo.example.com; sleep 3; done
This is env cluster1-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster1-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster1-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
2. When the application on cluster1-beijing becomes abnormal, GTM routes all traffic to cluster2-beijing.
for i in {1..50}; do curl web-demo.example.com; sleep 3; done
...
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
This is env cluster2-beijing !
Config file is { database-host: "beijing-db.pg.aliyun.com" }
3. When the applications on cluster1-beijing and cluster2-beijing are abnormal at the same time, GTM routes traffic to cluster1-hangzhou.
for i in {1..50}; do curl web-demo.example.com; sleep 3; done
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
This is env cluster1-hangzhou !
Config file is { database-address: "hangzhou-db.pg.aliyun.com" }
This is env cluster1-hangzhou !
Config file is { database-address: "hangzhou-db.pg.aliyun.com" }
This is env cluster1-hangzhou !
Config file is { database-address: "hangzhou-db.pg.aliyun.com" }
This is env cluster1-hangzhou !
Config file is { database-address: "hangzhou-db.pg.aliyun.com" }
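When the curl loop runs for many iterations, counting responses by eye becomes tedious. The following Python sketch tallies captured output per cluster and counts 503 pages. It assumes the response format shown in the verification output ("This is env <cluster> !" and the nginx 503 title); the helper names `tally_responses` and `share` are hypothetical.

```python
import re
from collections import Counter

ENV_LINE = re.compile(r"This is env (\S+) !")
ERROR_MARK = "<title>503 Service Temporarily Unavailable</title>"


def tally_responses(text):
    """Return (responses per cluster, number of 503 pages) for captured output."""
    per_cluster = Counter(ENV_LINE.findall(text))
    errors = text.count(ERROR_MARK)
    return per_cluster, errors


def share(per_cluster):
    """Convert raw counts to each cluster's fraction of successful responses."""
    total = sum(per_cluster.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in per_cluster.items()}


# The six responses from the normal-condition check, in the order shown.
sample = "\n".join([
    "This is env cluster1-beijing !",
    "This is env cluster1-beijing !",
    "This is env cluster2-beijing !",
    "This is env cluster1-beijing !",
    "This is env cluster2-beijing !",
    "This is env cluster2-beijing !",
])
counts, errors = tally_responses(sample)
print(dict(counts), errors, share(counts))
```

For the sample taken from the normal-condition check, each Beijing cluster ends up with a share of 0.5 and no errors. One way to feed it real data would be to redirect the loop's output to a file and pass the file contents to `tally_responses`.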
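Summary
This article focuses on how the multi-cluster application distribution feature of ACK One helps enterprises manage multi-cluster environments. The multi-cluster master instance provides a unified entry point for application delivery and supports distribution policies such as multi-cluster distribution, differentiated configuration, and workflow management. Combined with GTM, it lets you quickly build and manage a two-region, three-center disaster recovery system for applications.
Beyond multi-cluster application distribution, ACK One can connect to and manage Kubernetes clusters in any region and on any infrastructure. It provides consistent management and community-compatible APIs, and supports unified operations and control over compute, networking, storage, security, monitoring, logging, jobs, applications, and traffic. Alibaba Cloud Distributed Cloud Container Platform (ACK One) is an enterprise-grade cloud-native platform for scenarios such as hybrid cloud, multi-cluster management, distributed computing, and disaster recovery. For more information, see the product introduction for Distributed Cloud Container Platform ACK One [8].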
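Related Links
[1] Enable the multi-cluster management master instance:
[2] Manage associated clusters:
[3] Create a GTM instance:
[4] Application distribution feature:
[5] Application replication and distribution:
[6] Differentiated configuration for application distribution:
[7] Canary distribution of applications across clusters:
[8] Distributed Cloud Container Platform ACK One:
This article is original content from Alibaba Cloud and may not be reproduced without permission.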