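Source Code Analysis of Pod Health Checks in K8s
Summary: Based on Kubernetes 1.11.0, this article analyzes how Pod health checks are implemented, from the source code's point of view. When deploying production applications on k8s, be sure to configure both liveness and readiness probes; this is a best practice for keeping services stable.
Understanding Liveness and Readiness in k8s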
Liveness:
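Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is then handled according to its restart policy. If no liveness probe is configured, the state defaults to Success.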
Readiness:
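Indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of the Services that select the Pod. Before the initial delay has elapsed, the state defaults to Failure. If no readiness probe is configured, the state defaults to Success.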
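The default states above can be captured in a few lines. The sketch below is not kubelet code: initialState, Result and ProbeType are hypothetical names that only encode the rules just described, plus the fact that kubelet's probe worker starts a configured liveness probe at Success (so the container is not killed during its initial delay) and a configured readiness probe at Failure.

```go
package main

import "fmt"

// Result is a simplified stand-in for a probe result.
type Result string

const (
	Success Result = "Success"
	Failure Result = "Failure"
)

// ProbeType distinguishes the two probe kinds discussed above.
type ProbeType int

const (
	Liveness ProbeType = iota
	Readiness
)

// initialState (hypothetical helper) returns the state a container reports
// before its probe has produced any result.
func initialState(t ProbeType, configured bool) Result {
	if !configured {
		// No probe configured: both liveness and readiness count as Success.
		return Success
	}
	if t == Readiness {
		// A configured readiness probe starts out failed, so no traffic is
		// routed to the Pod until the probe has succeeded.
		return Failure
	}
	// A configured liveness probe starts out successful, so the container is
	// not restarted before its first real probe.
	return Success
}

func main() {
	fmt.Println(initialState(Readiness, true))  // Failure
	fmt.Println(initialState(Readiness, false)) // Success
	fmt.Println(initialState(Liveness, true))   // Success
}
```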
Code Analysis
Based on Kubernetes 1.11.0
1. Starting the probes
When the kubelet starts, it also starts the health-check probing.
In the Run method in kubelet.go:
...
kl.probeManager.Start() // start the probe service
...
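In 1.11, Start itself is short: roughly, it launches a goroutine that loops forever, taking readiness results off a channel and handing them to the kubelet's status manager. The sketch below only illustrates that producer/consumer pattern. The names update, readinessSink and start are hypothetical, and unlike the real loop this one stops when its input channel is closed, so it can be exercised in isolation.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical, simplified readiness update.
type update struct {
	containerID string
	ready       bool
}

// readinessSink is a hypothetical stand-in for the component that records
// container readiness (the status manager in kubelet).
type readinessSink struct {
	mu    sync.Mutex
	ready map[string]bool
}

func (s *readinessSink) set(id string, ready bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ready[id] = ready
}

func (s *readinessSink) get(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.ready[id]
}

// start launches a background loop that forwards every update to the sink.
// The returned channel is closed once the updates channel has been closed
// and drained.
func start(updates <-chan update, sink *readinessSink) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for u := range updates {
			sink.set(u.containerID, u.ready)
		}
	}()
	return done
}

func main() {
	updates := make(chan update)
	sink := &readinessSink{ready: map[string]bool{}}
	done := start(updates, sink)
	updates <- update{containerID: "c1", ready: true}
	updates <- update{containerID: "c2", ready: false}
	close(updates)
	<-done
	fmt.Println(sink.get("c1"), sink.get("c2")) // true false
}
```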
2. What does probeManager do?
Let's look at this code in prober_manager.go:
// Manager manages pod probing. It creates a probe "worker" for every container that specifies a
// probe (AddPod). The worker periodically probes its assigned container and caches the results. The
// manager use the cached probe results to set the appropriate Ready state in the PodStatus when
// requested (UpdatePodStatus). Updating probe parameters is not currently supported.
// TODO: Move liveness probing out of the runtime, to here.
type Manager interface {
	// AddPod creates new probe workers for every container probe. This should be called for every
	// pod created.
	AddPod(pod *v1.Pod)
	// RemovePod handles cleaning up the removed pod state, including terminating probe workers and
	// deleting cached results.
	RemovePod(pod *v1.Pod)
	// CleanupPods handles cleaning up pods which should no longer be running.
	// It takes a list of "active pods" which should not be cleaned up.
	CleanupPods(activePods []*v1.Pod)
	// UpdatePodStatus modifies the given PodStatus with the appropriate Ready state for each
	// container based on container running status, cached probe results and worker states.
	UpdatePodStatus(types.UID, *v1.PodStatus)
	// Start starts the Manager sync loops.
	Start()
}
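This is the declaration of the Manager interface. The Manager is responsible for probing Pods: when AddPod is called, it creates a worker for every container probe in the Pod, and each worker periodically probes its assigned container and caches the results. When UpdatePodStatus is called, the manager uses the cached probe results to set the appropriate Ready state in the PodStatus.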
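The "one worker per container probe" bookkeeping can be sketched as follows. Container, Pod, probeKey and manager are simplified, hypothetical stand-ins for the v1 types and the kubelet manager; the sketch only records which workers would exist and runs no probes. Keying workers by pod UID, container name and probe type means a container with both probes gets two independent workers.

```go
package main

import "fmt"

type probeType string

const (
	liveness  probeType = "liveness"
	readiness probeType = "readiness"
)

// Container and Pod are simplified stand-ins for v1.Container and v1.Pod;
// a probe is modeled only by whether it is configured.
type Container struct {
	Name         string
	HasLiveness  bool
	HasReadiness bool
}

type Pod struct {
	UID        string
	Containers []Container
}

// probeKey identifies one worker: one per (pod, container, probe type).
type probeKey struct {
	podUID        string
	containerName string
	probeType     probeType
}

type manager struct {
	workers map[probeKey]bool
}

// AddPod registers a worker for every configured probe of every container.
func (m *manager) AddPod(pod Pod) {
	for _, c := range pod.Containers {
		if c.HasReadiness {
			m.workers[probeKey{pod.UID, c.Name, readiness}] = true
		}
		if c.HasLiveness {
			m.workers[probeKey{pod.UID, c.Name, liveness}] = true
		}
	}
}

// RemovePod drops every worker that belongs to the pod.
func (m *manager) RemovePod(pod Pod) {
	for key := range m.workers {
		if key.podUID == pod.UID {
			delete(m.workers, key) // deleting during range is safe in Go
		}
	}
}

func main() {
	m := &manager{workers: map[probeKey]bool{}}
	pod := Pod{UID: "p1", Containers: []Container{
		{Name: "app", HasLiveness: true, HasReadiness: true},
		{Name: "sidecar"},
	}}
	m.AddPod(pod)
	fmt.Println(len(m.workers)) // 2
	m.RemovePod(pod)
	fmt.Println(len(m.workers)) // 0
}
```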
3. A closer look at probing
First, the Probe struct:
type Probe struct {
	// The action taken to determine the health of a container
	Handler `json:",inline" protobuf:"bytes,1,opt,name=handler"`
	// Number of seconds after the container has started before liveness probes are initiated.
	// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
	// +optional
	InitialDelaySeconds int32 `json:"initialDelaySeconds,omitempty" protobuf:"varint,2,opt,name=initialDelaySeconds"`
	// Number of seconds after which the probe times out.
	// Defaults to 1 second. Minimum value is 1.
	// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
	// +optional
	TimeoutSeconds int32 `json:"timeoutSeconds,omitempty" protobuf:"varint,3,opt,name=timeoutSeconds"`
	// How often (in seconds) to perform the probe.
	// Default to 10 seconds. Minimum value is 1.
	// +optional
	PeriodSeconds int32 `json:"periodSeconds,omitempty" protobuf:"varint,4,opt,name=periodSeconds"`
	// Minimum consecutive successes for the probe to be considered successful after having failed.
	// Defaults to 1. Must be 1 for liveness. Minimum value is 1.
	// +optional
	SuccessThreshold int32 `json:"successThreshold,omitempty" protobuf:"varint,5,opt,name=successThreshold"`
	// Minimum consecutive failures for the probe to be considered failed after having succeeded.
	// Defaults to 3. Minimum value is 1.
	// +optional
	FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
}
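initialDelaySeconds: how long to wait after the container starts before the first probe is run
timeoutSeconds: the timeout for each probe run (default 1 second, minimum 1)
periodSeconds: how often the probe runs (default 10 seconds, minimum 1)
successThreshold: the minimum number of consecutive successes, after a failure, for the probe to be considered successful (default 1; must be 1 for liveness)
failureThreshold: the minimum number of consecutive failures, after a success, for the probe to be considered failed (default 3)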
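To make successThreshold and failureThreshold concrete, here is a minimal sketch of consecutive-count logic: the reported state only flips once the raw result has repeated enough times in a row. The tracker type is hypothetical and modeled on the field descriptions above, not copied from the kubelet worker.

```go
package main

import "fmt"

// Result is a simplified probe outcome.
type Result int

const (
	Failure Result = iota
	Success
)

func (r Result) String() string {
	if r == Success {
		return "Success"
	}
	return "Failure"
}

// tracker (hypothetical) converts raw per-run results into the reported state
// using successThreshold and failureThreshold.
type tracker struct {
	successThreshold int
	failureThreshold int
	lastResult       Result // raw result of the previous run
	run              int    // consecutive runs that produced lastResult
	reported         Result // state exposed to the rest of the system
}

// observe records one probe run and returns the reported state, which only
// changes once the same raw result has been seen threshold times in a row.
func (tr *tracker) observe(r Result) Result {
	if tr.run > 0 && r == tr.lastResult {
		tr.run++
	} else {
		tr.lastResult = r
		tr.run = 1
	}
	if (r == Failure && tr.run >= tr.failureThreshold) ||
		(r == Success && tr.run >= tr.successThreshold) {
		tr.reported = r
	}
	return tr.reported
}

func main() {
	// Liveness-style settings: successThreshold 1, failureThreshold 3.
	tr := &tracker{successThreshold: 1, failureThreshold: 3, reported: Success}
	for _, r := range []Result{Failure, Failure, Failure, Success} {
		fmt.Println(r, "->", tr.observe(r))
	}
	// Failure -> Success
	// Failure -> Success
	// Failure -> Failure
	// Success -> Success
}
```

With the defaults, a single failed run therefore never restarts a container; three in a row are needed.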
Handler:
Both liveness and readiness probes support three handler types: running a command (exec), an HTTP request (httpGet), and a TCP connection (tcpSocket).
// Handler defines a specific action that should be taken
// TODO: pass structured data to these actions, and document that data here.
type Handler struct {
// One and only one of the following should be specified.
// Exec specifies the action to take.
// +optional
Exec *ExecAction `json:"exec,omitempty" protobuf:"bytes,1,opt,name=exec"`
// HTTPGet specifies the http request to perform.
// +optional
HTTPGet *HTTPGetAction `json:"httpGet,omitempty" protobuf:"bytes,2,opt,name=httpGet"`
// TCPSocket specifies an action involving a TCP port.
// TCP hooks not yet supported
// TODO: implement a realistic TCP lifecycle hook
// +optional
TCPSocket *TCPSocketAction `json:"tcpSocket,omitempty" protobuf:"bytes,3,opt,name=tcpSocket"`
}
Next, let's look at the runProbe method in prober.go.
func (pb *prober) runProbe(probeType probeType, p *v1.Probe, pod *v1.Pod, status v1.PodStatus, container v1.Container, containerID kubecontainer.ContainerID) (probe.Result, string, error) {
timeout := time.Duration(p.TimeoutSeconds) * time.Second
if p.Exec != nil {
glog.V(4).Infof("Exec-Probe Pod: %v, Container: %v, Command: %v", pod, container, p.Exec.Command)
command := kubecontainer.ExpandContainerCommandOnlyStatic(p.Exec.Command, container.Env)
return pb.exec.Probe(pb.newExecInContainer(container, containerID, command, timeout))
}
if p.HTTPGet != nil {
scheme := strings.ToLower(string(p.HTTPGet.Scheme))
host := p.HTTPGet.Host
if host == "" {
host = status.PodIP
}
port, err := extractPort(p.HTTPGet.Port, container)
if err != nil {
return probe.Unknown, "", err
}
path := p.HTTPGet.Path
glog.V(4).Infof("HTTP-Probe Host: %v://%v, Port: %v, Path: %v", scheme, host, port, path)
url := formatURL(scheme, host, port, path)
headers := buildHeader(p.HTTPGet.HTTPHeaders)
glog.V(4).Infof("HTTP-Probe Headers: %v", headers)
if probeType == liveness {
return pb.livenessHttp.Probe(url, headers, timeout)
} else { // readiness
return pb.readinessHttp.Probe(url, headers, timeout)
}
}
if p.TCPSocket != nil {
port, err := extractPort(p.TCPSocket.Port, container)
if err != nil {
return probe.Unknown, "", err
}
host := p.TCPSocket.Host
if host == "" {
host = status.PodIP
}
glog.V(4).Infof("TCP-Probe Host: %v, Port: %v, Timeout: %v", host, port, timeout)
return pb.tcp.Probe(host, port, timeout)
}
glog.Warningf("Failed to find probe builder for container: %v", container)
return probe.Unknown, "", fmt.Errorf("Missing probe handler for %s:%s", format.Pod(pod), container.Name)
}
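For both HTTP and TCP probes, runProbe calls extractPort to turn the intstr.IntOrString port into a number: a numeric value is used directly, while a name is looked up among the container's declared ports. The sketch below mirrors that idea with hypothetical simplified types (IntOrString, ContainerPort, resolvePort) instead of the real intstr and v1 packages, and adds buildURL, a hypothetical helper in the spirit of formatURL that joins host and port with net.JoinHostPort.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// IntOrString is a simplified stand-in for intstr.IntOrString.
type IntOrString struct {
	IsString bool
	IntVal   int
	StrVal   string
}

// ContainerPort is a simplified stand-in for v1.ContainerPort.
type ContainerPort struct {
	Name          string
	ContainerPort int
}

// resolvePort (hypothetical) uses a number directly; otherwise it looks the
// name up in the container's ports, falling back to a numeric string.
func resolvePort(p IntOrString, ports []ContainerPort) (int, error) {
	port := p.IntVal
	if p.IsString {
		found := false
		for _, cp := range ports {
			if cp.Name == p.StrVal {
				port, found = cp.ContainerPort, true
				break
			}
		}
		if !found {
			n, err := strconv.Atoi(p.StrVal) // e.g. "8080" stored as a string
			if err != nil {
				return -1, fmt.Errorf("port %q not found", p.StrVal)
			}
			port = n
		}
	}
	if port <= 0 || port > 65535 {
		return -1, fmt.Errorf("invalid port number: %d", port)
	}
	return port, nil
}

// buildURL (hypothetical) assembles scheme, host, port and path into a URL.
func buildURL(scheme, host string, port int, path string) *url.URL {
	u, err := url.Parse(path)
	if err != nil {
		u = &url.URL{Path: path} // keep an unparsable path as-is
	}
	u.Scheme = scheme
	u.Host = net.JoinHostPort(host, strconv.Itoa(port))
	return u
}

func main() {
	ports := []ContainerPort{{Name: "http", ContainerPort: 8080}}
	port, err := resolvePort(IntOrString{IsString: true, StrVal: "http"}, ports)
	fmt.Println(port, err)                                      // 8080 <nil>
	fmt.Println(buildURL("http", "10.0.0.5", port, "/healthz")) // http://10.0.0.5:8080/healthz
}
```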
1. Exec (running a command)
The command is executed inside the container through the CRI, via the newExecInContainer method:
// ExecAction describes a "run in container" action.
type ExecAction struct {
// Command is the command line to execute inside the container, the working directory for the
// command is root ('/') in the container's filesystem. The command is simply exec'd, it is
// not run inside a shell, so traditional shell instructions ('|', etc) won't work. To use
// a shell, you need to explicitly call out to that shell.
// Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
// +optional
Command []string `json:"command,omitempty" protobuf:"bytes,1,rep,name=command"`
}
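The ExecAction contract (exit status 0 means healthy, anything else unhealthy, no implicit shell) can be illustrated locally with os/exec. This is only an analogy: the kubelet runs the command inside the container through the CRI, whereas the hypothetical execProbe below runs it on the local machine, with a timeout given in seconds like TimeoutSeconds.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// Result is a simplified probe outcome.
type Result string

const (
	Success Result = "success"
	Failure Result = "failure"
	Unknown Result = "unknown"
)

// execProbe (hypothetical) runs cmd directly, without a shell. Exit status 0
// is Success; a non-zero exit status or an expired timeout is Failure; a
// command that cannot be started at all is Unknown.
func execProbe(cmd []string, timeoutSeconds int32) Result {
	if len(cmd) == 0 {
		return Unknown
	}
	timeout := time.Duration(timeoutSeconds) * time.Second
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err := exec.CommandContext(ctx, cmd[0], cmd[1:]...).Run()
	switch {
	case err == nil:
		return Success
	case ctx.Err() != nil:
		// The command was killed because the timeout expired.
		return Failure
	}
	if _, ok := err.(*exec.ExitError); ok {
		return Failure // the command ran and exited non-zero
	}
	return Unknown // e.g. the executable does not exist
}

func main() {
	fmt.Println(execProbe([]string{"sh", "-c", "exit 0"}, 1)) // success
	fmt.Println(execProbe([]string{"sh", "-c", "exit 1"}, 1)) // failure
	// Pipes only work when a shell is invoked explicitly.
	fmt.Println(execProbe([]string{"sh", "-c", "echo ok | grep -q ok"}, 1)) // success
}
```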
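2. HTTP GET
The probe sends an HTTP GET request.
Port: the container port to access, given as a number or a port name
Host: the host to connect to; defaults to the Pod IP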
// HTTPGetAction describes an action based on HTTP Get requests.
type HTTPGetAction struct {
// Path to access on the HTTP server.
// +optional
Path string `json:"path,omitempty" protobuf:"bytes,1,opt,name=path"`
// Name or number of the port to access on the container.
// Number must be in the range 1 to 65535.
// Name must be an IANA_SVC_NAME.
Port intstr.IntOrString `json:"port" protobuf:"bytes,2,opt,name=port"`
// Host name to connect to, defaults to the pod IP. You probably want to set
// "Host" in httpHeaders instead.
// +optional
Host string `json:"host,omitempty" protobuf:"bytes,3,opt,name=host"`
// Scheme to use for connecting to the host.
// Defaults to HTTP.
// +optional
Scheme URIScheme `json:"scheme,omitempty" protobuf:"bytes,4,opt,name=scheme,casttype=URIScheme"`
// Custom headers to set in the request. HTTP allows repeated headers.
// +optional
HTTPHeaders []HTTPHeader `json:"httpHeaders,omitempty" protobuf:"bytes,5,rep,name=httpHeaders"`
}
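3. TCP
A TCP probe only needs a port (and optionally a host, which defaults to the Pod IP); it simply tries to open a connection.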
// TCPSocketAction describes an action based on opening a socket
type TCPSocketAction struct {
// Number or name of the port to access on the container.
// Number must be in the range 1 to 65535.
// Name must be an IANA_SVC_NAME.
Port intstr.IntOrString `json:"port" protobuf:"bytes,1,opt,name=port"`
// Optional: Host name to connect to, defaults to the pod IP.
// +optional
Host string `json:"host,omitempty" protobuf:"bytes,2,opt,name=host"`
}
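A quick thought experiment: what happens if all three probe methods are configured at once?
Thoughts
When deploying production applications on k8s, it is recommended to configure both liveness and readiness probes; this is also a best practice for ensuring service stability.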
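In addition, because a Ready Pod does not guarantee that the actual business application is ready to serve, Kubernetes provides the Pod Readiness Gates feature, which graduated to stable in version 1.14. With this feature, additional readiness conditions, typically reported by external components, must also be satisfied before the Pod is marked Ready.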
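The readiness-gate rule can be stated as a small predicate: a Pod is Ready only if all of its containers are ready and every condition type listed in spec.readinessGates appears in the Pod's status conditions with status "True"; a gate whose condition has not been reported yet counts as not satisfied. The sketch below encodes that rule with hypothetical simplified types (Condition, podReady); it is not the kubelet implementation, and the gate name is made up.

```go
package main

import "fmt"

// Condition is a simplified stand-in for v1.PodCondition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// podReady (hypothetical) applies the readiness-gate rule: every container
// must be ready and every gate condition must be present with status "True".
func podReady(containersReady []bool, gates []string, conds []Condition) bool {
	for _, ready := range containersReady {
		if !ready {
			return false
		}
	}
	status := make(map[string]string, len(conds))
	for _, c := range conds {
		status[c.Type] = c.Status
	}
	for _, g := range gates {
		// A missing condition yields "" here, so it blocks readiness.
		if status[g] != "True" {
			return false
		}
	}
	return true
}

func main() {
	gates := []string{"example.com/feature-1"}
	fmt.Println(podReady([]bool{true}, gates, nil)) // false: gate not reported yet
	conds := []Condition{{Type: "example.com/feature-1", Status: "True"}}
	fmt.Println(podReady([]bool{true}, gates, conds)) // true
}
```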
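Conclusion
Back to the thought experiment above: what happens if all three probe methods are configured?
Answer: if a single probe specifies more than one handler type, the Pod configuration is rejected with a validation error as soon as it is submitted.
Continuing with the source code: validateHandler in validation.go enforces this restriction, which is also the factual basis for the comment "One and only one of the following should be specified." in the Handler struct shown earlier: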
func validateHandler(handler *core.Handler, fldPath *field.Path) field.ErrorList {
numHandlers := 0
allErrors := field.ErrorList{}
if handler.Exec != nil {
if numHandlers > 0 {
allErrors = append(allErrors, field.Forbidden(fldPath.Child("exec"), "may not specify more than 1 handler type"))
} else {
numHandlers++
allErrors = append(allErrors, validateExecAction(handler.Exec, fldPath.Child("exec"))...)
}
}
if handler.HTTPGet != nil {
if numHandlers > 0 {
allErrors = append(allErrors, field.Forbidden(fldPath.Child("httpGet"), "may not specify more than 1 handler type"))
} else {
numHandlers++
allErrors = append(allErrors, validateHTTPGetAction(handler.HTTPGet, fldPath.Child("httpGet"))...)
}
}
if handler.TCPSocket != nil {
if numHandlers > 0 {
allErrors = append(allErrors, field.Forbidden(fldPath.Child("tcpSocket"), "may not specify more than 1 handler type"))
} else {
numHandlers++
allErrors = append(allErrors, validateTCPSocketAction(handler.TCPSocket, fldPath.Child("tcpSocket"))...)
}
}
if numHandlers == 0 {
allErrors = append(allErrors, field.Required(fldPath, "must specify a handler type"))
}
return allErrors
}
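The repeated if/else blocks in validateHandler boil down to counting how many handler types are set. The compact sketch below expresses the same "one and only one" rule; Handler and the three action types are simplified, hypothetical stand-ins for the core types, and validateOneHandler is not a Kubernetes function.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the core.*Action types.
type ExecAction struct{ Command []string }

type HTTPGetAction struct {
	Path string
	Port int
}

type TCPSocketAction struct{ Port int }

// Handler mirrors the shape of core.Handler: exactly one field should be set.
type Handler struct {
	Exec      *ExecAction
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
}

// validateOneHandler (hypothetical) enforces the rule by counting how many
// handler types are configured.
func validateOneHandler(h Handler) error {
	n := 0
	if h.Exec != nil {
		n++
	}
	if h.HTTPGet != nil {
		n++
	}
	if h.TCPSocket != nil {
		n++
	}
	switch {
	case n == 0:
		return errors.New("must specify a handler type")
	case n > 1:
		return errors.New("may not specify more than 1 handler type")
	}
	return nil
}

func main() {
	both := Handler{
		Exec:    &ExecAction{Command: []string{"cat", "/tmp/healthy"}},
		HTTPGet: &HTTPGetAction{Path: "/healthz", Port: 8080},
	}
	fmt.Println(validateOneHandler(both))                                           // may not specify more than 1 handler type
	fmt.Println(validateOneHandler(Handler{TCPSocket: &TCPSocketAction{Port: 80}})) // <nil>
}
```

The real validateHandler returns a field.ErrorList rather than a single error: it validates the nested fields of the first handler it finds and adds one Forbidden error for each additional handler, so users see exactly which field is extra.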
Author: 元毅
This article is original content from the Yunqi Community (雲棲社群); without