golang 效能分析

阿新 • • 發佈：2021-08-20

1. 效能分析

golang 自身提供的工具中包括效能分析工具 - pprof。

runtime/pprof：採集器，負責採集應用程式的執行資料供給 pprof 視覺化工具
net/http/pprof：通過一個 HTTP Server 將 prof 資料進行視覺化分析。

golang 內建提供了多種效能收集器，它們負責收集這些效能資料：

Goroutine: stack traces of all current Goroutines。Go Routine 分析。
CPU: stack traces of CPU returned by runtime。CPU 分析。按照一定的頻率採集所監聽的應用程式 CPU（含暫存器）的使用情況，可確定應用程式在主動消耗 CPU 週期時花費時間的位置

Heap: a sampling of memory allocations of live objects。記憶體分析，在應用程式進行堆分配時記錄堆疊跟蹤，用於監視當前和歷史記憶體使用情況，以及檢查記憶體洩漏
Allocation: a sampling of all past memory allocations。記憶體分析，物件分配統計以及記憶體洩漏等。
Thread: stack traces that led to the creation of new OS threads。OS 執行緒建立情況。
Block: stack traces that led to blocking on synchronization primitives。阻塞分析。記錄 go routine 阻塞情況，等待以及同步情況，timer/channel 通訊等各項細節。

Mutex: stack traces of holders of contended mutexes。互斥鎖分析。包括各種競爭情況。

2. pprof工具分析

（1.）通過在main函式中新增效能監控

package main

import 	(
  _ "net/http/pprof" // 引入 pprof 模組
  _ "github.com/mkevac/debugcharts"  // 可選，圖形化外掛
)

func main(){
    // ...
    // 記憶體分析
	go func() {
		http.ListenAndServe("0.0.0.0:8090", nil) // 監控埠
	}()
    // ...
}

"net/http/pprof" 包將會自動地新增一組 api 端點，該包的 init 函式會做這些事：

func init() {
	http.HandleFunc("/debug/pprof/", Index)
	http.HandleFunc("/debug/pprof/cmdline", Cmdline)
	http.HandleFunc("/debug/pprof/profile", Profile)
	http.HandleFunc("/debug/pprof/symbol", Symbol)
	http.HandleFunc("/debug/pprof/trace", Trace)
}

2.1 web介面檢視

直接通過瀏覽器訪問http://localhost:8090/debug/pprof, 就會看到如下介面：

引數說明：

cpu（CPU Profiling）: $HOST/debug/pprof/profile，預設進行 30s 的 CPU Profiling，得到一個分析用的 profile 檔案
block（Block Profiling）：$HOST/debug/pprof/block，檢視導致阻塞同步的堆疊跟蹤
goroutine：$HOST/debug/pprof/goroutine，檢視當前所有執行的 goroutines 堆疊跟蹤
heap（Memory Profiling）: $HOST/debug/pprof/heap，檢視活動物件的記憶體分配情況
mutex（Mutex Profiling）：$HOST/debug/pprof/mutex，檢視導致互斥鎖的競爭持有者的堆疊跟蹤
threadcreate：$HOST/debug/pprof/threadcreate，檢視建立新OS執行緒的堆疊跟蹤

說明：如果引用了github.com/mkevac/debugcharts 包，可以訪問http://localhost:8090/ui實時的檢視到記憶體和CPU以及GC的執行情況。

2.2 命令列互動

（1.）CPU分析

go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60

Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

通過輸入 top10 即可檢視CPU佔用前10的函式。通過list func_name 可以檢視具體函式佔用的cpu情況。通過輸入pdf 可以匯出為pdf檔案進行分析。通過輸入web 可以生成svg檔案，這裡需要提前安裝graphviz 程式。也可以使用tree 顯示呼叫結構和佔用cpu,記憶體的百分比。使用traces 可以列印所有呼叫棧，以及呼叫棧的指標資訊。help 命令列印所有的可用命令。

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write

引數說明：

flat：給定函式上執行耗時
flat%：同上的 CPU 執行耗時總比例
sum%：給定函式累積使用 CPU 總比例
cum：當前函式加上它之上的呼叫執行總耗時
cum%：同上的 CPU 執行耗時總比例

（2.）記憶體分析

go tool pprof http://localhost:8080/debug/pprof/heap

// 停掉後，過一分鐘後再次執行上述命令，使其生成兩個檔案，進行比較
go tool pprof -base pprof.demo2.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz pprof.demo2.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz

// 使用top命令檢視佔用較多的，再使用list檢視具體佔用情況

記憶體分析有幾個引數：

-alloc_objects 申請物件的記憶體佔用情況
-inuse_objects 在使用中的物件記憶體佔用情況
-inuse_space 使用中的記憶體空間

使用pprof的heap能夠獲取程式執行時的記憶體資訊，在程式平穩執行的情況下，每個一段時間使用heap獲取記憶體的profile，然後使用base能夠對比兩個profile檔案的差別，需要不斷的比較兩次的記憶體申請和佔用情況，關注哪些物件的申請不斷的增長，哪些物件在程式結束後，仍然在使用中，沒有釋放記憶體。

要點：

記憶體profiling記錄的是堆記憶體分配的情況，以及呼叫棧資訊，並不是程序完整的記憶體情況
棧記憶體的分配是在呼叫棧結束後會被釋放的記憶體，所以並不在記憶體profile中。
記憶體profiling是基於抽樣的，預設是每1000次堆記憶體分配，執行1次profile記錄。
因為記憶體profiling是基於抽樣和它跟蹤的是已分配的記憶體，而不是使用中的記憶體，（比如有些記憶體已經分配，看似使用，但實際以及不使用的記憶體，比如記憶體洩露的那部分），所以不能使用記憶體profiling衡量程式總體的記憶體使用情況。
在進行堆分配時，記憶體分析會記錄呼叫棧跟蹤，棧分配被認為是無成本的，並且在記憶體 profile 中不被追蹤。
使用記憶體profiling不能夠發現記憶體洩露

2.3 火焰圖

（1.）安裝go-torch

# 下載並複製flamegraph.pl到/usr/local/bin下
git clone https://github.com/brendangregg/FlameGraph.git
cp FlameGraph/flamegraph.pl /usr/local/bin

# 安裝go-torch
go get -v github.com/uber/go-torch

(2.) 使用go-torch檢視

// 使用
go-torch -u http://10.11.209.102:9911 -t 30
go-torch -u http://10.0.2.15:6060 --suffix=/debug/pprof/block -p > torch.svg
go-torch cpu.prof  // top  list

3. 使用goland自帶的基準測試分析

goland2020版本，提供了基準測試，可以進行記憶體、CPU的分析：

如果golang版本較低，可能沒有自帶的執行按鈕，可以使用以下命令執行：

go test -cpuprofile cpu.prof -memprofile mem.prof -bench .

// 對生成的檔案，使用go tool進行分析
go tool pprof cpu.pprof
top10 -cum // 檢視top10佔用情況
list xxx // 檢視具體某個函式的記憶體

go tool pprof -http=":8080" cpu.pprof  // 使用web介面進行分析

4.自行寫指令碼檢視記憶體佔用情況

4.1 使用shell命令來實時檢視記憶體佔用情況

(1.)建立一個指令碼prog_mem.sh，獲取程序佔用的實體記憶體情況,也可以直接使用pidstat -r -p $pid 1, 這裡的pid是程式執行的程序id，表示每隔一秒列印一次記憶體使用情況。

vim prog_mem.sh
#!/bin/bash
prog_name="your_programe_name"
prog_mem=$(pidstat  -r -u -h -C $prog_name |awk 'NR==4{print $12}')
time=$(date "+%Y-%m-%d %H:%M:%S")
echo $time"    memory(Byte):"$prog_mem >>~/record/prog_mem.log

(2.) 使用crontab建立定時任務，每分鐘記錄1次.

crontab -e
*/1 * * * * /record/prog_mem.sh

(3.) 將輸出的日誌資訊prog_mem.log進行分析，可以使用視覺化工具進行檢視。

4.2 使用golang程式開啟記憶體、CPU採集

在要測試的程式碼段中呼叫該函式即可，需要傳輸生成檔案的路徑。由於生成的檔案為二進位制檔案，無法直接檢視，需要使用go tool pprof工具檢視,go tool pprof -http=:6060 cpu.prof.

func enableCpuProfile(cpuProfilePath string) (closer func()) {
	closer = func() {}
	if cpuProfilePath != "" {
		f, err := os.Create(cpuProfilePath)
		if err != nil {
			log.Fatal("could not create cpu profile: %v", err)
		}
		err = pprof.StartCPUProfile(f)
		if err != nil {
			log.Fatal("error: %v", err)
		}
		closer = pprof.StopCPUProfile
	}
	runtime.SetBlockProfileRate(20)
	return
}

func enableMemProfile(memProfilePath string) (closer func()) {
	closer = func() {}
	if memProfilePath != "" {
		closer = func() {
			f, err := os.Create(memProfilePath)
			if err != nil {
				log.Fatal("could not create memory profile: ", err)
			}
			defer f.Close()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				log.Fatal("could not write memory profile: ", err)
			}
		}
	}
	return
}

也可以簡化程式，在main中加入這兩行程式碼，在程式退出時，會生成pprof檔案。

	# 這裡profile引用包：github.com/pkg/profile
	defer profile.Start(profile.CPUProfile).Stop()  // CPU
	defer profile.Start(profile.MemProfile).Stop()  // 記憶體
	defer profile.Start(profile.GoroutineProfile).Stop() // goroutine
	defer profile.Start(profile.BlockProfile).Stop()  // 阻塞
	defer profile.Start(profile.MutexProfile, profile.MemProfile).Stop()  // 合併寫法

5.常見的記憶體洩露

5.1 goroutine洩露

每個goroutine佔用2KB記憶體，洩露1百萬goroutine至少洩露2KB * 1000000 = 2GB記憶體。goroutine執行過程中還存在一些變數，如果這些變數指向堆記憶體中的記憶體，GC會認為這些記憶體仍在使用，不會對其進行回收，這些記憶體誰都無法使用，造成了記憶體洩露。

所以goroutine洩露有2種方式造成記憶體洩露：

goroutine本身的棧所佔用的空間造成記憶體洩露。
goroutine中的變數所佔用的堆記憶體導致堆記憶體洩露，這一部分是能通過heap profile體現出來的。

判斷依據：在節點正常執行的情況下，隔一段時間獲取goroutine的數量，如果後面獲取的那次，某些goroutine比前一次多，如果多獲取幾次，是持續增長的，就極有可能是goroutine洩露。

定位：

http://ip:port/debug/pprof/goroutine?debug=1  //檢視當前阻塞在此goroutine的數量
http://ip:port/debug/pprof/goroutine?debug=2  // 可以檢視goroutine阻塞的時長

總結：

goroutine洩露常見的場景是channel阻塞，無法繼續向下執行，導致此goroutine關聯的記憶體都無法釋放，進一步造成記憶體洩露。

channel的讀或者寫：

無緩衝channel的阻塞通常是寫操作因為沒有讀而阻塞
有緩衝的channel因為緩衝區滿了，寫操作阻塞
期待從channel讀資料，結果沒有goroutine寫
select操作，select裡也是channel操作，如果所有case上的操作阻塞，goroutine也無法繼續執行。

5.2 time.After 記憶體洩露

# 錯誤示例：
package main

import (
    "time"
    )

func main()  {
    ch := make(chan int, 10)

    go func() {
        var i = 1
        for {
            i++
            ch <- i
        }
    }()

    for {
        select {
        case x := <- ch:
            println(x)
        case <- time.After(3 * time.Minute):
            println(time.Now().Unix())
        }
    }
}

在for迴圈每次select的時候，都會例項化一個一個新的定時器。該定時器在3分鐘後，才會被啟用，但是啟用後已經跟select無引用關係，被gc給清理掉。time.After定時器在定時任務到達之前，會一直存在於時間堆中，不會釋放資源，直到定時任務時間到達後才會釋放資源。

在go程式碼中，在for迴圈裡不要使用select + time.After的組合，可以使用time.NewTimer替代：

 1 package main
 2 
 3 import (
 4     "time"
 5     )
 6 
 7 func main()  {
 8     ch := make(chan int, 10)
 9 
10     go func() {
11         for {
12             ch <- 100
13         }
14     }()
15 
16     idleDuration := 3 * time.Minute
17     idleDelay := time.NewTimer(idleDuration)
18     defer idleDelay.Stop()
19 
20     for {
21         idleDelay.Reset(idleDuration)
22 
23         select {
24             case x := <- ch:
25                 println(x)
26             case <-idleDelay.C:
27                 return
28             }
29     }
30 }

6.開發建議：

對頻繁分配的物件，使用 sync.Pool 物件池減少分配時GC壓力

var pool = sync.Pool{New: func() interface{} { return make([]byte, 4096) }}

func fn() {
	buf := pool.Get().([]byte) // takes from pool or calls New
	// do work
	pool.Put(buf) // returns buf to the pool
}

自動化的 DeepCopy 是非常耗時的，其中涉及到反射，記憶體分配，容器(如 map)擴充套件等，大概比手動拷貝慢一個數量級
用 atomic.Load/StoreXXX，atomic.Value, sync.Map 等代替 Mutex。(優先順序遞減)
使用高效的第三方庫，如用fasthttp替代 net/http
在開發環境加上-race編譯選項進行競態檢查
在開發或線上環境開啟 net/http/pprof，方便實時pprof
將所有外部IO(網路IO，磁碟IO)做成非同步
避免[]byte和string轉換,在Go中，使用string()或[]byte()來實現[]byte和string的型別轉換，會額外申請一塊記憶體來複制,可以*(*[]byte)(unsafe.Pointer(&s))使用。
字串的拼接優先考慮bytes.Buffer
slice和map採make建立時，預估大小指定容量
map 不會收縮 “不再使用” 的空間。就算把所有鍵值刪除，它依然保留記憶體空間以待後用

參考連結：

https://juejin.cn/post/6951078043922202631

https://github.com/sxs2473/go-performane-tuning

https://www.cnblogs.com/sunsky303/p/11077030.html

https://www.cnblogs.com/erenming/p/14256868.html

https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html#memory-and-gc

https://www.jetbrains.com/help/go/2020.3/memory-profiler.html#run-memory-profiling

【勵志篇】：古之成大事掌大學問者，不惟有超世之才，亦必有堅韌不拔之志。

golang 效能分析

1. 效能分析

2. pprof工具分析

2.1 web介面檢視

2.2 命令列互動

2.3 火焰圖

3. 使用goland自帶的基準測試分析

4.自行寫指令碼檢視記憶體佔用情況

4.1 使用shell命令來實時檢視記憶體佔用情況

4.2 使用golang程式開啟記憶體、CPU採集

5.常見的記憶體洩露

5.1 goroutine洩露

5.2 time.After 記憶體洩露

6.開發建議：

參考連結：

golang呼叫java的函式_大話golang效能分析（一）：profile基本原理

golang 效能分析

golang json效能分析詳解

golang 使用pprof和go-torch做效能分析

golang json 效能分析

C++ 自增、自減運算子的過載和效能分析小結

MySQL limit效能分析與優化

Python效能分析工具Profile使用例項

Java CPU效能分析工具程式碼例項

Python內建型別效能分析過程例項

GoLang 逃逸分析的機制詳解

基於Tensorflow:CPU效能分析

06丨傾囊相授：我畢生所學的效能分析思路都在這裡了

python 效能分析工具py-spy

Python效能分析工具py-spy原理用法解析

Linux伺服器I/O效能分析-3

效能分析（2）- 應用程式 CPU 使用率過高案例

效能分析（3）- 短時程序導致使用者 CPU 使用率過高案例

MyBatisPlus效能分析外掛,條件構造器,程式碼自動生成器詳解

Linux效能分析-CPU上下文切換

golang 效能分析

1. 效能分析

2. pprof工具分析

2.1 web介面檢視

2.2 命令列互動

2.3 火焰圖

3. 使用goland自帶的基準測試分析

4.自行寫指令碼檢視記憶體佔用情況

4.1 使用shell命令來實時檢視記憶體佔用情況

4.2 使用golang程式開啟記憶體、CPU採集

5.常見的記憶體洩露

5.1 goroutine洩露

5.2 time.After 記憶體洩露

6.開發建議：

參考連結：

相關推薦