Golang Performance Tuning (go-torch, go tool pprof)

阿新 • Published: 2019-02-02

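Go ships with a solid set of built-in tools and methods for performance monitoring and tuning, which makes profile analysis much more efficient. This article also introduces and recommends go-torch, open-sourced by Uber: the flame graphs it generates make performance tuning easier and more intuitive. I came across go-torch during a real tuning session, and it worked very well.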

Introduction to go tool pprof

Golang's built-in CPU, memory, and block profilers

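One of Go's strengths is that profile sampling is built into the standard toolchain and runtime, so it can be used while a program is running. With Go's profilers we can collect the following kinds of samples:

CPU profiles
memory profiles
block profiles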

Common profiling scenarios in Golang

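Benchmark files: for example, run go test . -bench . -cpuprofile prof.cpu to generate a sample file, then analyze it with go tool pprof [binary] prof.cpu.
import _ "net/http/pprof": if the application is a web service, add import _ "net/http/pprof" to the source file that starts the HTTP server (e.g. main.go). The service then exposes its profiles under /debug/pprof/ automatically, so the sampled results can be analyzed directly.
Calling functions such as pprof.StartCPUProfile or pprof.WriteHeapProfile from the runtime/pprof package directly in the code is another convenient way to collect samples.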
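The third scenario can be sketched as follows. This is a minimal illustration, not code from the demo: busyWork, writeProfiles, and the output file names prof.cpu and prof.mem are hypothetical names chosen for this example. Note that StartCPUProfile and WriteHeapProfile live in the runtime/pprof package.

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload that gives the
// profiler something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// writeProfiles runs work with CPU profiling enabled and then writes a
// heap profile. Both file paths are chosen by the caller.
func writeProfiles(cpuPath, heapPath string, work func()) error {
	cpuFile, err := os.Create(cpuPath)
	if err != nil {
		return err
	}
	defer cpuFile.Close()

	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		return err
	}
	work()
	pprof.StopCPUProfile() // flushes the CPU profile to cpuFile

	heapFile, err := os.Create(heapPath)
	if err != nil {
		return err
	}
	defer heapFile.Close()

	runtime.GC() // refresh allocation statistics before the snapshot
	return pprof.WriteHeapProfile(heapFile)
}

func main() {
	err := writeProfiles("prof.cpu", "prof.mem", func() { busyWork(200000000) })
	if err != nil {
		log.Fatal(err)
	}
}
```

The resulting files can then be opened with go tool pprof [binary] prof.cpu, in the same way as the benchmark output in the first scenario.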

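Using go tool pprof

go tool pprof takes many options; they are not covered in detail here, so see the help output. The command I mainly use is:
go tool pprof --seconds 25 http://localhost:9090/debug/pprof/profile
This command sets a 25-second sampling window. When the 25 seconds are up, the profile we want has been generated. Typing web at the interactive pprof prompt then opens it in a browser, showing the performance call graph for the whole call chain.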
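As a rough illustration of what this command fetches, the sketch below downloads the same CPU profile with plain net/http. It assumes the service exposes net/http/pprof on http://localhost:9090, as in the command above. profileURL, fetchProfile, and the output file name prof.cpu are hypothetical names chosen for this example, and this is not a description of how go tool pprof is implemented internally.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// profileURL builds the CPU profile URL for a service that exposes
// net/http/pprof under base, sampling for the given number of seconds.
func profileURL(base string, seconds int) string {
	return fmt.Sprintf("%s/debug/pprof/profile?seconds=%d", base, seconds)
}

// fetchProfile downloads the profile at url and saves it to path.
// The request blocks until the server has finished sampling.
func fetchProfile(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create(path)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	url := profileURL("http://localhost:9090", 25)
	if err := fetchProfile(url, "prof.cpu"); err != nil {
		log.Printf("fetch failed (is the service running?): %v", err)
		return
	}
	fmt.Println("saved prof.cpu; inspect it with: go tool pprof prof.cpu")
}
```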

$ go tool pprof -h
usage: pprof [options] [binary] <profile source> ...
Output format (only set one):
  -callgrind        Outputs a graph in callgrind format
  -disasm=p         Output annotated assembly for functions matching regexp or address
  -dot              Outputs a graph in DOT format
  -eog              Visualize graph through eog
  -evince           Visualize graph through evince
  -gif              Outputs a graph image in GIF format
  -gv               Visualize graph through gv
  -list=p           Output annotated source for functions matching regexp
  -pdf              Outputs a graph in PDF format
  -peek=p           Output callers/callees of functions matching regexp
  -png              Outputs a graph image in PNG format
  -proto            Outputs the profile in compressed protobuf format
  -ps               Outputs a graph in PS format
  -raw              Outputs a text representation of the raw profile
  -svg              Outputs a graph in SVG format
  -tags             Outputs all tags in the profile
  -text             Outputs top entries in text form
  -top              Outputs top entries in text form
  -tree             Outputs a text rendering of call graph
  -web              Visualize graph through web browser
  -weblist=p        Output annotated source in HTML for functions matching regexp or address
Output file parameters (for file-based output formats):
  -output=f         Generate output on file f (stdout by default)
Output granularity (only set one):
  -functions        Report at function level [default]
  -files            Report at source file level
  -lines            Report at source line level
  -addresses        Report at address level
Comparison options:
  -base <profile>   Show delta from this profile
  -drop_negative    Ignore negative differences
Sorting options:
  -cum              Sort by cumulative data

Dynamic profile options:
  -seconds=N        Length of time for dynamic profiles
Profile trimming options:
  -nodecount=N      Max number of nodes to show
  -nodefraction=f   Hide nodes below <f>*total
  -edgefraction=f   Hide edges below <f>*total
Sample value selection option (by index):
  -sample_index      Index of sample value to display
  -mean              Average sample value over first value
Sample value selection option (for heap profiles):
  -inuse_space      Display in-use memory size
  -inuse_objects    Display in-use object counts
  -alloc_space      Display allocated memory size
  -alloc_objects    Display allocated object counts
Sample value selection option (for contention profiles):
  -total_delay      Display total delay at each region
  -contentions      Display number of delays at each region
  -mean_delay       Display mean delay at each region
Filtering options:
  -runtime          Show runtime call frames in memory profiles
  -focus=r          Restricts to paths going through a node matching regexp
  -ignore=r         Skips paths going through any nodes matching regexp
  -tagfocus=r       Restrict to samples tagged with key:value matching regexp
                    Restrict to samples with numeric tags in range (eg "32kb:1mb")
  -tagignore=r      Discard samples tagged with key:value matching regexp
                    Avoid samples with numeric tags in range (eg "1mb:")
Miscellaneous:
  -call_tree        Generate a context-sensitive call tree
  -unit=u           Convert all samples to unit u for display
  -divide_by=f      Scale all samples by dividing them by f
  -buildid=id       Override build id for main binary in profile
  -tools=path       Search path for object-level tools
  -help             This message
Environment Variables:
   PPROF_TMPDIR       Location for saved profiles (default $HOME/pprof)
   PPROF_TOOLS        Search path for object-level tools
   PPROF_BINARY_PATH  Search path for local binary files
                      default: $HOME/pprof/binaries
                      finds binaries by $name and $buildid/$name

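Introduction to go-torch

go-torch is a flame graph generator for Golang programs, open-sourced by Uber. It collects stack traces, folds them into a flame graph, and presents the result visually to developers. go-torch builds on the flame graph tooling created by Brendan Gregg, which makes it easy to see how much CPU time each Go function consumes.

For details on using go-torch, see the help output below. Here we mainly use the -u and -t options: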
go-torch -u http://localhost:9090 -t 30

$ go-torch -h
Usage:
  go-torch [options] [binary] <profile source>

pprof Options:
  -u, --url=         Base URL of your Go program (default: http://localhost:8080)
  -s, --suffix=      URL path of pprof profile (default: /debug/pprof/profile)
  -b, --binaryinput= File path of previously saved binary profile. (binary profile is anything accepted by https://golang.org/cmd/pprof)
      --binaryname=  File path of the binary that the binaryinput is for, used for pprof inputs
  -t, --seconds=     Number of seconds to profile for (default: 30)
      --pprofArgs=   Extra arguments for pprof

Output Options:
  -f, --file=        Output file name (must be .svg) (default: torch.svg)
  -p, --print        Print the generated svg to stdout instead of writing to file
  -r, --raw          Print the raw call graph output to stdout instead of creating a flame graph; use with Brendan Gregg's flame graph perl script (see
                     https://github.com/brendangregg/FlameGraph)
      --title=       Graph title to display in the output file (default: Flame Graph)
      --width=       Generated graph width (default: 1200)
      --hash         Colors are keyed by function name hash
      --colors=      set color palette. choices are: hot (default), mem, io, wakeup, chain, java, js, perl, red, green, blue, aqua, yellow, purple, orange
      --cp           Use consistent palette (palette.map)
      --reverse      Generate stack-reversed flame graph
      --inverted     icicle graph

Help Options:
  -h, --help         Show this help message

Environment setup

Installing the FlameGraph script

git clone https://github.com/brendangregg/FlameGraph.git

cp FlameGraph/flamegraph.pl /usr/local/bin

Run flamegraph.pl -h in a terminal to check that FlameGraph was installed successfully:

$ flamegraph.pl -h
Option h is ambiguous (hash, height, help)
USAGE: /usr/local/bin/flamegraph.pl [options] infile > outfile.svg

    --title       # change title text
    --width       # width of image (default 1200)
    --height      # height of each frame (default 16)
    --minwidth    # omit smaller functions (default 0.1 pixels)
    --fonttype    # font type (default "Verdana")
    --fontsize    # font size (default 12)
    --countname   # count type label (default "samples")
    --nametype    # name type label (default "Function:")
    --colors      # set color palette. choices are: hot (default), mem, io,
                  # wakeup, chain, java, js, perl, red, green, blue, aqua,
                  # yellow, purple, orange
    --hash        # colors are keyed by function name hash
    --cp          # use consistent palette (palette.map)
    --reverse     # generate stack-reversed flame graph
    --inverted    # icicle graph
    --negate      # switch differential hues (blue<->red)
    --help        # this message

    eg,
    /usr/local/bin/flamegraph.pl --title="Flame Graph: malloc()" trace.txt > graph.svg

Installing go-torch

With FlameGraph in place, we next install go-torch to visualize the profile output:

go get -v github.com/uber/go-torch

Demo

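Starting the program to be tuned

My example is a simple web demo. After starting it with go run main.go -printStats, the endpoint to be tuned is reachable from a browser at http://localhost:9090/demo. Each request to the endpoint prints access information, as shown below: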

$ go run main.go -printStats
Starting Server on :9090
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 67.984µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 339.656µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 55.749µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 89.34µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 59.606µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 47.917µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 42.768µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 1.270416ms
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 34.518µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 281.014µs
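The source of main.go is not shown in this article. A minimal sketch of a service with the same shape might look like the following, assuming only what the text states: it listens on :9090, serves /demo, and enables profiling through net/http/pprof. The handler body, the hits counter, and demoBody are hypothetical, and the -printStats flag and its counter/timer logging are left out.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
	"sync/atomic"
)

// hits counts the requests served by /demo (hypothetical bookkeeping).
var hits int64

// demoBody returns the response text for the n-th request.
func demoBody(n int64) string {
	return fmt.Sprintf("demo request #%d\n", n)
}

// demoHandler stands in for the endpoint being tuned.
func demoHandler(w http.ResponseWriter, r *http.Request) {
	n := atomic.AddInt64(&hits, 1)
	fmt.Fprint(w, demoBody(n))
}

func main() {
	// Passing nil to ListenAndServe uses http.DefaultServeMux, so the
	// pprof handlers and /demo are served from the same port.
	http.HandleFunc("/demo", demoHandler)
	log.Println("Starting Server on :9090")
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

With a service like this running, both http://localhost:9090/demo and http://localhost:9090/debug/pprof/ should respond, which is what the pprof and go-torch commands in this article rely on.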

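Starting the load test

Next, we load-test the endpoint to see how it performs under high concurrency.

We use go-wrk for the load test. To install it, go to its GitHub page at https://github.com/adjust/go-wrk, clone the code, and run go build.

Run the following command to simulate a high-concurrency scenario of 10,000 requests over 35 seconds: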

go-wrk -d 35 -n 10000 http://localhost:9090/demo

Using go tool pprof

While the load test above is running, open another terminal window and enter the following command to generate the profile:

go tool pprof --seconds 25 http://localhost:9090/debug/pprof/profile

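The command sets a 25-second sampling window. When the (pprof) prompt appears, type web to open the result in a browser, as shown below:

[Figure: call graph rendered by the pprof web command]
Looking at this graph, you may already feel lost. It is this hard to read even for my simple demo, let alone in real-world performance tuning!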

Using go-torch

While the load test above is running, this time we use go-torch to generate the sampling report:

go-torch -u http://localhost:9090 -t 30

After 30 seconds, go-torch finishes sampling and prints the following:

Writing svg to torch.svg

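torch.svg is the flame graph file that go-torch generates automatically once sampling ends. We open it in a browser as well, as shown below:

[Figure: flame graph generated by go-torch (torch.svg)]

This is the flame graph generated by go-torch. Much easier on the eyes, isn't it?

The y-axis of a flame graph shows the call stack, with each function stacked on top of its caller. The x-axis shows the share of sampled time that each function accounts for: the wider a frame, the more CPU time it consumed.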
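To make the width rule concrete, here is a small sketch that computes frame widths from folded stacks, the "func1;func2;func3 count" text format that flamegraph.pl takes as input. The sample counts are hypothetical and were not measured from the demo service, and frameWidths is a name chosen for this example. It only illustrates the arithmetic and is not how go-torch or flamegraph.pl are implemented.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// frameWidths parses folded stacks (one "a;b;c 12" entry per line) and
// returns, for every stack prefix, the percentage of all samples that
// pass through it. That percentage is the relative width of the
// corresponding frame in a flame graph.
func frameWidths(folded string) (map[string]float64, error) {
	counts := map[string]int{}
	total := 0
	for _, line := range strings.Split(strings.TrimSpace(folded), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.Atoi(line[i+1:])
		if err != nil {
			return nil, fmt.Errorf("bad count in %q: %v", line, err)
		}
		total += n
		// Every prefix of the stack is a frame that these samples pass through.
		frames := strings.Split(line[:i], ";")
		for d := 1; d <= len(frames); d++ {
			counts[strings.Join(frames[:d], ";")] += n
		}
	}
	widths := make(map[string]float64, len(counts))
	if total == 0 {
		return widths, nil
	}
	for path, n := range counts {
		widths[path] = 100 * float64(n) / float64(total)
	}
	return widths, nil
}

func main() {
	// Hypothetical sample counts, not measurements from the demo service.
	folded := `main;handler;encode 60
main;handler;log 20
main;gc 20`
	widths, err := frameWidths(folded)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	paths := make([]string, 0, len(widths))
	for p := range widths {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%-22s %5.1f%%\n", p, widths[p])
	}
}
```

In this made-up data, main;handler spans 80% of the graph's width because 80 of the 100 samples pass through it, while encode on top of it spans 60%.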

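With the flame graph we can see much more clearly which function calls take the most time, then fix the code, sample again, and keep optimizing.

That's it. This article has a single goal: to get you more interested in performance tuning for Golang programs. Next, you can start tuning the slow endpoints in your own Golang projects.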
