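Comparing the efficiency of three ways to read files in Go, and why they differ
阿新 • Published: 2018-12-13
I recently needed to read a large file in Go, and along the way I looked into how efficient Go's various ways of reading files are.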
Common file I/O approaches in Go
- Using Open and Read from the os package:
    fi, err := os.Open(path) // open the file
    buf := make([]byte, 1024)
    n, err := fi.Read(buf) // read up to len(buf) bytes into buf
- Using buffered I/O (the bufio package):
    fi, err := os.Open(path)
    r := bufio.NewReader(fi)
    buf := make([]byte, 1024)
    n, err := r.Read(buf)
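- Using the functions in the ioutil package:
    fi, err := os.Open(path)
    fd, err := ioutil.ReadAll(fi) // read everything up to EOF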
Observations (efficiency comparison)
Details of the files prepared for the test:
total 720912
-rw-r--r-- 1 stephen staff 2.3K Sep 15 11:59 io_demo.go
-rw-r--r-- 1 stephen staff 336M Sep 15 11:59 test.txt
The code in io_demo.go is as follows:
package main

import (
	"bufio"
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"time"
)

// readRaw reads the file with plain os.File.Read calls into a 1 KB buffer,
// so every iteration goes straight to a read system call.
func readRaw(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	// Close the file exactly once and report the elapsed time.
	defer func() {
		fi.Close()
		fmt.Printf("[readRaw] cost time %v \n", time.Since(start))
	}()

	var data []byte
	buf := make([]byte, 1024)
	for {
		n, err := fi.Read(buf)
		if err != nil && err != io.EOF {
			panic(err)
		}
		if n == 0 {
			break
		}
		data = append(data, buf[:n]...)
	}
	return string(data)
}

// readWithBufferIO reads through a bufio.Reader (default 4096-byte buffer),
// still copying the data out 1 KB at a time.
func readWithBufferIO(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() {
		fi.Close()
		fmt.Printf("[readWithBufferIO] cost time %v \n", time.Since(start))
	}()

	r := bufio.NewReader(fi)
	var data []byte
	buf := make([]byte, 1024)
	for {
		n, err := r.Read(buf)
		if err != nil && err != io.EOF {
			panic(err)
		}
		if n == 0 {
			break
		}
		data = append(data, buf[:n]...)
	}
	return string(data)
}

// readWithIOUtil lets ioutil.ReadAll read the whole file in a single call.
func readWithIOUtil(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() {
		fi.Close()
		fmt.Printf("[readWithIOUtil] cost time %v \n", time.Since(start))
	}()

	fd, err := ioutil.ReadAll(fi)
	if err != nil {
		panic(err)
	}
	return string(fd)
}

func main() {
	file := "test.txt"
	readRaw(file)
	readWithBufferIO(file)
	readWithIOUtil(file)
}
I read the prepared file with the code above and ran the test more than 10 times. Two of the results are shown below to illustrate the point:
[readRaw] cost time 1.490717874s
[readWithBufferIO] cost time 573.336617ms
[readWithIOUtil] cost time 379.678285ms
[readRaw] cost time 1.45133396s
[readWithBufferIO] cost time 541.944555ms
[readWithIOUtil] cost time 983.909509ms
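As the numbers show, readRaw, which uses the os package directly, is clearly the slowest, and by a wide margin: about 1.45 to 1.49 s, roughly 2.6 to 2.7 times as long as readWithBufferIO. Between readWithBufferIO and readWithIOUtil, however, there is no clear winner: readWithIOUtil was the faster of the two in the first run (about 380 ms vs. 573 ms) but the slower in the second (about 984 ms vs. 542 ms).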
From symptoms to causes
Now that we have this result, let's look at why it happens.
1. Why is buffered I/O faster than a plain Read?
Start with the bufio source:
// NewReader returns a new Reader whose buffer has the default size.
func NewReader(rd io.Reader) *Reader {
	return NewReaderSize(rd, defaultBufSize)
}
Then look at NewReaderSize:
// NewReaderSize returns a new Reader whose buffer has at least the specified
// size. If the argument io.Reader is already a Reader with large enough
// size, it returns the underlying Reader.
func NewReaderSize(rd io.Reader, size int) *Reader {
	// Is it already a Reader?
	b, ok := rd.(*Reader)
	if ok && len(b.buf) >= size {
		return b
	}
	if size < minReadBufferSize {
		size = minReadBufferSize
	}
	r := new(Reader)
	r.reset(make([]byte, size), rd)
	return r
}
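By default bufio allocates a 4096-byte (4 KB) buffer. When that buffer is empty, its Read method makes a single read system call that pulls up to 4096 bytes into it, and subsequent r.Read(buf) calls are served from the buffer until it runs dry. Plain I/O, by contrast, makes a system call on every read or write. With the 1024-byte buf used here, readRaw therefore makes about four times as many read system calls as readWithBufferIO, and every system call pays for a switch from user mode to kernel mode and back, so plain reads are bound to be much slower.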
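The effect on the number of reads can be checked without timing anything. This sketch assumes a bytes.Reader stands in for test.txt and uses a hypothetical countingReader wrapper: it counts the Read calls that reach the underlying source (for an *os.File, each of these would be a read system call) when 1 MiB is drained 1024 bytes at a time, once directly and once through bufio.NewReader.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader wraps an io.Reader and counts how many times Read is
// called on it; each call stands in for one read system call on a file.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// drain reads r to EOF with a fixed 1024-byte slice, like the loops above.
func drain(r io.Reader) {
	buf := make([]byte, 1024)
	for {
		_, err := r.Read(buf)
		if err == io.EOF {
			return
		}
		if err != nil {
			panic(err)
		}
	}
}

// countReads returns how many Read calls reach the underlying source when
// size bytes are read directly and through a default bufio.Reader.
func countReads(size int) (raw, buffered int) {
	data := make([]byte, size)

	direct := &countingReader{r: bytes.NewReader(data)}
	drain(direct)

	underlying := &countingReader{r: bytes.NewReader(data)}
	drain(bufio.NewReader(underlying))

	return direct.calls, underlying.calls
}

func main() {
	raw, buffered := countReads(1 << 20) // 1 MiB of zeros
	// 1024 full reads + 1 EOF read vs. 256 buffer fills + 1 EOF read.
	fmt.Println("direct reads:", raw, "buffered reads:", buffered) // direct reads: 1025 buffered reads: 257
}
```

One caveat from bufio's own source: if the slice passed to Read is at least as large as the internal buffer and the buffer is empty, Read fills the caller's slice directly. With a buf of 4096 bytes or more, the two loops would therefore make roughly the same number of underlying reads, so the gain here comes from the buffer size, not from bufio itself.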
2. Why are bufio and ioutil so hard to tell apart?
Now look at the source behind ioutil.ReadAll (MinRead comes from the bytes package; ReadAll and readAll are from io/ioutil):
// From package bytes:

// MinRead is the minimum slice size passed to a Read call by
// Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond
// what is required to hold the contents of r, ReadFrom will not grow the
// underlying buffer.
const MinRead = 512

// From package io/ioutil:

// ReadAll reads from r until an error or EOF and returns the data it read.
// A successful call returns err == nil, not err == EOF. Because ReadAll is
// defined to read from src until EOF, it does not treat an EOF from Read
// as an error to be reported.
func ReadAll(r io.Reader) ([]byte, error) {
	return readAll(r, bytes.MinRead)
}

// readAll reads from r until an error or EOF and returns the data it read
// from the internal buffer allocated with a specified capacity.
func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	var buf bytes.Buffer
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	if int64(int(capacity)) == capacity {
		buf.Grow(int(capacity))
	}
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}
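As you can see, ioutil.ReadAll is also buffered I/O underneath. It reads into a bytes.Buffer, and each Read call it makes is offered at least 512 bytes (bytes.MinRead) of free space; because the bytes.Buffer grows dynamically as data arrives, later reads are offered far more than that. However, every Grow that has to reallocate the buffer also copies all the data read so far, which adds its own overhead. Comparing the two approaches is therefore a trade-off, and neither wins outright.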
The advantage of ioutil, however, is convenience: ioutil.ReadAll or ioutil.ReadFile gets the job done in a single line of code.