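Protobuf Serialization/Deserialization Performance and Open Questions

阿新 • Published: 2019-02-18

For a TensorFlow project I was asked to measure the performance of protobuf serialization and deserialization. The test procedure and results are as follows.

I. Test environment

Python 2.7 + proto3

II. Test method

1. Define a custom proto message (adapted from the example that ships with protobuf)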

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}

2. Compile the proto file

protoc --python_out=. addressbook.proto

This produces addressbook_pb2.py.

3. In the test script, vary the loop count to change the size of the serialized content:

for i in range(1024 * 1024):
  PromptForAddress(address_book.people.add())

4. Serialization

  begin = datetime.datetime.now()
  serialized = address_book.SerializeToString()
  end = datetime.datetime.now()
  print end-begin
  print len(serialized)
  f.write(serialized)

5. Deserialization

    book = f.read()
    parsebegin = datetime.datetime.now()
    address_book.ParseFromString(book)
    parseend = datetime.datetime.now()
    print parseend-parsebegin
    print len(book)
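
The two snippets above time a single call with datetime.datetime.now(), which follows the wall clock and gives one sample per run. A minimal sketch of an alternative, assuming Python 3 (the post's own script targets Python 2.7): a hypothetical helper, best_of, times any callable with timeit.default_timer and keeps the fastest of several repeats. The bytes-joining lambda is only a stand-in workload so the sketch runs without the generated addressbook_pb2 module.

```python
import timeit


def best_of(func, repeats=5):
    """Call func() `repeats` times; return (last result, fastest time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = timeit.default_timer()
        result = func()
        elapsed = timeit.default_timer() - start
        best = min(best, elapsed)
    return result, best


# Stand-in workload (not protobuf): build a byte string of a known size.
payload, seconds = best_of(lambda: b"".join([b"x" * 64] * 10000))
print("%d bytes, best of 5: %.6f s" % (len(payload), seconds))
```

With the generated module available, the same helper could presumably be applied as best_of(address_book.SerializeToString) and best_of(lambda: address_book.ParseFromString(book)).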

The complete .py file:

#! /usr/bin/env python

# See README.txt for information and build instructions.

import addressbook_pb2
import sys
import datetime

# This function fills in a Person message with fixed test values.
def PromptForAddress(person):
  person.id = 160824
  person.name = "xxxxx xxxxx"
  person.email = "[email protected]"
  phone_number = person.phones.add()
  phone_number.number = "12345678"
  phone_number.type = addressbook_pb2.Person.MOBILE

  phone_number = person.phones.add()
  phone_number.number = "23456789"
  phone_number.type = addressbook_pb2.Person.HOME

  phone_number = person.phones.add()
  phone_number.number = "34567890"
  phone_number.type = addressbook_pb2.Person.WORK

# Main procedure:  Reads the entire address book from a file (timing the
#   parse), adds 1024 * 1024 people, then serializes it (timing the call)
#   and writes it back out to the same file.
if len(sys.argv) != 2:
  print "Usage:", sys.argv[0], "ADDRESS_BOOK_FILE"
  sys.exit(-1)
address_book = addressbook_pb2.AddressBook()

# Read the existing address book.
try:
  with open(sys.argv[1], "rb") as f:
    book = f.read()
    parsebegin = datetime.datetime.now()
    address_book.ParseFromString(book)
    parseend = datetime.datetime.now()
    print parseend-parsebegin
    print len(book)
except IOError:
  print sys.argv[1] + ": File not found.  Creating a new file."

# Add an address.
for i in range(1024 * 1024):
  PromptForAddress(address_book.people.add())

# Write the new address book back to disk.
with open(sys.argv[1], "wb") as f:
  begin = datetime.datetime.now()
  serialized = address_book.SerializeToString()
  end = datetime.datetime.now()
  print end-begin
  print len(serialized)
  f.write(serialized)
  

6. Change the loop count and record protobuf serialization and deserialization performance at different message sizes

III. Test results

Size (MB)	Serialization (s)	Deserialization (s)
1.03	0.799453	0.950107
53.00	36.759911	43.303041
61.64	41.674104	52.206466
81.00	63.077295	79.234909
106.00	72.048027	88.280556
102.83	81.08806	102.28786
162.00	128.883403	164.042591
205.66	163.994605	199.729636
243.00	197.582673	246.699898

Note: the size in the table is the length of the serialized string, i.e. len(serialized) in the program.
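
One way to check how close these measurements are to linear is to divide each size by its time: if time grows linearly with size, the throughput in MB/s should stay roughly constant. The sketch below, assuming Python 3, does this for the rows of the table; the names ROWS and throughput are illustrative only.

```python
# (size in MB, serialization seconds, deserialization seconds), copied from the table.
ROWS = [
    (1.03, 0.799453, 0.950107),
    (53.00, 36.759911, 43.303041),
    (61.64, 41.674104, 52.206466),
    (81.00, 63.077295, 79.234909),
    (106.00, 72.048027, 88.280556),
    (102.83, 81.08806, 102.28786),
    (162.00, 128.883403, 164.042591),
    (205.66, 163.994605, 199.729636),
    (243.00, 197.582673, 246.699898),
]


def throughput(rows):
    """Return a list of (size_mb, serialize MB/s, deserialize MB/s)."""
    return [(size, size / ser, size / de) for size, ser, de in rows]


rates = throughput(ROWS)
for size, ser_rate, de_rate in rates:
    print("%7.2f MB  serialize %.2f MB/s  deserialize %.2f MB/s"
          % (size, ser_rate, de_rate))
```

For these rows the serialization rate stays between about 1.2 and 1.5 MB/s and the deserialization rate between about 1.0 and 1.2 MB/s.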

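IV. Analysis and open questions

The results show roughly linear growth: the larger the message, the longer it takes. At 243 MB, serialization takes about 198 s (a little over 3 minutes) and deserialization about 247 s (a little over 4 minutes). The results raise several questions:

1. Is the test method correct? I believe it is workable, but the times are higher than I expected.

2. This test was run in Python. I also tested in C++, and the results were far better than in Python (for the C++ part, see "FlatBuffers與protobuf效能比較", a performance comparison of FlatBuffers and protobuf).

I only compared a small message (1 KB), looping serialization and deserialization 100 times each, with the results below. Both tests used the same proto file; the C++ test used SerializeToArray/ParseFromArray, and the Python test used SerializeToString/ParseFromString.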

Language	Serialization (ms)	Deserialization (ms)
Python	63.879	82.89
C++	1.336	1.352

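By these figures C++ is roughly 48 times faster than Python at serialization and roughly 61 times faster at deserialization. Can C++ really be that much faster than Python?

3. According to the material I consulted, serialization and deserialization performance also depends on the structure of the proto (for example, deep nesting). I therefore suggest running another test in the context of TensorFlow once I have studied it: while training a model, time the serialization and deserialization steps separately.

The questions above still need discussion; corrections and criticism are welcome.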
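
The speed-up in question 2 can be recomputed directly from the small-message table. One factor the post does not examine, offered here only as a possibility: the Python protobuf runtime ships with more than one backend (a pure-Python one and faster native ones), and which one is active can change the timings considerably. The sketch below, assuming Python 3, recomputes the ratios and reports the active backend through api_implementation.Type(). That function lives in an internal module of the protobuf package and may change between releases; if protobuf is not installed, the sketch reports None.

```python
# Timings in milliseconds from the small-message table (100 iterations, ~1 KB message).
PYTHON_MS = {"serialize": 63.879, "parse": 82.89}
CPP_MS = {"serialize": 1.336, "parse": 1.352}

# How many times faster the C++ run was for each operation.
ratios = {op: PYTHON_MS[op] / CPP_MS[op] for op in PYTHON_MS}
for op in ("serialize", "parse"):
    print("%s: C++ is %.1fx faster" % (op, ratios[op]))

try:
    # Internal module of the protobuf package; its stability is an assumption.
    from google.protobuf.internal import api_implementation
    backend = api_implementation.Type()  # e.g. "python", "cpp" or "upb"
except ImportError:
    backend = None  # protobuf is not installed in this environment
print("active protobuf backend:", backend)
```

Setting the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION (for example to cpp) before importing protobuf selects a backend, provided that backend is available in the installed package.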
