Pitfalls of zhreshold's mxnet YOLOv2 (running + server deployment)


This post is mainly a personal memo, so please excuse the rambling!!

Problem 1: the demo throws an error as soon as you run it. Changing it to data=list(), label=list() fixes it.... What a trap~~~~~~


Problem 2:

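Both my mxnet and my mxnet-model-server are the latest versions, built locally from the official source..... I downloaded a YOLOv2 implementation for mxnet from GitHub. That code ships with its own mxnet, version 0.10.1, which includes mx.sym.stack_neighbor and mx.sym.contrib.YoloOutput(); the official source has neither of these...... The problem is this: the model symbol produced by the YOLOv2 code on the old mxnet contains stack_neighbor(), while the official mxnet only has stack(). So when the latest mxnet-model-server tries to deploy the model, it cannot parse stack_neighbor. The error is: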

mms/mxnet_model_server.py:_arg_process:184 Failed to process arguments: Failed loading Op stack_downsample of type stack_neighbor: [06:41:11] src/core/op.cc:55: Check failed: op != nullptr Operator stack_neighbor is not registered

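I don't want to downgrade the latest mxnet and model-server I've already built, but I still need to deploy a model produced by the old version. What to do???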
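Before picking a fix, it can help to list every operator in the exported symbol JSON that the new build doesn't provide, so you know whether it's only stack_neighbor or more. Below is a minimal, stdlib-only sketch, assuming the usual MXNet symbol JSON layout (a top-level "nodes" list, each node with "op" and "name", and "op": "null" for inputs and weights). The `known_ops` set is a hypothetical stand-in that you would fill in from your own build, and the "_contrib_YoloOutput" name in the demo assumes contrib ops are stored with a "_contrib_" prefix.

```python
import json
from collections import Counter


def unknown_ops(symbol_json: str, known_ops: set) -> Counter:
    """Count operator types in an MXNet symbol JSON that are not in known_ops.

    Assumes the layout {"nodes": [{"op": ..., "name": ...}, ...], ...};
    nodes with op == "null" are variables (data, weights) and are skipped.
    """
    graph = json.loads(symbol_json)
    ops = Counter(node["op"] for node in graph["nodes"] if node["op"] != "null")
    return Counter({op: n for op, n in ops.items() if op not in known_ops})


if __name__ == "__main__":
    # Tiny hand-written graph standing in for a real exported YOLOv2 symbol.
    demo = json.dumps({
        "nodes": [
            {"op": "null", "name": "data", "inputs": []},
            {"op": "Convolution", "name": "conv0", "inputs": [[0, 0, 0]]},
            {"op": "stack_neighbor", "name": "stack_downsample", "inputs": [[1, 0, 0]]},
            # Assumed on-disk name of mx.sym.contrib.YoloOutput.
            {"op": "_contrib_YoloOutput", "name": "yolo_output", "inputs": [[2, 0, 0]]},
        ],
        "arg_nodes": [0],
        "heads": [[3, 0, 0]],
    })
    # Hypothetical set of operators available in the new mxnet build.
    known = {"Convolution", "stack", "concat"}
    print(unknown_ops(demo, known))
```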


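Some people suggest editing the JSON file to replace it with stack (but stack is not the same as stack_neighbor, so this approach is wrong!!! details below) and then running mxnet-model-export on the result, which gets around the new version being unable to parse stack_neighbor.... But then contrib.YoloOutput can't be parsed either.... It treats the symptom, not the disease!!!!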
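Why the rename alone can't work: changing the op string leaves the node's kernel attribute behind, and stack only accepts axis and num_args, so loading fails with a "Cannot find argument 'kernel'" error. Even with the attribute stripped, stack joins a list of arrays along a new axis instead of moving spatial neighbors into channels, so the output would be wrong anyway. A minimal sketch of the naive rename that reports the leftover attributes; it assumes node parameters are stored under "attrs", "attr" or "param" depending on the MXNet version, and the allowed-argument set is taken from the stack error message.

```python
import json

# Arguments mx.sym.stack accepts (per its "Possible Arguments" error message).
STACK_ARGS = {"axis", "num_args"}


def rename_stack_neighbor(symbol_json: str):
    """Naively rename every stack_neighbor node to stack.

    Returns (new_json, problems), where problems lists (node_name, bad_keys)
    for attributes that stack will reject when the symbol is loaded.
    """
    graph = json.loads(symbol_json)
    problems = []
    for node in graph["nodes"]:
        if node["op"] != "stack_neighbor":
            continue
        node["op"] = "stack"
        # Different MXNet versions store op parameters under different keys.
        for key in ("attrs", "attr", "param"):
            bad = sorted(set(node.get(key, {})) - STACK_ARGS)
            if bad:
                problems.append((node["name"], bad))
    return json.dumps(graph), problems
```

The rename "succeeds" at the string level, but every node reported in `problems` will still fail to load, which is exactly why this only moves the error around.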

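So,
I copied the three yolo_output files (.h .cu .cc) from /src/operator/contrib/ in @zhreshold's YOLOv2 source into /incubator-mxnet/src/operator/contrib/ in the latest mxnet source and recompiled mxnet. Miraculously, the new mxnet now had contrib.YoloOutput too. The model loaded fine, but then the incompatibility between stack and stack_neighbor showed up, with this error: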
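The copy step above can be scripted. A minimal sketch; both checkout paths are hypothetical, and the yolo_output* glob assumes the operator files share that prefix in the fork. After copying, rebuild mxnet the same way you built it originally.

```python
import glob
import os
import shutil


def copy_yolo_output_sources(fork_root: str, mxnet_root: str) -> list:
    """Copy yolo_output* files from the fork's contrib dir into upstream's contrib dir.

    fork_root / mxnet_root are local checkouts (hypothetical paths).
    Returns the names of the copied files.
    """
    src_dir = os.path.join(fork_root, "src", "operator", "contrib")
    dst_dir = os.path.join(mxnet_root, "src", "operator", "contrib")
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, "yolo_output*"))):
        shutil.copy2(path, dst_dir)  # keep timestamps so make sees them as new files
        copied.append(os.path.basename(path))
    if not copied:
        raise FileNotFoundError(f"no yolo_output* files under {src_dir}")
    return copied
```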
model_server.py:_arg_process:184 Failed to process arguments: Cannot find argument 'kernel', Possible Arguments:
axis : int, optional, default='0'
The axis in the result array along which the input arrays are stacked.
num_args : int, required
Number of inputs to be stacked.
, in operator stack(name="stack_downsample", kernel="(2, 2)")

The cause is that the two functions take different input parameters... stack's parameters are:
Parameters:

data (Symbol[]) - List of arrays to stack
axis (int, optional, default='0') - The axis in the result array along which the input arrays are stacked.
name (string, optional.) - Name of the resulting symbol.
stack_neighbor's parameters are:
Parameters
data : Symbol
Input data array
kernel : Shape(tuple), optional, default=(1,1)
Stack spatial neighbors defined by kernel along channel axis. The output has same elements as input, but the shape/dimension/order has been changed according to the kernel shape.
name : string, optional.
Name of the resulting symbol.
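The difference is easiest to see in terms of shapes: stack takes a list of same-shaped arrays and adds a new axis, while stack_neighbor takes one array and moves each kernel-sized spatial block into the channel axis, (N, C, H, W) -> (N, C*kh*kw, H/kh, W/kw). A NumPy sketch of that space-to-depth behaviour; the exact channel ordering here is an assumption, since the docs above only say the elements are preserved and the shape changes according to the kernel.

```python
import numpy as np


def stack_neighbor_ref(x: np.ndarray, kernel=(1, 1)) -> np.ndarray:
    """Space-to-depth reference: (N, C, H, W) -> (N, C*kh*kw, H//kh, W//kw).

    The channel ordering is an assumption; the fork's operator may order
    the stacked neighbors differently.
    """
    n, c, h, w = x.shape
    kh, kw = kernel
    assert h % kh == 0 and w % kw == 0, "H and W must be divisible by the kernel"
    # Split H and W into (block index, offset inside block), then move offsets next to C.
    y = x.reshape(n, c, h // kh, kh, w // kw, kw)
    y = y.transpose(0, 1, 3, 5, 2, 4)
    return y.reshape(n, c * kh * kw, h // kh, w // kw)


x = np.arange(64, dtype=np.float32).reshape(1, 1, 8, 8)
print(np.stack([x, x], axis=0).shape)       # stack: needs a list, adds an axis -> (2, 1, 1, 8, 8)
print(stack_neighbor_ref(x, (2, 2)).shape)  # stack_neighbor-style: (1, 4, 4, 4)
```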

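So I forcibly added the old version's stack_neighbor code, piece by piece, into the new mxnet's matrix_op files (.cu, .h, .cc), then recompiled and reinstalled mxnet. I tested stack_neighbor by hand: an input of shape (1,1,8,8) with kernel=(2,2) gives an output of shape (1,4,4,4). So the newly added stack_neighbor should be fine!!!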
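A shape check alone would also pass for a plain reshape, which scrambles the pixels. A slightly stronger sketch that also checks each output pixel (i, j) holds exactly the values of input block (i, j), without assuming any channel order. `op` is any callable taking a NumPy array and a kernel tuple; with the rebuilt mxnet you might wrap it as `lambda a, k: mx.nd.stack_neighbor(mx.nd.array(a), kernel=k).asnumpy()`, assuming the ported operator is also exposed under mx.nd (registered operators normally are).

```python
import numpy as np


def check_stack_neighbor(op, n=1, c=1, h=8, w=8, kernel=(2, 2)) -> None:
    """Sanity-check a stack_neighbor implementation beyond its output shape.

    Checks that:
      1. the shape is (N, C*kh*kw, H/kh, W/kw);
      2. output pixel (i, j) contains exactly the values of input block (i, j),
         in any channel order.
    Raises AssertionError on the first mismatch.
    """
    kh, kw = kernel
    # Distinct values, so a misplaced element cannot go unnoticed.
    x = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)
    y = np.asarray(op(x, kernel))
    expected = (n, c * kh * kw, h // kh, w // kw)
    assert y.shape == expected, f"shape {y.shape} != {expected}"
    for b in range(n):
        for i in range(h // kh):
            for j in range(w // kw):
                block = x[b, :, i * kh:(i + 1) * kh, j * kw:(j + 1) * kw]
                assert np.array_equal(np.sort(y[b, :, i, j]), np.sort(block.ravel())), (b, i, j)
```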

Then deployment failed with this error:

Initialized model serving.

[ERROR 2018-03-08 03:33:56,606 PID:10704 /home/jojo/anaconda3/lib/python3.6/site-packages/mms/mxnet_model_server.py:_arg_process:184 Failed to process arguments: Parameter file in model archive is inconsistent with manifest.

I asked the experts on the forum, and they said the model path was wrong. The relevant code is:

 raise Exception('Failed to open manifest file. Stacktrace: ' + str(e))
validate(manifest, schema)
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Signature']))) == 1, \
    'Signature file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Symbol']))) == 1, \
    'Symbol file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Parameters']))) == 1, \
    'Parameter file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Service']))) == 1, \
    'Service file in model archive is inconsistent with manifest.'
model_name = manifest['Model']['Model-Name']

return service_name, model_name, model_dir, manifest
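Since those asserts stop at the first failure, a small standalone version that reports every mismatch at once can save a few deploy attempts. A minimal sketch that mirrors the checks in the excerpt (each manifest entry must match exactly one file in the model directory); it takes the already-loaded manifest as a dict, because the manifest's file name inside the archive depends on your MMS version.

```python
import glob
import os


def check_model_dir(model_dir: str, manifest: dict) -> list:
    """Report every manifest entry that does not match exactly one file.

    Mirrors the Signature/Symbol/Parameters/Service checks in the excerpt.
    Returns a list of human-readable problems (empty if everything matches).
    """
    problems = []
    for key in ("Signature", "Symbol", "Parameters", "Service"):
        pattern = manifest["Model"][key]
        matches = glob.glob(os.path.join(model_dir, pattern))
        if len(matches) != 1:
            problems.append(f"{key}: expected 1 file matching {pattern!r}, found {len(matches)}")
    return problems
```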

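You can see the parameter file wasn't found, so I also copied the param file into the exported model's folder. On the next run it complained the name was wrong. Fine, if you say it's wrong, it's wrong: I made two identical copies of the param file, named the way it wanted. Finally, it worked!!!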
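The workaround of placing the same trained .params file under every name the server asks for can be scripted. A minimal sketch; the target names are whatever your error messages and manifest ask for, and the names used in any example are hypothetical.

```python
import os
import shutil


def place_params(params_file: str, model_dir: str, target_names) -> list:
    """Copy one trained .params file into model_dir under each required name.

    Skips a target that is already the source file itself.
    Returns the paths actually written.
    """
    placed = []
    for name in target_names:
        dst = os.path.join(model_dir, name)
        if os.path.exists(dst) and os.path.samefile(params_file, dst):
            continue  # already present under this name
        shutil.copyfile(params_file, dst)
        placed.append(dst)
    return placed
```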

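One more note: copying the official SSD deployment example as-is to deploy YOLOv2 does not work, because that example doesn't subtract the mean.... See my other posts for details....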
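For reference, mean subtraction in the preprocessing step amounts to something like the sketch below. The mean values are a placeholder (the common ImageNet RGB means), not necessarily what this YOLOv2 model was trained with, so use the mean and channel order from your training config. Resizing to the network input size is left out, and where this hooks into your MMS service class depends on how the service is written.

```python
import numpy as np

# Placeholder per-channel RGB mean (common ImageNet values); replace with the
# mean actually used when training the YOLOv2 model.
MEAN_RGB = np.array([123.68, 116.78, 103.94], dtype=np.float32)


def preprocess(img_hwc: np.ndarray, mean=MEAN_RGB) -> np.ndarray:
    """HxWx3 uint8 RGB image -> 1x3xHxW float32 batch with the mean subtracted."""
    x = img_hwc.astype(np.float32) - mean  # broadcasts over the last (channel) axis
    x = x.transpose(2, 0, 1)               # HWC -> CHW, the layout MXNet conv layers use
    return x[np.newaxis]                   # add the batch dimension
```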