python 程式碼使用ctypes呼叫C介面實現效能優化的解決方案

阿新 • • 發佈：2021-01-07

由於python相較於C++執行較慢，例如在DL時代，大規模的資料處理有的時候即便是多程序python也顯得捉襟見肘，所以效能優化非常重要，下面是基於ctypes的效能優化流程：

一、效能分析

第一步首先要分析程式碼中哪些模組耗時，各模組的耗時比要有所掌握，這裡使用line-profiler工具進行分析；

安裝:pip install line-profiler

使用：

（1）不需要import；

（2）在需要分析效能的函式前面加上修飾器 "@profile"，如下圖所示：

（3）使用命令：kernprof -lxxxxx.py

-l表示逐行分析時間，最後結果會自動寫到同目錄級下的.lprof檔案中，若是要是直接可視化出來，則加上 -v引數，效果如下：

（上表中會有執行時間、執行次數、時間佔比等）

（4）使用命令：python -m line_profiler xxxxx.py.lprof 檢視lprof檔案；

二、基於ctypes的效能優化

1、作用：ctypes用來為python提供C的介面，可以將C++實現的模組編譯為.dll或.so，然後在python中呼叫對應dll中的模組名從而進行加速；

2、例程（目的是將cv2.imread()讀取過程放在C++實現）：

（1）C++中的程式碼：

#include<opencv2/opencv.hpp>
#include<stdlib.h>

#define DLLEXPORT extern "C" __declspec(dllexport)  #DLLEXPORT用於宣告將要生成到DLL中的函式

typedef  
struct Ret {
    int* buffer_args;
    uchar* buffer_img;
}Ret, *Ret_p;

DLLEXPORT Ret_p imread(char* img_path) {
    Ret_p ret_p = (Ret_p)malloc(sizeof(Ret));
    cv::Mat img = cv::imread(img_path);
    int img_h = img.rows;
    int img_w = img.cols;
    int img_c = img.channels();

    uchar* buffer_img = (uchar*)malloc 
(sizeof(uchar) * (img_h * img_w * img_c));
    ret_p->buffer_img = buffer_img;

    int* buffer_args = (int*)malloc(sizeof(int) * 3);
    memcpy(buffer_img, img.data, img_h * img_w * img_c);
    int args[3] = { img_h, img_w, img_c };
    memcpy(buffer_args, args, 3*sizeof(int));
    ret_p->buffer_args = buffer_args;
    return ret_p;
}

DLLEXPORT void release(uchar* data) {
    free(data);
}

由上面程式碼可知：C++中實現模組功能獲得輸出後，將輸出儲存到記憶體中，然後python呼叫該記憶體即可。

設定為生成.dll即可。

（2）python中程式碼：

import os
import cv2
import ctypes
import numpy as np

c_uint8_p = ctypes.POINTER(ctypes.c_ubyte)
c_int_p = ctypes.POINTER(ctypes.c_int) #ctypes沒有定義c_int_p，因此需要自己構造
class Ret_p(ctypes.Structure):
    _fields_ = [("args", c_int_p),
                ("img_p", c_uint8_p*1)]

def main():
    template_h, template_w, template_c = 750, 840, 3
    src_img_path = "./template.jpg"
    dll = ctypes.WinDLL(r"./match.dll") #指定dll檔案，後面將會呼叫這個dll中的函式
    src_img_path = ctypes.c_char_p(bytes(src_img_path, "utf-8")) #輸入端：將python變數轉換為ctypes型別變數從而適配C++端的輸入
    dll.imread.restype = ctypes.POINTER(Ret_p) #輸出端：設定ctypes型別變數
    pointer = dll.imread(src_img_path)  #呼叫dll中的imread函式，返回的是ctypes中的指標，指向自定義的結構體
    
    ret_args = np.asarray(np.fromiter(pointer.contents.args, dtype=np.int, count=3))  #np.fromiter從迭代器中獲取資料，這裡即可以從指標指向的記憶體區域中迭代去值
    
    print(ret_args)
    img = np.asarray(np.fromiter(pointer.contents.img_p, dtype=np.uint8, count=template_h*template_w*template_c))
    img = img.reshape((template_h, template_w, template_c))
    cv2.imwrite("./1.jpg", img)

if __name__ == "__main__":
    main()

要注意：在使用ctypes時會涉及到 C++、ctypes和 python三方的資料轉換，對應關係參考：https://blog.csdn.net/wchoclate/article/details/11684905

3、其他：

（1）做了一個實驗，在實際測試中發現，原生cv2.imread()要由於這種ctypes的呼叫方式，因為C++實現了讀取操作後（將圖片資料儲存為一維陣列到記憶體中），但是還要在python中使用np.fromiter從記憶體中讀取並reshape，這步非常耗時（涉及到多次迭代的記憶體讀取開銷）。

python 程式碼使用ctypes呼叫C介面實現效能優化的解決方案

python 程式碼使用ctypes呼叫C介面實現效能優化的解決方案

python利用ctypes呼叫C++動態庫

python呼叫API介面實現登陸簡訊驗證

C++呼叫C介面的實現示例

10行Python程式碼計算汽車數量的實現方法

python使用ctypes呼叫擴充套件模組的例項方法

python利用百度雲介面實現車牌識別的示例

android呼叫C語言實現記憶體的讀取與修改的方法示例

如何使用python的ctypes呼叫醫保中心的dll動態庫下載醫保中心的賬單

win python 通過阿里雲API介面實現 DDNS 動態解析

python程式碼利用梯度下降法實現簡單的線性迴歸

C++呼叫C介面

java呼叫C++介面

《資料結構》樹和二叉樹程式碼整理（C語言實現）

呼叫第三方介面實現呼叫臺功能 -WebRTC

Django之使用celery和NGINX生成靜態頁面實現效能優化

python 遞迴呼叫返回None的問題及解決方法

Java併發程式設計如何降低鎖粒度並實現效能優化

python：pip install 報錯： Microsoft Visual C++ 9.0 is required 解決方案

mysql如何做效能優化？實現效能優化的乾貨方法（供初學者參考）

python 程式碼使用ctypes呼叫C介面實現效能優化的解決方案

相關推薦