
Complex-valued matrix multiplication with GPU acceleration -- pycuda


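While computing a convolution with the convolution theorem, I ran into a problem: after the FFT both matrices are complex-valued, and I could not get pycuda to multiply them. Passing the arguments to the GPU kept going wrong.

After a few days of experimenting, I arrived at the code below, which multiplies matrices whose elements are of complex type. Two things are worth noting. First, the kernel source is wrapped with `from jinja2 import Template`. Jinja2 is a template engine, not a compiler: it only substitutes the element type and the matrix size into the CUDA source, and the resulting string is what SourceModule compiles. Without this setup, SourceModule kept reporting errors for me. Second, declaring `complex *a` with the `complex` type from `#include "complex.h"` does not work; the kernel uses `pycuda::complex` from `pycuda-complex.hpp` instead.

I have never studied C++, so I cannot say much about it, but pycuda-complex.hpp is a C++ header file. It ships with PyCUDA and defines the `pycuda::complex<T>` class template used in the typedefs below, and SourceModule adds PyCUDA's include directory to the compiler's include path, so `#include <pycuda-complex.hpp>` needs no extra flags.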
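One likely source of argument-passing trouble is a dtype mismatch: the kernel reads raw device memory, so the host array must have the same element layout as the kernel's type. The sketch below uses only NumPy to show that a `np.complex128` array stores each element as a (real, imag) pair of `float64` values, 16 bytes in all, which is what `dcmplx` (`pycuda::complex<double>`) expects; `np.complex64` (8 bytes) pairs with `scmplx`. The `PAIRS` mapping is illustrative, and the (real, imag) layout of `pycuda::complex` is assumed here, consistent with the working kernel, rather than quoted from its header.

```python
import numpy as np

# Illustrative mapping from the kernel typedefs to host dtypes.
PAIRS = {"scmplx": np.complex64, "dcmplx": np.complex128}

z = np.array([1 + 2j, 3 - 4j], dtype=PAIRS["dcmplx"])

# Each complex128 element: one float64 real part followed by one float64 imag part.
print(z.dtype.itemsize)                  # 16
print(z.view(np.float64).tolist())       # [1.0, 2.0, 3.0, -4.0]
print(np.dtype(PAIRS["scmplx"]).itemsize)  # 8

# Integer input is not laid out like complex data, so convert explicitly
# before handing the buffer to drv.In / drv.Out.
w = np.asarray([1, 2, 3], dtype=PAIRS["dcmplx"])
print(w.view(np.float64).tolist())       # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```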

If you are interested, there is also an article on using Python features to execute arbitrary code in Jinja2 templates.
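Because Jinja2 only does text substitution here, the placeholders are gone before nvcc ever sees the source. To illustrate that step without a third-party dependency, the sketch below swaps in Python's standard `string.Template` for Jinja2; the `$complex_type` / `$mat_dim` placeholders, the cut-down single-matrix kernel and the `render_kernel_source` helper are hypothetical stand-ins for the post's `{{complex_type}}`, `{{mat_dim}}` and `render_kernel`. Plain `str.format` would be awkward for CUDA source because every C brace would have to be doubled; both Jinja2's `{{ }}` and `string.Template`'s `$name` avoid that.

```python
from string import Template

# Hypothetical cut-down kernel; $complex_type and $mat_dim play the role
# of the post's Jinja2 placeholders {{complex_type}} and {{mat_dim}}.
KERNEL_SRC = Template("""
#include <pycuda-complex.hpp>
typedef pycuda::complex<double> dcmplx;

__global__ void complex_mat_mul(const $complex_type *a,
                                const $complex_type *b,
                                $complex_type *res)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    $complex_type entry = 0;
    for (int e = 0; e < $mat_dim; ++e)
        entry += a[row * $mat_dim + e] * b[e * $mat_dim + col];
    res[row * $mat_dim + col] = entry;
}
""")


def render_kernel_source(complex_type, mat_dim):
    """Return plain CUDA C++ text: this string is what SourceModule would compile."""
    return KERNEL_SRC.substitute(complex_type=complex_type, mat_dim=mat_dim)


src = render_kernel_source("dcmplx", 4)
print(src)
```

The printed source contains literal `dcmplx` and `4` everywhere a placeholder stood, which is exactly the kind of string the post passes to `SourceModule`.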

#-*- coding: utf-8 -*-
import time

import numpy as np
import pycuda.autoinit  # imported for its side effect: creates a CUDA context
import pycuda.driver as drv
from pycuda.compiler import SourceModule
from jinja2 import Template

# CUDA source with Jinja2 placeholders; {{complex_type}} and {{mat_dim}}
# are filled in by render_kernel() before SourceModule compiles the code.
KERNEL = Template("""
    #include <stdio.h>
    #include <pycuda-complex.hpp>

    typedef pycuda::complex<float>  scmplx;
    typedef pycuda::complex<double> dcmplx;

    // One block per matrix, one thread per output element (row, col).
    __global__ void complex_mat_mul(const {{complex_type}} *a,
                                    const {{complex_type}} *b,
                                    {{complex_type}} *res)
    {
        int row = threadIdx.y;
        int col = threadIdx.x;
        // Row-major block number: unique for any grid shape.
        int mat_id = blockIdx.y * gridDim.x + blockIdx.x;
        int base = mat_id * {{mat_dim}} * {{mat_dim}};

        {{complex_type}} entry = 0;
        for (int e = 0; e < {{mat_dim}}; ++e) {
            entry += a[base + row * {{mat_dim}} + e] * b[base + e * {{mat_dim}} + col];
        }
        res[base + row * {{mat_dim}} + col] = entry;
    }
""")

# NumPy dtype matching each CUDA-side type name.
data_types = {
    'scmplx': np.complex64,
    'dcmplx': np.complex128,
    'float': np.float32,
    'double': np.float64,
}


def render_kernel(complex_type, real_type, mat_dim, block, grid):
    """Fill in the template and return plain CUDA source text."""
    # real_type and blockDim_x/blockDim_y are passed but not used by the template yet.
    return KERNEL.render(
        complex_type=complex_type,
        real_type=real_type,
        mat_dim=mat_dim,
        blockDim_x=block[0],
        blockDim_y=block[1],
    )


complex_type = 'dcmplx'
real_type = 'double'
mat_dim = 4
block = (mat_dim, mat_dim, 1)  # one thread per element of the result
grid = (1, 1)                  # one block, i.e. a single matrix

program = SourceModule(render_kernel(complex_type, real_type, mat_dim, block, grid))
complex_mat_mul = program.get_function("complex_mat_mul")

# The host dtype must match the kernel's element type (dcmplx <-> complex128).
dtype = data_types[complex_type]
mats_1 = np.array([[1, 1, 1, 0],
                   [0, 1, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=dtype)
mats_2 = mats_1.copy()
result = np.full_like(mats_1, np.nan)  # NaN marks any entry the kernel did not write

start = time.time()
# drv.In / drv.Out copy the arrays to and from the device around the launch.
complex_mat_mul(drv.In(mats_1), drv.In(mats_2), drv.Out(result),
                block=block, grid=grid)
elapsed = time.time() - start  # includes the host <-> device copies

print(result.real)
print("elapsed: %.6f s" % elapsed)
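The kernel's index arithmetic can be checked without a GPU by replaying it on the CPU. The sketch below, a minimal NumPy emulation, loops over a hypothetical 2×3 grid of blocks (one matrix each) and the `mat_dim × mat_dim` threads of each block, uses the kernel's flat offsets (`mat_id*mat_dim*mat_dim + row*mat_dim + e`), and compares the result with `np.matmul`. The names `emulate_complex_mat_mul` and `block_ids` and the grid shape are illustrative assumptions. It also compares two block-numbering formulas: `blockIdx.x * gridDim.x + blockIdx.y`, which gives each block a unique id only when the grid is square, and the row-major `blockIdx.y * gridDim.x + blockIdx.x`, which is unique for any grid shape.

```python
import numpy as np


def emulate_complex_mat_mul(a, b, mat_dim, grid):
    """CPU replay of the kernel: one 'block' per matrix, one 'thread' per (row, col).

    a, b: flat complex arrays holding grid[0]*grid[1] row-major mat_dim x mat_dim matrices.
    """
    gx, gy = grid
    res = np.full_like(a, np.nan)
    for by in range(gy):                      # blockIdx.y
        for bx in range(gx):                  # blockIdx.x
            mat_id = by * gx + bx             # row-major block numbering
            base = mat_id * mat_dim * mat_dim
            for row in range(mat_dim):        # threadIdx.y
                for col in range(mat_dim):    # threadIdx.x
                    entry = 0j
                    for e in range(mat_dim):
                        entry += a[base + row * mat_dim + e] * b[base + e * mat_dim + col]
                    res[base + row * mat_dim + col] = entry
    return res


def block_ids(grid, formula):
    """Sorted ids that `formula(bx, by, gridDim.x)` assigns to every block of the grid."""
    gx, gy = grid
    return sorted(formula(bx, by, gx) for by in range(gy) for bx in range(gx))


mat_dim, grid = 4, (2, 3)                     # hypothetical batch of 6 matrices
shape = (grid[0] * grid[1], mat_dim, mat_dim)
rng = np.random.default_rng(0)
A = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
B = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

out = emulate_complex_mat_mul(A.ravel(), B.ravel(), mat_dim, grid).reshape(shape)
print(np.allclose(out, A @ B))                              # True

print(block_ids(grid, lambda bx, by, gx: by * gx + bx))    # [0, 1, 2, 3, 4, 5]
print(block_ids(grid, lambda bx, by, gx: bx * gx + by))    # [0, 1, 2, 2, 3, 4]
```

With the post's `grid = (1, 1)` both formulas give 0, so the difference only shows up once several matrices are processed in one launch; this sketch does not exercise the GPU itself.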

Output of the code:

[Figure: screenshot of the program output]
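A CPU reference makes the GPU output easy to check. This minimal sketch multiplies the post's 4×4 input matrix by itself with NumPy (the `mats` and `expected` names are illustrative); the real part it prints is what `print(result.real)` in the pycuda code should show, and the imaginary part is zero because the inputs are real.

```python
import numpy as np

# Same values as mats_1 / mats_2 in the post.
mats = np.array([[1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]], dtype=np.complex128)

expected = mats @ mats
print(expected.real)
# [[1. 2. 3. 2.]
#  [0. 1. 3. 3.]
#  [0. 0. 2. 2.]
#  [0. 0. 2. 2.]]

# With the GPU result available, the check would be: np.allclose(result, expected)
```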