GPU-accelerated multiplication of complex-typed matrices -- pycuda
阿新 • Published: 2019-01-03
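While implementing convolution through the convolution theorem, I ran into a problem: after the FFT both matrices are complex, so how do you multiply them with pycuda? Passing the arguments to the GPU kept failing.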
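For context, the convolution theorem says that the DFT of a circular convolution equals the element-wise product of the two DFTs, which is why complex data shows up here. Below is a minimal CPU-only sketch of that identity in one dimension. The helper names (`dft`, `circular_convolve_direct`, `circular_convolve_spectral`) are illustrative assumptions, and a naive O(n²) DFT stands in for a real FFT library such as `numpy.fft`.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve_direct(a, b):
    """Circular convolution straight from the definition."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_convolve_spectral(a, b):
    """Same result via the convolution theorem: transform, multiply, invert."""
    fa, fb = dft(a), dft(b)
    product = [x * y for x, y in zip(fa, fb)]  # complex element-wise product
    return [v.real for v in dft(product, inverse=True)]
```

The intermediate `product` list holds complex values even though the inputs are real, which is the situation described above.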
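After a few days of trial and error I arrived at the code below, which multiplies matrices whose data type is complex. The point worth noting is that the kernel source is built with Jinja2 (from jinja2 import Template). Jinja2 is a template engine, not a compiler: it fills in the element type and the matrix size before the source string is handed to SourceModule. Without it, SourceModule kept raising errors for me; in particular, using the complex from #include "complex.h" to declare complex * a does not work.
The kernel takes its complex type from pycuda-complex.hpp instead. I have never studied C++, so I cannot say much more, but this file should be a C++ header. If you are curious about Jinja2, the article "利用 Python 特性在 Jinja2 模板中執行任意程式碼" (on using Python features to execute arbitrary code inside Jinja2 templates) is worth a look.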
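The templating step is plain text substitution: the placeholders are replaced before any CUDA compilation happens, and `SourceModule` only ever sees an ordinary source string. A minimal sketch of that step follows. It uses the standard library's `string.Template` as a stand-in for Jinja2 so that it runs without extra packages; the `$name` placeholders, the `square` kernel and the `render_sketch` helper are assumptions for illustration, not the code from this post.

```python
from string import Template

# Shortened stand-in for a kernel source. $complex_type and $mat_dim play the
# role of the {{complex_type}} and {{mat_dim}} placeholders used with Jinja2.
KERNEL_SKETCH = Template("""
#include <pycuda-complex.hpp>
typedef pycuda::complex<float > scmplx;
typedef pycuda::complex<double> dcmplx;
__global__ void square(const $complex_type *a, $complex_type *res)
{
    int i = threadIdx.x;
    if (i < $mat_dim * $mat_dim) res[i] = a[i] * a[i];
}
""")


def render_sketch(complex_type, mat_dim):
    """Return CUDA source text with the placeholders filled in."""
    return KERNEL_SKETCH.substitute(complex_type=complex_type, mat_dim=mat_dim)


source = render_sketch("dcmplx", 4)
print(source)
```

The printed text is ordinary CUDA C++ source; choosing `scmplx` or `dcmplx` simply selects which typedef the kernel signature refers to.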
#-*- coding: utf-8 -*-
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule
from jinja2 import Template
import numpy as np
KERNEL = Template("""
#include <stdio.h>
#include <pycuda-complex.hpp>

// Short aliases for PyCUDA's single- and double-precision complex types.
typedef pycuda::complex<float > scmplx;
typedef pycuda::complex<double> dcmplx;

// Each block multiplies one pair of square matrices;
// each thread computes one entry of the product.
__global__ void complex_mat_mul(const {{complex_type}} *a, const {{complex_type}} *b, {{complex_type}} *res)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    // Row-major linear index of this block in a 2D grid; selects the matrix pair.
    int mat_id = blockIdx.y * gridDim.x + blockIdx.x;
    {{complex_type}} entry = 0;
    for (int e = 0; e < {{mat_dim}}; ++e) {
        entry += a[mat_id*{{mat_dim}}*{{mat_dim}} + row * {{mat_dim}} + e]
               * b[mat_id*{{mat_dim}}*{{mat_dim}} + e * {{mat_dim}} + col];
    }
    res[mat_id*{{mat_dim}}*{{mat_dim}} + row * {{mat_dim}} + col] = entry;
}
""")
data_types = {
    'scmplx': np.complex64,
    'dcmplx': np.complex128,
    'float': np.float32,
    'double': np.float64
}
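def render_kernel(complex_type, real_type, mat_dim, block, grid):
    # Fill in the template placeholders. Names the template does not use
    # (real_type, blockDim_x, blockDim_y) are simply ignored by Jinja2.
    templ = KERNEL.render(
        complex_type=complex_type,
        real_type=real_type,
        mat_dim=mat_dim,
        blockDim_x=block[0],
        blockDim_y=block[1]
    )
    # print(templ)
    return templ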
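complex_type = 'dcmplx'  # must match the dtype of the host arrays (np.complex128)
real_type = 'double'
mat_dim = 4
block = (mat_dim, mat_dim, 1)  # one thread per entry of the product
grid = (1, 1)                  # a single block, i.e. a single pair of matrices
program = SourceModule(render_kernel(complex_type, real_type, mat_dim, block, grid))
complex_mat_mul = program.get_function("complex_mat_mul")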
mats_1 = np.array([
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=np.complex128)
mats_2 = np.array([
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=np.complex128)
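# Output buffer, pre-filled with NaN so entries the kernel never writes are easy to spot.
result = mats_1.copy()
result[:] = np.nan
# drv.In copies a host array to the device before the launch;
# drv.Out copies the device buffer back into the host array afterwards.
a = drv.In(mats_1)
b = drv.In(mats_2)
c = drv.Out(result)
complex_mat_mul(a, b, c,
                block=block,
                grid=grid)
print(result.real)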
Running the code prints the real part of the 4×4 product matrix.
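As a cross-check that does not need a GPU, the same product can be computed on the CPU. The sketch below mirrors the kernel's flat, row-major indexing in plain Python; `reference_mat_mul` is a hypothetical helper written for this illustration and is not part of PyCUDA.

```python
def reference_mat_mul(a, b, mat_dim, num_mats=1):
    """Multiply num_mats pairs of mat_dim x mat_dim matrices.

    a and b are flat, row-major lists of complex numbers, laid out the
    same way the kernel indexes its device buffers.
    """
    size = mat_dim * mat_dim
    res = [0j] * (num_mats * size)
    for mat_id in range(num_mats):        # plays the role of the block index
        base = mat_id * size
        for row in range(mat_dim):        # threadIdx.y in the kernel
            for col in range(mat_dim):    # threadIdx.x in the kernel
                entry = 0j
                for e in range(mat_dim):
                    entry += a[base + row * mat_dim + e] * b[base + e * mat_dim + col]
                res[base + row * mat_dim + col] = entry
    return res


# The 4x4 matrix used for both operands in this post, flattened row by row.
flat = [complex(v) for v in [1, 1, 1, 0,
                             0, 1, 1, 1,
                             0, 0, 1, 1,
                             0, 0, 1, 1]]
product = reference_mat_mul(flat, flat, mat_dim=4)
for row in range(4):
    print([product[row * 4 + col].real for col in range(4)])
```

For this input the real parts of the rows come out as [1, 2, 3, 2], [0, 1, 3, 3], [0, 0, 2, 2] and [0, 0, 2, 2], which is what the GPU result should match. With NumPy available, an equivalent check would be `np.allclose(result, mats_1 @ mats_2)`.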