1. 程式人生 > >Cuda learn record two

Cuda learn record two

dsp help != 裏的 launch function eset show ads

這是一個cuda 自帶的算例,包含cuda 計算的一般流程。

這個地址有比較清楚的cuda的介紹。感謝作者分享(http://blog.csdn.net/hjimce/article/details/51506207)

一般來說,cuda 計算的流程是:

1. 設置顯卡編號:cudaSetDevice; 這個主要是在有多個GPU的機器上使用,其編號是從0號開始。

2. 為顯卡開辟內存變量: cudaMalloc;使用方法:cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));

這裏的指針是指向設備端的內存地址,無法在主機端直接解引用使用。

3.把主機端的數據拷貝到設備端:cudaMemcpy; 使用方法:

cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);

這裏註意需要指明數據傳輸的方向(如 cudaMemcpyHostToDevice、cudaMemcpyDeviceToHost),以及源地址和目的地址。

4. 調用內核函數__global__ 類型函數;

addKernel<<<blocksPerGrid, threadsPerBlock>>> ( )

這裏 blocksPerGrid、threadsPerBlock 都是 dim3 型的數據(也可以直接傳入整數,表示一維配置)。

5. 把計算結果拷貝到主機端。

6. 釋放顯存空間。

  1 #include "cuda_runtime.h"
  2
#include "device_launch_parameters.h" 3 4 #include <stdio.h> 5 6 static void HandleError(cudaError_t err, 7 const char *file, 8 int line) { 9 if (err != cudaSuccess) { 10 printf("%s in %s at line %d\n", cudaGetErrorString(err), 11 file, line);
12 exit(EXIT_FAILURE); 13 } 14 } 15 #define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ )) 16 17 cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size); 18 void printCudaInformation(); 19 20 __global__ void addKernel(int *c, const int *a, const int *b) 21 { 22 int i = threadIdx.x; 23 c[i] = a[i] + b[i]; 24 } 25 26 int main() 27 { 28 const int arraySize = 5; 29 const int a[arraySize] = { 1, 2, 3, 4, 5 }; 30 const int b[arraySize] = { 10, 20, 30, 40, 50 }; 31 int c[arraySize] = { 0 }; 32 33 // Add vectors in parallel. 34 HANDLE_ERROR( addWithCuda(c, a, b, arraySize) ); 35 36 printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n", 37 c[0], c[1], c[2], c[3], c[4]); 38 39 // cudaDeviceReset must be called before exiting in order for profiling and 40 // tracing tools such as Nsight and Visual Profiler to show complete traces. 41 HANDLE_ERROR( cudaDeviceReset() ); 42 43 system("pause"); 44 printCudaInformation(); 45 system("pause"); 46 return 0; 47 } 48 49 // Helper function for using CUDA to add vectors in parallel. 50 cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size) 51 { 52 int *dev_a = 0; 53 int *dev_b = 0; 54 int *dev_c = 0; 55 cudaError_t cudaStatus=cudaSuccess; 56 57 // Choose which GPU to run on, change this on a multi-GPU system. 58 HANDLE_ERROR(cudaSetDevice(0)); 59 60 // Allocate GPU buffers for three vectors (two input, one output) 61 HANDLE_ERROR(cudaMalloc((void**)&dev_c, size * sizeof(int))); 62 HANDLE_ERROR(cudaMalloc((void**)&dev_a, size * sizeof(int))); 63 HANDLE_ERROR(cudaMalloc((void**)&dev_b, size * sizeof(int))); 64 65 // Copy input vectors from host memory to GPU buffers. 66 HANDLE_ERROR(cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice)); 67 HANDLE_ERROR(cudaMemcpy(dev_b, a, size * sizeof(int), cudaMemcpyHostToDevice)); 68 69 70 // Launch a kernel on the GPU with one thread for each element. 
71 addKernel<<<1, size>>>(dev_c, dev_a, dev_b); 72 73 // Check for any errors launching the kernel 74 HANDLE_ERROR(cudaGetLastError()); 75 76 // cudaDeviceSynchronize waits for the kernel to finish, and returns 77 // any errors encountered during the launch. 78 HANDLE_ERROR(cudaDeviceSynchronize()); 79 80 // Copy output vector from GPU buffer to host memory. 81 HANDLE_ERROR(cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost)); 82 83 return cudaStatus; 84 } 85 86 void printCudaInformation() 87 { 88 int count; 89 cudaGetDeviceCount(&count); 90 printf("count=%d \n", count); 91 cudaDeviceProp myProp; 92 cudaGetDeviceProperties(&myProp, 0); 93 printf(" --- General Information of My Cuda Device ---\n"); 94 printf(" Device name: %s\n", myProp.name); 95 printf(" Computer capatibility : %d.%d\n", myProp.major, myProp.minor); 96 printf(" Clock rate: %d\n", myProp.clockRate); 97 98 printf(" --- Memory Information of My Cuda Device ---\n"); 99 printf(" Total global memory: %ld =%d double \n", myProp.totalGlobalMem, myProp.totalGlobalMem / sizeof(double)); 100 printf(" Total const memory: %ld =%d int \n", myProp.totalConstMem, myProp.totalConstMem / sizeof(int)); 101 printf(" max memoory pitch: %ld \n", myProp.memPitch); 102 103 printf(" --- Multiprocessor Information of My Cuda Device ---\n"); 104 printf(" multprocessor count= %d\n", myProp.multiProcessorCount); 105 printf(" Shared mem per mp=%d\n", myProp.sharedMemPerBlock); 106 printf(" Registers per mp=%d\n", myProp.regsPerBlock); 107 printf(" Thread in wrap=%d\n", myProp.warpSize); 108 printf(" Max thread per block=%d\n", myProp.maxThreadsPerBlock); 109 printf(" Max threads dimensions= (%d, %d, %d) \n", 110 myProp.maxThreadsDim[0], myProp.maxThreadsDim[1], myProp.maxThreadsDim[2]); 111 printf(" Max Grid dimensions= (%d, %d, %d) \n", 112 myProp.maxGridSize[0], myProp.maxGridSize[1], myProp.maxGridSize[2]); 113 printf("\n"); 114 }

Cuda learn record two