Nvidia CUDA 3.0 Update

阿新 • Published: 2019-02-18

- Section 1.2
- Updated figure

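An explanatory figure was added that makes it clearer that CUDA is not just a language but a platform: other languages and programming environments can be built on top of it. CUDA has its own ISA and its own PTX code, so it should not be thought of simply as a programming language. In principle you could design your own chip or other hardware around the CUDA architecture, but that would require detailed CUDA documentation, which is not available, at least for now.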

- Section 2.5
- Mentioned the Fermi architecture

This section notes that Fermi is the 2.x architecture and that everything before it is 1.x. Fermi counts as a step forward.

- Section 3.1
- Heavily rewritten to clarify binary, ptx, application, C++ compatibility
- __noinline__ behaves differently for compute capability 2.0 and higher

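This section covers how nvcc, binary code, PTX, and applications relate to each other, along with C++ compatibility. A CUDA kernel can also be written directly in the CUDA instruction set. That assembly-like instruction set is PTX, and the PTX manual describes it in more detail.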

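Section 3.1.1 describes the nvcc compilation workflow in detail: how .cu files (CUDA programs) are compiled into object files, and how the C/C++ parts are handed off to a C or C++ compiler.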

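Section 3.1.2 covers binary code. It explains what the code option means: a label such as 1.3 means the binary only runs on hardware of compute capability 1.3 or later. Binary compatibility is only guaranteed within the same major revision.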

Section 3.1.3 briefly notes that most PTX instructions run on any device, but some are only supported on devices of higher compute capability.

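Section 3.1.4 explains how binaries and PTX code of different versions behave on future hardware. The guide recommends shipping PTX code, because it can be compiled just in time at run time and so adapts to newer features. The reason is that on current hardware a single PTX instruction may actually be translated into several hardware instructions, since the hardware does not yet support it natively. Once later hardware can execute that PTX instruction as a single instruction, the generated code changes, which is why the guide calls this out.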
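
The compatibility rules from Sections 3.1.2 to 3.1.4 can be written down as two small predicates. This is only a host-side sketch in plain C++, not CUDA API code: the `ComputeCapability` struct and the function names `binary_runs_on` and `ptx_runs_on` are made up for illustration. It assumes the rules as the guide states them: a binary built for X.y is only guaranteed to run on devices X.z with z >= y, while PTX can be compiled for any device of equal or higher compute capability.

```cpp
#include <cassert>

// Compute capability as "major.minor", e.g. {1, 3} for 1.3 or {2, 0} for Fermi.
// The type and field names are illustrative, not part of any CUDA API.
struct ComputeCapability {
    int major_rev;
    int minor_rev;
};

// Binary rule (Section 3.1.2): code built for X.y is only guaranteed
// to run on devices X.z with z >= y, i.e. within the same major revision.
bool binary_runs_on(ComputeCapability built_for, ComputeCapability device) {
    assert(built_for.major_rev >= 1 && device.major_rev >= 1);
    return built_for.major_rev == device.major_rev &&
           device.minor_rev >= built_for.minor_rev;
}

// PTX rule (Sections 3.1.3 and 3.1.4): PTX generated for some compute
// capability can be compiled at run time for any device whose compute
// capability is equal or higher.
bool ptx_runs_on(ComputeCapability ptx_target, ComputeCapability device) {
    if (device.major_rev != ptx_target.major_rev) {
        return device.major_rev > ptx_target.major_rev;
    }
    return device.minor_rev >= ptx_target.minor_rev;
}
```

For example, `binary_runs_on({1, 3}, {2, 0})` is false while `ptx_runs_on({1, 3}, {2, 0})` is true, which is the practical argument for embedding PTX alongside any binaries.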

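Section 3.1.5 describes the supported C++ features. Not all of C++ is supported; the details can be looked up in the appendix at the end of the guide.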

- Section 3.2
    - Clarified that a CUDA context is created under the hood when initializing
      the runtime and therefore CUDA resources are only valid in the context of
      the host thread that initialized the runtime
    - Updated graphics interoperability sections to new API

This explains that every resource CUDA uses is only valid within the context it was created in. As discussed later in the guide, one host thread controls one GPU.

- Section 3.2.1
- Mentioned 40-bit address space for devices of compute capability 2.0

Devices of compute capability 2.0 have a 40-bit address space.

- Section 3.2.5.3
- Mentioned atomics to mapped page-locked memory

This notes that atomic operations on mapped page-locked memory are not atomic from the point of view of the host or of other devices.

- Section 3.2.6
- Added concurrent kernel execution and concurrent data transfer for devices
of compute capability 2.0

Previously kernels could only execute one at a time. Now several kernels can execute concurrently.

- Section 3.3
- Updated graphics interoperability sections to new API

The remaining changes are mostly new functions:
- New Section 3.4 about interoperability between runtime and driver APIs
- Chapter 4 and 5 mostly rewritten with additional information
- Part of appendix A moved to new appendix G with additional information
- Section B.1.4
    - Mentioned that kernel parameters are passed via constant memory for
      devices of compute capability 2.0
- Section B.6
    - Added new functions __syncthreads_count(), __syncthreads_and(), and
      __syncthreads_or()
- Section B.10
    - Mentioned atomics to mapped page-locked memory
- Section B.11
    - Added new functions __ballot()
- New Section B.12 on profiler counter function
- New Section B.14 on launch bounds
- Section C.1.1
    - Updated error for some functions
    - Updated based on FMAD being fused for compute capability 2.0
- Section C.1.2
    - atomicAdd works with single-precision floating-point numbers for devices
      of compute capability 2.0
    - Updated error for some functions
- Section C.2.1
    - Added new functions
- Section C.2.2
    - Added new functions
- New Section D.6 about classes with non virtual member functions for devices
of compute capability 2.0
- New appendix E for nvcc specifics (moved __noinline__, #pragma unroll to this
      appendix and added __restrict)
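
To make the new functions from Sections B.6 and B.11 more concrete, here is a host-side model of what they return. It is a sketch in plain C++, not device code. The `*_model` function names are hypothetical, and each takes one predicate value per thread as a vector. The real intrinsics take a single `int predicate` per thread, the `__syncthreads_*` variants also act as a block-wide barrier, and `__ballot()` only considers the active threads of the warp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models __ballot(): bit N of the result is set when the predicate of
// lane N of the warp is non-zero. A warp has at most 32 lanes.
std::uint32_t ballot_model(const std::vector<int>& lane_predicates) {
    assert(lane_predicates.size() <= 32);
    std::uint32_t mask = 0;
    for (std::size_t lane = 0; lane < lane_predicates.size(); ++lane) {
        if (lane_predicates[lane] != 0) {
            mask |= std::uint32_t{1} << lane;
        }
    }
    return mask;
}

// Models the return value of __syncthreads_count(): the number of
// threads in the block whose predicate is non-zero.
int syncthreads_count_model(const std::vector<int>& thread_predicates) {
    int count = 0;
    for (int p : thread_predicates) {
        if (p != 0) {
            ++count;
        }
    }
    return count;
}

// Models __syncthreads_and(): non-zero only if every predicate is non-zero.
int syncthreads_and_model(const std::vector<int>& thread_predicates) {
    for (int p : thread_predicates) {
        if (p == 0) {
            return 0;
        }
    }
    return 1;
}

// Models __syncthreads_or(): non-zero if any predicate is non-zero.
int syncthreads_or_model(const std::vector<int>& thread_predicates) {
    for (int p : thread_predicates) {
        if (p != 0) {
            return 1;
        }
    }
    return 0;
}
```

With predicates `{1, 0, 1, 1}`, the ballot model sets bits 0, 2, and 3, the count model reports three threads, the "and" model returns 0, and the "or" model returns 1.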

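Notes:

I was hoping the 3.0 update would bring some new features, but overall not much has changed. The 3.0 guide itself is quite good, though. Chapter 3 is worth reading carefully: it contains a lot of detailed explanation, and it is worth going back to when you have time.

PS: After seeing the VS2010 ad, I can't help wondering who will be my next line of code...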
