CUDA: the nvidia-smi Command Explained

nvidia-smi is used to check GPU usage. I often use this command to decide which GPUs are idle, but recently the reported GPU state has been confusing me, so here is an explanation of what each field in the nvidia-smi status table actually means.

 

This is the output for a Tesla K80 on our server.
In the table above:
The first column, Fan (N/A here), is the fan speed, which ranges from 0 to 100%. This is the speed the card is requesting; if the fan is physically obstructed, the actual speed may not reach the displayed value. Some devices report no speed at all because they are not cooled by their own fan and rely on other equipment to stay cool (our lab server, for instance, sits in an air-conditioned room all year).
The second column, Temp, is the temperature in degrees Celsius.
The third column, Perf, is the performance state, from P0 to P12, where P0 is maximum performance and P12 is minimum performance.
In the fourth column, Pwr (bottom) is the power draw, and Persistence-M (top) is the persistence-mode state. Persistence mode consumes more power, but new GPU applications take less time to start; here it is shown as Off.
The fifth column, Bus-Id, is the GPU's PCI bus address, in the form domain:bus:device.function.
The sixth column, Disp.A (Display Active), indicates whether the GPU's display output is initialized.
Below the fifth and sixth columns, Memory-Usage is the GPU memory usage.
The seventh column is the volatile GPU utilization (GPU-Util).
The top of the eighth column shows ECC information (Volatile Uncorr. ECC).
The bottom of the eighth column, Compute M., is the compute mode.
The second table, below it, shows the GPU memory used by each process.
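For reference, the header of the upper table is laid out roughly as follows (a simplified sketch; the exact spacing and the driver/version banner above it vary with the driver release):

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |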

GPU memory usage and GPU utilization are two different things. A graphics card is made up of a GPU plus video memory (and other components), and the relationship between video memory and the GPU is roughly like that between RAM and the CPU. When I run Caffe code, memory usage is low but GPU utilization is high; when my labmate runs TensorFlow code, memory usage is high but GPU utilization is low.
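If you want to watch both numbers change while a job is running, the simplest approach is to refresh the output periodically, for example with the standard Linux watch utility (or with nvidia-smi's own -l option, described below):

$ watch -n 1 nvidia-smi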

 

Background

$ nvidia-smi -h

It prints the following information:

NVIDIA System Management Interface -- v352.79

NVSMI provides monitoring information for Tesla and select Quadro devices. 
The data is presented in either a plain text or an XML format, via stdout or a file. 
NVSMI also provides several management operations for changing the device state.

Note that the functionality of NVSMI is exposed through the NVML C-based 
library. See the NVIDIA developer website for more information about NVML. 
Python wrappers to NVML are also available. The output of NVSMI is 
not guaranteed to be backwards compatible; NVML and the bindings are backwards 
compatible.

http://developer.nvidia.com/nvidia-management-library-nvml/ 
http://pypi.python.org/pypi/nvidia-ml-py/

Supported products:

  • Full Support 
    • All Tesla products, starting with the Fermi architecture
    • All Quadro products, starting with the Fermi architecture
    • All GRID products, starting with the Kepler architecture
    • GeForce Titan products, starting with the Kepler architecture
  • Limited Support 
    • All Geforce products, starting with the Fermi architecture

Command

nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

Options

Option    Description
-h, --help    Print usage information and exit.

LIST OPTIONS:

Option    Description
-L, --list-gpus    Display a list of GPUs connected to the system.

$ nvidia-smi -L

GPU 0: GeForce GTX TITAN X (UUID: GPU-xxxxx-xxx-xxxxx-xxx-xxxxxx)

SUMMARY OPTIONS:

Option    Description
-i, --id=    Target a specific GPU.
-f, --filename=    Log to a specified file, rather than to stdout.
-l, --loop=    Probe until Ctrl+C at specified second interval.
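The summary options can be combined. For example, the following prints the status of GPU 0 every 5 seconds and writes it to a file instead of stdout (the log file name here is just an illustration):

$ nvidia-smi -i 0 -l 5 -f gpu0.log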

QUERY OPTIONS:

Option    Description
-q, --query    Display GPU or Unit info.
-u, --unit    Show unit, rather than GPU, attributes.
-i, --id=    Target a specific GPU or Unit.
-f, --filename=    Log to a specified file, rather than to stdout.
-x, --xml-format    Produce XML output.
--dtd    When showing xml output, embed DTD.
-d, --display=    Display only selected information: MEMORY, …
-l, --loop=    Probe until Ctrl+C at specified second interval.
-lms, --loop-ms=    Probe until Ctrl+C at specified millisecond interval.
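For example, to show only the memory section of the report for GPU 0, or to dump the full report as XML to a file (the file name is illustrative):

$ nvidia-smi -q -i 0 -d MEMORY
$ nvidia-smi -q -x -f gpu_state.xml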

SELECTIVE QUERY OPTIONS:

Option    Description
--query-gpu=    Information about GPU. Call --help-query-gpu for more info.
--query-supported-clocks=    List of supported clocks. Call --help-query-supported-clocks for more info.
--query-compute-apps=    List of currently active compute processes. Call --help-query-compute-apps for more info.
--query-accounted-apps=    List of accounted compute processes. Call --help-query-accounted-apps for more info.
--query-retired-pages=    List of device memory pages that have been retired. Call --help-query-retired-pages for more info.

[mandatory]

--format=    Comma-separated list of format options: csv (comma-separated values, mandatory), noheader (skip the first line with column headers), nounits (do not print units for numerical values).

[plus any of]

Option    Description
-i, --id=    Target a specific GPU or Unit.
-f, --filename=    Log to a specified file, rather than to stdout.
-l, --loop=    Probe until Ctrl+C at specified second interval.
-lms, --loop-ms=    Probe until Ctrl+C at specified millisecond interval.
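A typical selective query prints a handful of fields as CSV and refreshes every few seconds. The field names below are commonly used ones; the full list can be checked with --help-query-gpu:

$ nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv -l 5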

DEVICE MODIFICATION OPTIONS:

Option    Description
-pm, --persistence-mode=    Set persistence mode: 0/DISABLED, 1/ENABLED
-e, --ecc-config=    Toggle ECC support: 0/DISABLED, 1/ENABLED
-p, --reset-ecc-errors=    Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
-c, --compute-mode=    Set MODE for compute applications: 0/DEFAULT, 1/EXCLUSIVE_THREAD (deprecated), 2/PROHIBITED, 3/EXCLUSIVE_PROCESS
--gom=    Set GPU Operation Mode: 0/ALL_ON, 1/COMPUTE, 2/LOW_DP
-r, --gpu-reset    Trigger reset of the GPU.
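These modification options normally require administrator (root) privileges. For example, to enable persistence mode on GPU 0 and then set it to the EXCLUSIVE_PROCESS compute mode (code 3 in the table above):

$ sudo nvidia-smi -i 0 -pm 1
$ sudo nvidia-smi -i 0 -c 3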

UNIT MODIFICATION OPTIONS:

Option    Description
-t, --toggle-led=    Set Unit LED state: 0/GREEN, 1/AMBER
-i, --id=    Target a specific Unit.

SHOW DTD OPTIONS:

Option    Description
--dtd    Print device DTD and exit.
-f, --filename=    Log to a specified file, rather than to stdout.
-u, --unit    Show unit, rather than device, DTD.
--debug=    Log encrypted debug information to a specified file.

Process Monitoring:

Option    Description
pmon    Displays process stats in scrolling format. "nvidia-smi pmon -h" for more information.
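For example, to sample the compute processes on GPU 0 ten times, once per second (assuming the device supports process monitoring; some older GPUs do not):

$ nvidia-smi pmon -i 0 -c 10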

TOPOLOGY: (EXPERIMENTAL)

Option    Description
topo    Displays device/system topology. "nvidia-smi topo -h" for more information. Please see the nvidia-smi(1) manual page for more detailed information.
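For example, to print the connectivity matrix between GPUs (and between GPUs and CPUs):

$ nvidia-smi topo -m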