Getting the GPU usage of NVIDIA cards with the Linux dstat tool
dstat is an awesome little tool that gives you resource statistics for your Linux box. It has a modular architecture that lets you develop additional plugins, and it's easy to use. Recently I was profiling a Deep Learning pipeline developed with Keras and TensorFlow and I needed detailed statistics about CPU, hard disk and GPU usage. dstat provides the first two out of the box, but as far as I know there is no plugin for monitoring the GPU usage of NVIDIA graphics cards.
Thankfully, it is super easy to write a Python plugin for dstat. I have already sent a pull request to the official repo, but since new versions are released relatively rarely, here are some instructions on how to set up the dstat NVIDIA GPU usage plugin on your box.
Installation
The following commands were tested on Ubuntu 16.04. They install dstat, the Python NVIDIA Management Library and my dstat NVIDIA plugin:
sudo apt-get install dstat  # install dstat
sudo pip install nvidia-ml-py  # install Python NVIDIA Management Library
wget https://raw.githubusercontent.com/datumbox/dstat/master/plugins/dstat_nvidia_gpu.py
sudo mv dstat_nvidia_gpu.py /usr/share/dstat/  # move file to the plugins directory of dstat
To get all the default statistics along with GPU usage (percentage) type the following command:
dstat -a --nvidia-gpu
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- gpu-u
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |total
  2   1  96   0   0   0|5816k   15M|   0     0 |   0     0 |  45k   98k|   68
  0   1  98   0   0   0|  57M  128k| 104B  902B|   0     0 |  42k   85k|   50
  8   7  84   1   0   0| 152M    0 | 292B  448B|   0     0 |  52k   93k|   39
  1   1  97   1   0   0| 111M    0 |  52B  374B|   0     0 |  51k  116k|   62
  0   1  98   1   0   0| 129M    0 |  80B  416B|   0     0 |  43k   85k|   92
  0   2  98   0   0   0|   0     0 |  52B  374B|   0     0 |  41k   83k|   81
To get all the usage statistics for each GPU use the following command:
dstat --nvidia-gpu -f
-------------------------------------------gpu-usage-nvidia------------------------------------------
total  gpu0  gpu1  gpu2  gpu3  gpu4  gpu5  gpu6  gpu7  gpu8  gpu9 gpu10 gpu11 gpu12 gpu13 gpu14 gpu15
   19    23    22    21    21    20    22    23    25    15    18    16    16    16    18    16    14
   18    21    20    18    22    21    21    22    21    15    15    14    14    14    15    16    13
   10    14     9    13     8     9    11     9    12     9     9    10    10     8     7     9     9
   18    20    22    19    21    20    21    21    22    14    15    14    15    14    15    15    15
   20    24    22    23    24    25    22    22    22    16    16    16    16    16    16    18    16
   15    21    18    19    18    17    17    16    18    14    13    13    14    13    12    11    11
   20    24    22    22    24    25    23    24    22    16    18    16    14    17    17    17    15
   19    29    18    23    21    22    21    20    21    18    16    16    18    14    14    17    17
How it works
The plugin fetches the number of available GPUs on the system and samples the usage metric 10 times for each GPU. Sampling multiple times will hopefully return smoother metrics than a single measurement. After that it averages the usage across all GPUs and returns the results to the user. The source code of the plugin is available here.
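To make the sampling idea concrete, here is a rough sketch of the logic described above. It is not the plugin's actual source. The names sample_gpu_usage, read_usage and main are made up for illustration, and I am assuming the pynvml module that ships with nvidia-ml-py. The averaging is kept separate from the NVML calls so it can be tried on a machine without a GPU.

```python
def sample_gpu_usage(read_usage, num_gpus, samples=10):
    """Average several readings per GPU, then average across GPUs.

    read_usage is a hypothetical callback: read_usage(index) returns the
    utilization percentage of one GPU. Returns (total, per_gpu).
    """
    per_gpu = []
    for index in range(num_gpus):
        # Take several readings to smooth out a noisy single measurement.
        readings = [read_usage(index) for _ in range(samples)]
        per_gpu.append(sum(readings) / float(len(readings)))
    # The "total" column is the mean of the per-GPU averages.
    total = sum(per_gpu) / float(len(per_gpu)) if per_gpu else 0.0
    return total, per_gpu


def main():
    """Print the averaged usage; needs nvidia-ml-py and an NVIDIA driver."""
    import pynvml  # imported here so the averaging above works without NVML

    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]

        def read_usage(index):
            # .gpu is the utilization percentage reported by NVML
            return pynvml.nvmlDeviceGetUtilizationRates(handles[index]).gpu

        total, per_gpu = sample_gpu_usage(read_usage, count)
        print("total: %d" % round(total))
        for i, usage in enumerate(per_gpu):
            print("gpu%d: %d" % (i, round(usage)))
    finally:
        pynvml.nvmlShutdown()
```

On a box with the NVIDIA driver installed, calling main() prints one total line and one line per GPU. Passing the reader in as a callback is just a convenience for this sketch, and the real plugin may be organized differently.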
Hope you enjoy it, happy GPU programming!