Fixing some errors with win10 + anaconda3 + tensorflow-gpu
Really this is just a record of the pitfalls I hit.
As a long-suffering grad student I am stuck on Windows, only because when I bought my laptop I ended up with an Acer that refuses to run Ubuntu.
I had used TensorFlow (https://www.tensorflow.org/) for a while before. Fortunately there is now a TensorFlow that runs on Windows, and it can even be installed directly through conda, but that build is the CPU version.
I ran a side-by-side test earlier: on the same training program, the GPU was roughly 20x faster than the CPU. 20x!!! So after trying it out, I decided to switch to the GPU version of TensorFlow.
Others have already written up installing tensorflow-gpu on Win10 + anaconda3.
VS2017
Yes, I used VS2017 Community, because VS2015 now counts as an old release and is awkward to download from the official site (I searched several times and never found 2015). But when you open CUDA's build folder, you will find it only supports up to VS2015. In the end I settled for VS2013 to compile the corresponding CUDA code, which built without trouble.
CUDA and cuDNN
The installers NVIDIA's site currently offers for Win10 are cuda_8.0.61_win10.exe and cudnn-8.0-windows10-x64-v6.0.
The earlier tutorial treated cuDNN as an optional add-on component, but in fact cuDNN is essential; without it you cannot import tensorflow in Python at all.
The official site provides cuDNN 6, but in practice cuDNN 6 currently has a bug that prevents tensorflow from importing cleanly.
If you search for this "module not found" error, you will find a creative solution on Stack Overflow with a pile of upvotes: copy the three cuDNN folders over as before, then rename cudnn64_6.dll under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin to cudnn64_5.dll.
Don't believe it. That hack does let tensorflow import, but actual CNN code will still fail with version errors. The right move is to go
download cuDNN 5 and replace the cudnn64_5.dll under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin with the genuine one; after that, TensorFlow's built-in CNN functionality works fine in Python.
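If you want to sanity-check which cuDNN DLLs actually ended up in the CUDA bin directory before blaming TensorFlow, a tiny script can list them. This helper is hypothetical (not part of CUDA or TensorFlow), and the path shown is the default CUDA 8.0 install location assumed above:

```python
import os

def find_cudnn_dlls(bin_dir):
    """Return the sorted cuDNN DLL filenames found under bin_dir."""
    return sorted(
        name for name in os.listdir(bin_dir)
        if name.lower().startswith("cudnn") and name.lower().endswith(".dll")
    )

if __name__ == "__main__":
    # Default CUDA 8.0 bin directory on Windows; adjust if you installed elsewhere.
    cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin"
    print(find_cudnn_dlls(cuda_bin))
```

If this prints `['cudnn64_5.dll']` you have the genuine cuDNN 5 library in place; a `cudnn64_6.dll` (or a renamed copy of it) is what triggers the problems described above.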
The problems do not end there, however.
Running the ImageNet test with tensorflow-gpu, you will find it still errors out:
2017-08-06 11:23:17.058978: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.66GiB
2017-08-06 11:23:17.059122: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-08-06 11:23:17.062956: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-08-06 11:23:17.065284: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:01:00.0)
2017-08-06 11:23:18.495540: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2017-08-06 11:23:21.602161: E c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-08-06 11:23:21.602280: E c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-08-06 11:23:21.605768: E c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-06 11:23:21.606460: F c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\kernels\conv_ops.cc:671] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Ignore my potato graphics card.
This issue already has a very long thread on GitHub; it seems to be Windows-specific and related to GPU memory size. Eventually one savior of a poster gave a summary and a workaround of sorts:
Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before this indicating some sort of memory allocation problem but the program continued to progress, eventually giving the cudnn errors that everyone is getting. The reason I believe it works sometimes is that if you use the gpu for other things besides tensorflow such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times it can’t.
From the API
https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/
"By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation."
I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.
I have resolved this issue by changing the default behavior of TF to allocate a minimum amount of memory and grow as needed as detailed in the webpage.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
I have also tried the alternate way and was able to get it to work and fail with experimentally choosing a percentage that worked. In my case it ended up being about .7.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
Still no word from anyone on the TF team confirming this but it is worth a shot to see if others can confirm similar behavior.
In short: tweak the GPU options when creating the session, and the problem can be side-stepped.
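The poster arrived at ~0.7 by trial and error. Assuming the requested fraction simply has to fit inside the free memory TensorFlow logs at startup (with some headroom left for the display), you can estimate a value directly from those numbers. This helper is purely hypothetical, not part of TensorFlow:

```python
def safe_memory_fraction(total_gib, free_gib, headroom_gib=0.2):
    """Estimate per_process_gpu_memory_fraction: the share of total GPU
    memory that fits inside the free memory, minus some headroom,
    rounded down to two decimals."""
    usable = max(free_gib - headroom_gib, 0.0)
    return round(min(usable / total_gib, 1.0), 2)

# The 940M in the log above reports 2.00 GiB total and 1.66 GiB free:
print(safe_memory_fraction(2.00, 1.66))  # -> 0.73
```

That lands close to the ~0.7 the poster found experimentally; on a card shared with the desktop, free memory fluctuates, which is exactly why the default allocate-everything behavior fails intermittently.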
Using tensorflow-gpu on Windows looks set to be a bumpy road, and I expect more problems ahead, so I'll keep this post open and update it as they come.
September 17 update
The GPU problems with TensorFlow on Windows turned out to be unfixable for me. I have recently switched to PyTorch, which fully supports Anaconda3 and whose GPU support has given me no trouble at all. Strongly recommended: PyTorch, the numpy of deep-learning frameworks!
For installing PyTorch on Windows, see: