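Recompiling Caffe with CUDA 9.0 + cuDNN 7.0
I already had CUDA 8.0 + cuDNN 6.0 installed, but for various reasons I had to change versions.
I saw an expert switch between versions at will (https://blog.csdn.net/u010821666/article/details/79957071), but I decided to simply drop 8.0.
(I) Download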
https://developer.nvidia.com/rdp/cudnn-archive
I downloaded cuDNN v7.0.5 Library for Linux.
(II) Installing CUDA 9.0
(1) Installing CUDA 9.0
Run the following in the directory that contains the installer:
sudo sh cuda_9.0.176_384.81_linux.run
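I had already installed the NVIDIA 384 display driver, so I did not install the driver again.
Afterwards a cuda-9.0 directory appeared as well.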
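To see which toolkit the generic CUDA path refers to after the install, a small helper along these lines can be used. This is only a sketch: the function name cuda_link_target is made up for this post, and it assumes the installer was allowed to create /usr/local/cuda as a symlink.

```shell
# Print the target of a CUDA symlink.
# $1: path of the symlink (assumed to be /usr/local/cuda on a default install)
cuda_link_target() {
    if [ -L "$1" ]; then
        readlink "$1"
    elif [ -d "$1" ]; then
        echo "$1 is a real directory, not a symlink" >&2
        return 1
    else
        echo "$1 does not exist" >&2
        return 1
    fi
}
```

Calling cuda_link_target /usr/local/cuda should print something like /usr/local/cuda-9.0 if the link was updated; if it still prints a cuda-8.0 path, the generic path is pointing at the old toolkit.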
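(2) Installing cuDNN
tar -zxvf cudnn-8.0-linux-x64-v6.0.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -d cuda/lib64/libcudnn* /usr/local/cuda/lib64/
Extract the archive and copy the files into the CUDA installation directory. Note that the archive name shown is the cuDNN 6.0 one for CUDA 8.0; substitute the name of the cuDNN v7.0.5 archive downloaded above.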
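After copying, it is worth checking that the header in place really is the 7.0.5 one. The sketch below reads the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines that cuDNN 7 keeps in cudnn.h; the function name cudnn_header_version is hypothetical, and the default header location /usr/local/cuda/include/cudnn.h is an assumption based on the copy commands above.

```shell
# Print "MAJOR.MINOR.PATCH" as declared in a cuDNN header.
# $1: path to cudnn.h
cudnn_header_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1   # no version defines found
            print major "." minor "." patch
        }
    ' "$1"
}
```

For example, cudnn_header_version /usr/local/cuda/include/cudnn.h should print 7.0.5 when the new header is installed; a 6.x result would mean the old header is still there.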
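I had already set the environment variables when I installed CUDA 8.0, so I did not repeat that step here. If you need it, see my earlier post: https://blog.csdn.net/uniqueyyc/article/details/81099878
Added later: some of my errors may have been caused by missing symlinks, so I later ran the following as well:
sudo cp lib* /usr/local/cuda/lib64/    # copy the shared libraries
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.7    # remove the existing links
sudo ln -s libcudnn.so.7.0.5 libcudnn.so.7    # link major version -> full version
sudo ln -s libcudnn.so.7 libcudnn.so    # link unversioned name -> major version
// If your version numbers differ, adjust them accordingly. Just keep the major version and the full version apart; both can be read off the file names in the directory.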
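The link chain above (libcudnn.so -> libcudnn.so.7 -> libcudnn.so.7.0.5) can be wrapped in a small function so the version is typed only once. This is a sketch under assumptions: relink_cudnn is a name invented here, and the function expects the full-version file to be present in the directory already.

```shell
# Recreate the libcudnn symlink chain inside a library directory.
# $1: library directory (e.g. /usr/local/cuda/lib64)
# $2: full cuDNN version (e.g. 7.0.5)
relink_cudnn() {
    libdir=$1
    full=$2
    major=${full%%.*}    # "7.0.5" -> "7"
    if [ ! -f "$libdir/libcudnn.so.$full" ]; then
        echo "libcudnn.so.$full not found in $libdir" >&2
        return 1
    fi
    # -f replaces an existing link, -n avoids following it
    ln -sfn "libcudnn.so.$full" "$libdir/libcudnn.so.$major"
    ln -sfn "libcudnn.so.$major" "$libdir/libcudnn.so"
}
```

On a real system the directory is root-owned, so the two ln calls would need to run with sudo (for instance by running the function from a root shell), followed by sudo ldconfig.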
(3) Testing
In the samples directory of cuda-9.0, run make on any sample:
sudo make
Once it has built, run the example:
./deviceQuery
(4) Result
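deviceQuery prints a line reading Result = PASS when the runtime can reach the GPU, so the run can be checked without reading the whole output. The helper name device_query_passed below is hypothetical; it only looks for that one line on standard input.

```shell
# Succeed only if the deviceQuery output on stdin contains a passing result.
device_query_passed() {
    grep -q 'Result = PASS'
}
```

Typical use would be ./deviceQuery | device_query_passed && echo "CUDA can see the GPU".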
(III) Recompiling Caffe
make all -j16
This ran into quite a few problems.
(1) Problem 1
Solution:
Delete two lines from Makefile.config.
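Which two lines were deleted is not stated here, and the sketch below does not claim to reconstruct them. Purely as an illustration, it assumes the offending entries are the compute_20 -gencode flags that Caffe's example Makefile.config lists under CUDA_ARCH, which is one common case with newer CUDA toolkits. The function name strip_compute_20 is made up, and it writes to stdout so the original file stays untouched.

```shell
# Print a copy of a Makefile.config with the compute_20 -gencode entries removed.
# $1: path to Makefile.config (the file itself is not modified)
strip_compute_20() {
    # Only the flag text is removed; the trailing backslashes are kept,
    # so the CUDA_ARCH line continuation should stay intact.
    sed -e 's/-gencode arch=compute_20,code=sm_2[01][[:space:]]*//' "$1"
}
```

One cautious way to use it is strip_compute_20 Makefile.config > Makefile.config.new, then diff the two files before replacing the original.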
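(2) Problem 2
The problem I dreaded most, a version conflict, really did show up.
I could not find the uninstall script for my CUDA 8.0, so I simply deleted the whole cuda-8.0 directory. I knew this would cause endless trouble and break my earlier environments, but I had no alternative; at worst I would start over. The cause was probably that CUDA 8.0 had not been removed cleanly and the symlinks in /usr/local/lib still pointed to 8.0.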
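If leftover links are the suspect, they can be listed before anything else is deleted. The sketch below scans one directory for symlinks whose target mentions a given string or no longer exists. The function name find_stale_links is invented for this post, and /usr/local/lib with the pattern cuda-8.0 is simply the case described above.

```shell
# List symlinks directly under a directory whose target mentions a pattern,
# or whose target no longer exists.
# $1: directory to scan (e.g. /usr/local/lib)
# $2: substring to look for in link targets (e.g. cuda-8.0)
find_stale_links() {
    find "$1" -maxdepth 1 -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            *"$2"*) echo "$link -> $target" ;;
            *)      [ -e "$link" ] || echo "$link -> $target (dangling)" ;;
        esac
    done
}
```

For example, find_stale_links /usr/local/lib cuda-8.0 prints one line per suspicious link and nothing if the directory is clean.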
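After deleting it, recompile; if that still fails, OpenCV may need to be recompiled too.
After the deletion I got this message:
Next attempt (failed): https://www.cnblogs.com/fanwendi2312/p/8438575.html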
sudo cp opencv-3.2.0/build/lib/libopencv_core.so.3.2 /usr/local/lib/libopencv_core.so.3.2 && sudo ldconfig
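Still no luck, so OpenCV definitely had to be recompiled. (The real problem was the linking, but I do not understand that well, so I took the brute-force route and reinstalled.)
When recompiling OpenCV I wanted to build opencv_contrib along with it, but the build kept failing because the version I had downloaded did not match my OpenCV. I had to download opencv_contrib 3.4.0 instead. Download: https://github.com/opencv/opencv_contrib/releases
I spent a whole day recompiling OpenCV 3.4 and hit one error after another. The next day I began to suspect that deleting CUDA 8.0 had damaged the CUDA setup, so I repeated the CUDA 9.0 installation, removed every earlier OpenCV tree, extracted the sources again and rebuilt. This time there were only minor issues, nothing like the day before, and the build went through.
References: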
https://blog.csdn.net/u011383131/article/details/79942339
https://www.cnblogs.com/aimhabo/p/8721340.html
https://ricky.moe/2017/08/27/ubuntu-opencv-3-2-0-install/
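(3) Problem 3
Later I could not stand it any longer, reinstalled the operating system and started over from scratch.
After installing OpenCV 3.4, Python could not import it: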
Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 23:32:55)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named cv2
>>>
Solution:
sudo vim /etc/bash.bashrc
// append at the end of the file
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
export PKG_CONFIG_PATH
Problem solved.
See this post (section 8, configuration): https://blog.csdn.net/guduruyu/article/details/72965535
A while after this fix, it stopped working again.
// so I used this command
export PYTHONPATH=$PYTHONPATH:/usr/local/lib/python2.7/site-packages
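An export typed into a terminal only lasts for that session, which may be why the import worked and then stopped working. A small helper placed in ~/.bashrc can make the setting persistent without adding the same directory twice. This is a sketch: pythonpath_add is a hypothetical name, and the site-packages path is the one from the command above.

```shell
# Append a directory to PYTHONPATH only if it is not already listed.
# $1: directory to add
pythonpath_add() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$1" ;;
    esac
    export PYTHONPATH
}
```

With the function defined, the line pythonpath_add /usr/local/lib/python2.7/site-packages can sit in ~/.bashrc and be sourced repeatedly without growing the variable.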
Anaconda installation reference: https://my.oschina.net/u/2306127/blog/636519
2018.10.25/11:17
Only now does the actual Caffe build begin.
I mostly followed these posts:
https://blog.csdn.net/yhaolpz/article/details/71375762
https://blog.csdn.net/qq_31261509/article/details/78755968 (includes how to modify Makefile.config and Makefile)
https://hk.saowen.com/a/2cf482f3327651c21e6df4ca4ddf49e8748ce5932a8da396752b76c2befcc16c
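A post that collects a great many errors: https://blog.csdn.net/zziahgf/article/details/72900948
The main work is editing Makefile.config. The posts above build Caffe against CUDA 8.0, while this is CUDA 9, so a few things differ.
My Makefile.config: https://download.csdn.net/download/uniqueyyc/10743319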
sudo make clean
sudo make all -j16    # choose the number to suit your machine; if unsure, leave it out. The same applies to the commands above and below.
sudo make test -j16
sudo make runtest -j16
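Rather than hard-coding 16, the job count can be taken from the number of CPU cores. The sketch below uses nproc where available and falls back to getconf; make_jobs is a name made up for this example.

```shell
# Print a reasonable -j value for make: the number of online CPU cores.
# Falls back to 1 if neither nproc nor getconf can report it.
make_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}
```

The build command then becomes sudo make all -j"$(make_jobs)", and likewise for the test and runtest targets.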
(4) Problem 4
Running sudo make runtest failed:
.build_release/tools/caffe
.build_release/tools/caffe: error while loading shared libraries: libcudart.so.9.0: cannot open shared object file: No such file or directory
Makefile:542: recipe for target 'runtest' failed
make: *** [runtest] Error 127
Solution: I changed the settings in .bashrc. The paths there previously had no specific version number; after adding it, everything worked.
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-9.0/bin:$PATH
sudo cp /usr/local/cuda-9.0/lib64/libcudart.so.9.0 /usr/local/lib/libcudart.so.9.0 && sudo ldconfig
sudo cp /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/lib/libcublas.so.9.0 && sudo ldconfig
sudo cp /usr/local/cuda-9.0/lib64/libcurand.so.9.0 /usr/local/lib/libcurand.so.9.0 && sudo ldconfig
sudo cp /usr/local/cuda-9.0/lib64/libcudnn.so.7 /usr/local/lib/libcudnn.so.7 && sudo ldconfig
Success!
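To find out in one go which shared libraries a binary cannot resolve, instead of discovering them one error at a time, the output of ldd can be filtered for its "not found" lines. The helper name missing_libs is hypothetical; it only reads ldd output from standard input.

```shell
# Read `ldd` output on stdin and print the names of libraries
# that the dynamic linker reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For the failure above, ldd .build_release/tools/caffe | missing_libs would list libcudart.so.9.0 and any other unresolved library; an empty result means every dependency was found.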
Installing the Python interface
// install dependencies
sudo apt-get install python-numpy
sudo apt install python-pip
sudo pip install -U scikit-image
// build the Python interface
sudo make pycaffe -j8
Problem 1
>>> import caffe
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/seugraph/sorftware/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "/home/seugraph/sorftware/caffe/python/caffe/pycaffe.py", line 15, in <module>
import caffe.io
File "/home/seugraph/sorftware/caffe/python/caffe/io.py", line 2, in <module>
import skimage.io
ImportError: No module named skimage.io
Solution:
conda install matplotlib
pip install scikit-image
Problem 2
>>> import caffe
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/seugraph/sorftware/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "/home/seugraph/sorftware/caffe/python/caffe/pycaffe.py", line 15, in <module>
import caffe.io
File "/home/seugraph/sorftware/caffe/python/caffe/io.py", line 8, in <module>
from caffe.proto import caffe_pb2
File "/home/seugraph/sorftware/caffe/python/caffe/proto/caffe_pb2.py", line 6, in <module>
from google.protobuf.internal import enum_type_wrapper
ImportError: No module named google.protobuf.internal
Solution:
conda install protobuf
conda install jupyter
Problem 3
>>> import caffe
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/seugraph/sorftware/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "/home/seugraph/sorftware/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: numpy.core.multiarray failed to import
>>>
Solution:
pip uninstall numpy
pip install -U numpy
Problem 4 (warning)
>>> import caffe
/home/seugraph/anaconda2/envs/py2/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
>>>
Solution:
conda install matplotlib
Success!
cuDNN problem
The test kept failing:
./mnistCUDNN
cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7005 (7.0.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 11169, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
CUDNN failure
Error: CUDNN_STATUS_INTERNAL_ERROR
mnistCUDNN.cpp:394
Aborting...
Fix
Reference: https://www.alatortsev.com/2018/01/17/fixing-cudnn_status_internal_error/
Cause: corrupted cache
Solution:
sudo rm -rf ~/.nv/
Re-running the test:
Result of classification: 1 3 5
Test passed!
Success!
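A slightly more cautious variant of the cache fix is to move ~/.nv aside rather than delete it, so it can be restored if the cache turns out not to be the culprit. This is a sketch only: reset_nv_cache and the .bak timestamp suffix are choices made for this example.

```shell
# Move the NVIDIA per-user cache directory aside instead of deleting it.
# $1: home directory that contains .nv (defaults to $HOME)
reset_nv_cache() {
    cache="${1:-$HOME}/.nv"
    if [ ! -d "$cache" ]; then
        echo "no cache at $cache"
        return 0
    fi
    backup="$cache.bak.$(date +%Y%m%d%H%M%S)"
    mv "$cache" "$backup" && echo "moved $cache to $backup"
}
```

If the directory was created while running under sudo it may be owned by root, in which case the mv needs sudo as well. Once the test passes again, the backup can be removed.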