A Collection of Caffe Configuration Problems and Their Solutions

阿新 • Published: 2019-01-28

Problem 1: Check failed: error == cudaSuccess (8 vs. 0) invalid device function

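While reading a paper today, I needed Facebook's Caffe fork adapted for 3D convolution: C3D: a modified version of BVLC caffe to support 3D ConvNets. So I cloned it with git and set about configuring it. The Caffe that Facebook used is a very early one; judging by the source, it dates from around 2014.
During configuration, make all -j and make test -j both passed, but make runtest -j got stuck, which stumped even a "veteran" like me with "50 years of professional Caffe configuration" behind me. After some Googling, though, I found a fix.
My fix may not apply to your situation, but if it helps you, so much the better! ^_^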

My problem was as follows:

Check failed: error == cudaSuccess (8 vs. 0) invalid device function

I Googled the error and eventually traced it to this part of Makefile.config:

# CUDA architecture setting: going with all of them (up to CUDA 5.5 compatible).
# For the latest architecture, you need to install CUDA >= 6.0 and uncomment
# the *_50 lines below.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
             -gencode arch=compute_20,code=sm_21 \
             -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_35,code=sm_35
             #-gencode=arch=compute_50,code=sm_50 \
             #-gencode=arch=compute_50,code=compute_50 \

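This is the relevant part of my Makefile.config before I changed anything. The error occurs when the binaries were not built for the compute capability of your GPU: the CUDA_ARCH setting has to match your card.
The compute capabilities of common GPUs were given in a table:

[Table: compute capabilities of common GPUs]

My card is a TITAN X with compute capability 5.2, so I changed the CUDA_ARCH setting in Makefile.config to the following: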

# CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
#              -gencode arch=compute_20,code=sm_21 \
#              -gencode arch=compute_30,code=sm_30 \
#              -gencode arch=compute_35,code=sm_35
#              -gencode=arch=compute_50,code=sm_50 \
#              -gencode=arch=compute_50,code=compute_50
CUDA_ARCH := -gencode arch=compute_52,code=compute_52

In other words, comment out all the other entries and add a single line for the compute capability of your own GPU:

CUDA_ARCH := -gencode arch=compute_52,code=compute_52
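
If you are unsure what to write for your own card, the line can be generated from its compute capability. What follows is a minimal sketch and not part of the original fix: the function name cuda_arch_line is made up for illustration, and besides the code=compute_XX entry used above it also emits a code=sm_XX entry, so that native code for the card is embedded next to the PTX.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "5.2" into a
# CUDA_ARCH assignment for Makefile.config.
cuda_arch_line() {
    cap=$(printf '%s' "$1" | tr -d '.')    # "5.2" -> "52"
    # code=sm_XX embeds native GPU code; code=compute_XX embeds PTX.
    printf 'CUDA_ARCH := -gencode arch=compute_%s,code=sm_%s \\\n' "$cap" "$cap"
    printf '             -gencode arch=compute_%s,code=compute_%s\n' "$cap" "$cap"
}

cuda_arch_line 5.2
```

Paste the output over the existing CUDA_ARCH block, then rebuild from a clean tree so that every .cu file is recompiled for the new architecture.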

Then rebuild Caffe and run make runtest -j again.

As for the message YOU HAVE 2 DISABLED TESTS, see my other blog post; it can simply be ignored and does no harm.

Problem 2: fatal error: pyconfig.h: No such file or directory

Continuing in the same environment as Problem 1, I ran make pycaffe and got the following error:

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
make: *** [python/caffe/_caffe.so] Error 1

I found the fix on a web page. Following the expert's advice there, I typed:

$ export CPLUS_INCLUDE_PATH=/usr/include/python2.7

Done!

After that, import caffe also succeeds.
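
The path /usr/include/python2.7 is specific to my machine. As a hedged sketch, the export can be made conditional on pyconfig.h actually being present. The helper name find_pyconfig_dir and the second candidate directory are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory in the argument list
# that contains pyconfig.h; return non-zero if none does.
find_pyconfig_dir() {
    for dir in "$@"; do
        if [ -f "$dir/pyconfig.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories are assumptions; adjust them to your Python install.
if dir=$(find_pyconfig_dir /usr/include/python2.7 /usr/local/include/python2.7); then
    # Prepend, keeping whatever was already in CPLUS_INCLUDE_PATH.
    export CPLUS_INCLUDE_PATH="$dir${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"
    echo "CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH"
else
    echo "pyconfig.h not found; the Python development headers may be missing" >&2
fi
```

Note that export only affects the current shell, so make pycaffe has to be run from that same shell.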

Problem 3: a cuDNN version mismatch causes errors in `cudnn_conv_layer` during `make`

Today, while compiling the Caffe that ships with fast-rcnn, I got the following errors:

src/caffe/layers/cudnn_conv_layer.cu: error: argument of type "cudnnAddMode_t" is incompatible with parameter of type "const void *"
detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &, const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]"
…………
src/caffe/layers/cudnn_conv_layer.cu: error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
…………
src/caffe/layers/cudnn_conv_layer.cu: error: argument of type "const void *" is incompatible with parameter of type "cudnnTensorDescriptor_t"
…………

20 errors detected in the compilation of "/tmp/tmpxft_000045c5_00000000-16_cudnn_conv_layer.compute_50.cpp1.ii".

make: *** [.build_debug/cuda/src/caffe/layers/cudnn_conv_layer.o] Error 1
make: *** Waiting for unfinished jobs....

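This kind of error is usually caused by a cuDNN version mismatch: the Caffe source was written against a different cuDNN API than the one installed. The fix is either to upgrade cuDNN or to downgrade it. In practice I either upgrade cuDNN v2 to cuDNN v4, or downgrade cuDNN v4 to cuDNN v2. cuDNN has already reached v5, but so far these two approaches have solved every case I have run into.
After that, fast-rcnn compiled successfully.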
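
Before swapping versions it helps to know which cuDNN is actually installed. The sketch below reads the CUDNN_MAJOR macro from the header. The default header path and the CUDNN_HEADER variable are assumptions, and depending on the cuDNN release the macro may live in a different header or be absent altogether, in which case the function prints nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the CUDNN_MAJOR macro defined
# in the given header file (prints nothing if the macro is not there).
cudnn_major() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3; exit }' "$1"
}

# Assumed default location; override with CUDNN_HEADER=/path/to/cudnn.h
header=${CUDNN_HEADER:-/usr/local/cuda/include/cudnn.h}
if [ -r "$header" ]; then
    echo "cuDNN major version: $(cudnn_major "$header")"
else
    echo "no readable header at $header" >&2
fi
```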

Problem 4: caffe/proto/caffe.pb.h: No such file or directory

I also ran into this one while compiling fast-rcnn:

In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/layer.hpp:8,
                 from src/caffe/layer_factory.cpp:3:
./include/caffe/util/cudnn.hpp:8:34: fatal error: caffe/proto/caffe.pb.h: No such file or directory
compilation terminated.

Use protoc to generate caffe.pb.h and caffe.pb.cc from caffe/src/caffe/proto/caffe.proto:

$ protoc --cpp_out=/home/chenxp/caffe/include/caffe/ caffe.proto

This fix works almost every time.
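
protoc places its output under --cpp_out, using the path of the .proto file relative to --proto_path. The compiler, however, is looking for caffe/proto/caffe.pb.h on the include path, so the header has to end up in include/caffe/proto/. Here is a hedged sketch that spells out both paths. The names generate_caffe_pb and CAFFE_ROOT are hypothetical, and the layout assumed is that of a standard Caffe checkout.

```shell
#!/bin/sh
# Hypothetical helper: generate caffe.pb.h and caffe.pb.cc into
# <root>/include/caffe/proto, where #include "caffe/proto/caffe.pb.h"
# expects to find the header.
generate_caffe_pb() {
    root=$1
    mkdir -p "$root/include/caffe/proto" &&
    protoc --proto_path="$root/src/caffe/proto" \
           --cpp_out="$root/include/caffe/proto" \
           "$root/src/caffe/proto/caffe.proto"
}

# Only runs when CAFFE_ROOT is set, e.g. CAFFE_ROOT=$HOME/caffe
if [ -n "${CAFFE_ROOT:-}" ]; then
    generate_caffe_pb "$CAFFE_ROOT"
fi
```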

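Problem 5: syncedmem.hpp:18 Check failed: error == cudaSuccess (30 vs. 0)

Updated 2016.06.28

Today 倩姐 told me her torch would not run. I took a look and suspected that something was wrong with CUDA. I recompiled Caffe on the server and, as expected, hit the following problem: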

F0628 15:34:16.652927 50205 syncedmem.hpp:18] Check failed: error == cudaSuccess (30 vs. 0) unknown error
*** Check failure stack trace: ***
@ 0x2ab5de98fdaa (unknown)
@ 0x2ab5de98fce4 (unknown)
@ 0x2ab5de98f6e6 (unknown)
@ 0x2ab5de992687 (unknown)
@ 0x2ab5e0959ef9 caffe::SyncedMemory::mutable_cpu_data()
@ 0x2ab5e0957618 caffe::Blob<>::Reshape()
@ 0x2ab5e0957c7a caffe::Blob<>::Reshape()
@ 0x57643c caffe::MemoryDataLayerTest<>::SetUp()
@ 0x8fa70a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8efd71 testing::Test::Run()
@ 0x8efec7 testing::TestInfo::Run()
@ 0x8f0005 testing::TestCase::Run()
@ 0x8f027d testing::internal::UnitTestImpl::RunAllTests()
@ 0x8fa28a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8ef641 testing::UnitTest::Run()
@ 0x46d027 main
@ 0x2ab5e1933f45 (unknown)
@ 0x4748e9 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)

I found the answer online; installing one package is enough: sudo apt-get install nvidia-modprobe

After that, I ran make runtest -j again and everything worked!
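
nvidia-modprobe is the NVIDIA utility that loads the kernel module and creates the /dev/nvidia* device files for processes that cannot do so themselves. Assuming missing device files were the cause here (a guess on my part, since the error itself only says "unknown error"), a quick check might look like the sketch below. The helper name missing_paths and the device list are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print every path from the argument list that
# does not exist.
missing_paths() {
    for path in "$@"; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Device names are assumptions for a single-GPU Linux machine.
missing=$(missing_paths /dev/nvidiactl /dev/nvidia0 /dev/nvidia-uvm)
if [ -n "$missing" ]; then
    echo "missing device files:"
    echo "$missing"
else
    echo "all expected NVIDIA device files are present"
fi
```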

Problem 6: undefined reference to imdecode()

Today, while compiling Caffe for 吉姐, I ran into the following errors:

.build_release/lib/libcaffe.so: undefined reference to cv::imdecode(cv::_InputArray const&, int)
.build_release/lib/libcaffe.so: undefined reference to cv::imread(cv::String const&, int)

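I had compiled and installed OpenCV 3 on the server the day before, so I suspected OpenCV was the cause.
I remembered that Caffe's Makefile.config contains a commented-out line that has to be uncommented when OpenCV 3 is used. Sure enough, once I uncommented it, make all went through.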
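
In OpenCV 3, imread and imdecode live in the separate imgcodecs module, and Caffe's Makefile only links against it when OPENCV_VERSION is set to 3. In the stock Makefile.config the line in question is "# OPENCV_VERSION := 3". A minimal sketch for uncommenting it from the command line follows. The helper name enable_opencv3 and the CAFFE_CONFIG variable are made up, and your config file may word the line differently.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the OPENCV_VERSION line in a
# Makefile.config, leaving a backup copy with a .bak suffix.
enable_opencv3() {
    sed -i.bak 's/^#[[:space:]]*\(OPENCV_VERSION[[:space:]]*:=[[:space:]]*3\)/\1/' "$1"
}

config=${CAFFE_CONFIG:-Makefile.config}   # path is an assumption
if [ -f "$config" ]; then
    enable_opencv3 "$config"
    grep -n 'OPENCV_VERSION' "$config" || echo "no OPENCV_VERSION line found"
fi
```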

Problem 7: libopencv_core.so.3.1: cannot open shared object file: No such file or directory

Right after that I hit another problem, this time during make runtest -j:

error while loading shared libraries: libopencv_core.so.3.1: cannot open shared object file: No such file or directory

This error means the OpenCV 3 libraries cannot be found at run time. They can be made visible like this:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

After that, make runtest -j passed completely.
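
Running that export repeatedly keeps stacking the same directory onto the variable. As a small sketch (the function name prepend_lib_dir is hypothetical), the directory can be added only when it is not already listed:

```shell
#!/bin/sh
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH
# unless it is already one of its entries.
prepend_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_lib_dir /usr/local/lib
echo "$LD_LIBRARY_PATH"
```

The setting only lasts for the current shell session, so it has to be repeated (for example from a shell startup file) in every new session where Caffe is run.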
