System Installation Notes and Deep Learning Environment Setup
1. Summary of problems installing Ubuntu 16.04 on a Dell AL
1) SSD not detected
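Dell machines ship with the disk mode in the BIOS set to RAID ON. In this mode the SSD may not be detected correctly, or may not reach its full performance. The steps below switch the mode from RAID to AHCI.
After booting into Windows, press Win+R, type msconfig, and open the Boot tab. Check "Safe boot" and select "Minimal".
Then click Restart. During startup, press F2 to enter the BIOS, go to the Advanced page, select SATA Operation, press Enter, and choose AHCI mode. A warning says the system will need to be reinstalled; ignore it and click YES. Press F10, choose YES, and reboot. The machine now starts in Windows safe mode. Press Win+R again, type msconfig, and on the Boot tab uncheck the Safe boot option set earlier.
Click OK, then choose Restart. If Windows boots normally, AHCI mode is enabled.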
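2) Touchscreen driver not working correctly
Blacklist the i2c_hid module as root, rebuild the initramfs, and reboot:
sudo su
echo 'blacklist i2c_hid' >> /etc/modprobe.d/blacklist.conf
depmod -a
update-initramfs -u
reboot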
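3) Black screen
After installing Ubuntu 16.04, the screen may go black at boot. There are two ways to fix this.
Method 1:
- At the boot menu, press "e" to enter GRUB edit mode.
- Find "quiet splash" and append "nomodeset" after it. This disables kernel mode setting, so the system can boot before the NVIDIA driver is installed.
- Press Ctrl+X or F10 to boot.
- Once booted, edit /etc/default/grub with administrator privileges.
- Find GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" and change it to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset".
- Update GRUB with sudo update-grub, then reboot.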
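Method 2: If the system does boot after installation, log in and run:
sudo nano /etc/default/grub
Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
and change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
Save with Ctrl+O and exit with Ctrl+X (see the key hints at the bottom of the nano window), then update GRUB:
sudo update-grub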
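The manual edit above can also be scripted. The following is only a rough sketch, not something from the original setup: add_nomodeset is a hypothetical helper name, and it assumes the value of GRUB_CMDLINE_LINUX_DEFAULT is wrapped in double quotes, as it is in the default file.

```shell
# Sketch: append "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# add_nomodeset is a hypothetical helper, not part of GRUB or Ubuntu.
add_nomodeset() {
    file=$1
    # Do nothing if the option is already present, so repeated runs are safe.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$file"; then
        return 0
    fi
    # Insert " nomodeset" just before the closing double quote of the value.
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nomodeset"/' "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

It is safer to try it on a copy first, for example cp /etc/default/grub /tmp/grub.test followed by add_nomodeset /tmp/grub.test and diff /etc/default/grub /tmp/grub.test. Editing the real file needs root, and sudo update-grub still has to be run afterwards.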
Environment setup
1. Install the dependencies
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install git cmake build-essential
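2. Install the graphics driver
Ubuntu 16.04 uses the nouveau graphics driver by default. It does not work with CUDA, so it has to be disabled and the NVIDIA driver installed in its place.
1) First disable nouveau, the driver bundled with Ubuntu 16.04, by adding blacklist entries to /etc/modprobe.d/blacklist-nouveau.conf:
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
The file is empty when first opened. Add:
blacklist nouveau
options nouveau modeset=0
Save and close the file, then run the following command so that the change actually takes effect:
sudo update-initramfs -u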
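To check whether nouveau has been disabled, reboot and run the following; no output means the module is no longer loaded:
lsmod | grep nouveau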
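If the check needs to go into a script, a small helper can match the module name exactly against the first column of the lsmod output. This is just a sketch; module_listed is a made-up name, and it reads lsmod-style text from standard input.

```shell
# Sketch: succeed if module $1 appears in the first column of lsmod-style text on stdin.
# module_listed is a hypothetical helper name.
module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Example (run on the real machine):
#   if lsmod | module_listed nouveau; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```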
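Now install the NVIDIA driver.
The driver I downloaded is NVIDIA_Linux-x86_64-415.13.run, placed in my user's home directory.
Switch to text mode with Ctrl+Alt+F1 and stop the desktop service there: sudo service lightdm stop. (To remove a previously installed NVIDIA driver first, use sudo apt-get purge 'nvidia*'.) Change to the directory that holds the driver and run:
sudo sh NVIDIA_Linux-x86_64-415.13.run --no-opengl-libs # change the .run file name to match the file you downloaded; the name shown is the one I used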
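Prompts like the following appear along the way (the version numbers in this sample come from a runfile that bundles driver 387.26 and CUDA 9.1, so yours will differ):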
- Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
- (y)es/(n)o/(q)uit: y
- do you want to run nvidia-xconfig?
- (y)es/(n)o/(q)uit: n
- Install the CUDA 9.1 Samples?
- (y)es/(n)o/(q)uit: n
- Install the CUDA 9.1 Toolkit?
- (y)es/(n)o/(q)uit: n
Then restart the system with reboot. The driver installation is now complete; verify it with the nvidia-settings and nvidia-smi commands.
Next, install CUDA 10 (the CUDA version reported by nvidia-smi). Download the runfile, which is named cuda_10.0.130_410.48_linux.run.
Run it as follows:
sudo sh cuda_10.0.130_410.48_linux.run --no-opengl-libs
The prompts and output look like this (the log below was captured from the CUDA 9.1 runfile; with CUDA 10.0 the version numbers and paths change accordingly):
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.1 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
 [ default is /usr/local/cuda-9.1 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.1 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
 [ default is /home/ccem ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.1 ...
Installing the CUDA Samples in /home/ccem ...
Copying samples to /home/ccem/NVIDIA_CUDA-9.1_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-9.1
Samples:  Installed in /home/ccem
Please make sure that
 -   PATH includes /usr/local/cuda-9.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_36731.log
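The warning at the end of the installer output states a minimum driver version. As a rough sketch, a dotted version string can be compared against such a threshold as shown below. version_ge is a hypothetical helper, and the 410.48 threshold in the example is an assumption taken from the runfile name cuda_10.0.130_410.48_linux.run, not a value confirmed by the log.

```shell
# Sketch: succeed when dotted version $1 is greater than or equal to $2.
# Fields are compared numerically from left to right; missing fields count as 0.
# version_ge is a hypothetical helper name.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example (needs an installed driver; 410.48 is an assumed threshold):
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$drv" 410.48 && echo "driver is new enough"
```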
If output like the following appears, some dependency libraries are missing:
Installing the CUDA Toolkit in /usr/local/cuda-9.1 …
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
In that case, install the corresponding libraries:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
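After installation, configure the CUDA environment variables. To configure them for the current user, open ~/.bashrc (sudo is not needed for your own file):
gedit ~/.bashrc
and add the following lines. /usr/local/cuda and /usr/local/cuda-10.0 are the same directory; the former is a symbolic link to the latter.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH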
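Run source ~/.bashrc to apply the change. To configure the environment variables for all users instead, edit /etc/profile:
sudo vim /etc/profile
export PATH=/usr/local/cuda/bin:${PATH} # required
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH} # optional; the approach described earlier can be used instead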
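When these export lines are added from a script rather than an editor, it helps to avoid adding the same line twice. Here is a minimal sketch; append_once is a made-up helper name, and it treats a line as present only if it matches exactly.

```shell
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example for the current user:
#   append_once "$HOME/.bashrc" 'export PATH=/usr/local/cuda/bin:$PATH'
#   append_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
```

The single quotes in the example keep $PATH and $LD_LIBRARY_PATH unexpanded, so the variables are resolved each time the shell starts rather than when the line is written.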
To check that CUDA was installed successfully, run:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
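deviceQuery ends its output with a "Result = PASS" line when it succeeds. A small sketch for checking that line in a script follows; device_query_passed is a hypothetical helper name that reads the program output from standard input.

```shell
# Sketch: succeed if deviceQuery output on stdin contains the final "Result = PASS" line.
# device_query_passed is a hypothetical helper name.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Example (run in the deviceQuery directory after make):
#   ./deviceQuery | device_query_passed && echo "CUDA sample ran successfully"
```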
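Next, install cuDNN v7. The version I downloaded is cudnn-10.0-linux-x64-v7.4.1.5.tgz. It can be extracted anywhere; I extracted it under /usr/local/cudnn. The extracted folder is named cuda and contains two subfolders, include and lib64. The lib64 folder has to be added to the environment variables. This step is important: open ~/.bashrc with gedit ~/.bashrc and add
export LD_LIBRARY_PATH=/your/path/to/cudnn/lib64:$LD_LIBRARY_PATH
where /your/path/to/cudnn/lib64 is the lib64 folder inside the directory where the .tgz was extracted. Save, exit, source the file, and restart the terminal. This sets up the cuDNN library files. The last step is to copy cudnn.h from the include folder of the extracted cuDNN directory (usually named cuda), that is /your/path/to/cudnn/include, into /usr/local/cuda/include. Because this is a system path, administrator privileges are required.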
cd cuda/include
sudo cp *.h /usr/local/cuda/include/
Then make cudnn.h readable by all users: sudo chmod a+r /usr/local/cuda/include/cudnn.h. That completes the cuDNN setup.
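To confirm which cuDNN header ended up in place, the version can be read from the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn.h. This is a sketch that assumes a cuDNN 7.x header, where those three defines live in cudnn.h itself; cudnn_version is a made-up helper name.

```shell
# Sketch: print MAJOR.MINOR.PATCH read from the #define lines of a cuDNN 7.x cudnn.h.
# cudnn_version is a hypothetical helper; it fails if CUDNN_MAJOR is not found.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END {
             if (major == "") exit 1
             print major "." minor "." patch
         }' "$1"
}

# Example:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```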
Next, install TensorFlow. I chose to build it from source, following https://github.com/jikexueyuanwiki/tensorflow-zh/blob/master/SOURCE/get_started/os_setup.md and https://blog.csdn.net/a446712385/article/details/79149977
In a terminal, run:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
The --recurse-submodules option is required; it fetches the protobuf library that TensorFlow depends on. Put the clone in the home directory. Then download and install Bazel.
The installer I downloaded is named bazel-0.15.2-installer-linux-x86_64.sh.
Install the other dependencies:
sudo apt-get update
sudo apt-get install python-pip python-numpy swig python-dev python-wheel
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip
sudo apt-get install default-jdk
For Python 2.7:
sudo apt-get install python-numpy swig python-dev python-wheel
For Python 3.x:
sudo apt-get install python3-numpy swig python3-dev python3-wheel
I use Python 3 here.
export PATH=/usr/bin:$PATH sets the path used to find Python.
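Run the Bazel installer downloaded above:
./bazel-0.15.2-installer-linux-x86_64.sh --user
The --user flag installs Bazel into $HOME/bin. Once that directory is on $PATH, the bazel tool can be used. To set this up, add the following
to ~/.bashrc:
export PATH=$HOME/bin:$PATH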
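Next, configure TensorFlow.
Change into its directory and run ./configure
This step configures the build; afterwards a whl package is generated and used to install TensorFlow.
Installing directly with pip installs the prebuilt whl package provided by the official site, whereas a source install compiles with Bazel, generates a whl package, and then installs that.
(To enable GPU support, CUDA and cuDNN must be configured at this step. On a machine whose graphics card lacks sufficient compute capability, GPU support cannot be enabled, and CUDA and cuDNN do not need to be installed.)
1) Configuration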
You have bazel 0.17.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.5
Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.5/dist-packages]
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
Configuration finished
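Part of the log above is taken from https://www.cnblogs.com/seniusen/p/9756302.html. It accepts the default of CUDA 9.0; with CUDA 10.0 installed as described earlier, 10.0 presumably has to be entered at the CUDA SDK version prompt instead.
Errors may occur during configuration. Here I changed the system default from Python 2 to Python 3.5, as follows.
Back up the original python2 symlink with sudo mv /usr/bin/python /usr/bin/python.2-bak, then run sudo ln -s /usr/local/bin/python3.5 /usr/bin/python. python --version confirms the switch worked, but compiling TensorFlow then fails with No module named keras.preprocessing. The fix is sudo pip install keras, which in turn produces another error, ModuleNotFoundError: No module named 'pip._internal'. To fix that, run:
wget https://bootstrap.pypa.io/get-pip.py --no-check-certificate
sudo python get-pip.py
Then test with pip -V; the problem should be solved.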
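Now build.
In the tensorflow directory, run the following three commands in turn (build, package, install):
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
The build takes a long time. When it finishes, run:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg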
The whl package is in /tmp/tensorflow_pkg (its name may differ depending on the machine and the TensorFlow version). Mine is named tensorflow-1.12.0rc0-cp35-cp35m-linux_x86_64.whl.
Copy it to the home folder and install it:
sudo pip install tensorflow-1.12.0rc0-cp35-cp35m-linux_x86_64.whl
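Since the whl name varies, it can help to check that the Python tag in the file name (cp35 here) matches the interpreter pip belongs to. The sketch below extracts that tag. wheel_python_tag is a hypothetical helper, and it relies on the standard wheel naming scheme, in which the Python tag is the third field from the end.

```shell
# Sketch: print the Python tag (e.g. cp35) from a wheel file name.
# Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl, so the
# Python tag is the third dash-separated field counted from the end.
# wheel_python_tag is a hypothetical helper name.
wheel_python_tag() {
    name=${1##*/}        # strip any directory part
    name=${name%.whl}    # strip the extension
    printf '%s\n' "$name" |
        awk -F- 'NF >= 5 { print $(NF - 2); ok = 1 } END { exit ok ? 0 : 1 }'
}

# Example: compare the wheel tag with the interpreter in use.
#   wheel_python_tag /tmp/tensorflow_pkg/tensorflow-1.12.0rc0-cp35-cp35m-linux_x86_64.whl
#   python -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```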
Once the installation finishes, test it by entering the following commands; if no errors are reported, the installation succeeded.
python  # prints the Python version information
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()  # prints GPU information, showing that TensorFlow is running on the GPU
>>> sess.run(hello)
b'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(22)
>>> sess.run(a+b)
32
For setting up the TensorFlow C++ interface, see https://www.cnblogs.com/seniusen/p/9756302.html