
Setting up the GPU version of TensorFlow for accelerated training

Once the setup described below is complete, a quick check that TensorFlow can see the GPU is:
import tensorflow as tf
tf.test.is_gpu_available()

  1. Background
    Environment: Anaconda, tensorflow_gpu==1.4.0 (version 1.4.0 is used for this walkthrough, even though 2.0 is already out).
    Each TensorFlow release is matched to a specific CUDA version, so check carefully before installing; not every GPU build uses cuda_8.0.
    Materials: CUDA 8.0 download link: https://pan.baidu.com/s/1lzKSWRLl5lYMrYcLjGbVXw
    Extraction code: 2p9i
  2. Install CUDA
    After downloading, run the CUDA installer.

    Choose an installation mode: Express, or Custom if you prefer.

    The installation path can be customized or left at the default. If you customize it, remember the path (it is needed later when configuring the environment variables).
    After that, simply click Next through the remaining steps until the installation finishes.
  3. Configure the system environment variables

Installing CUDA automatically adds two of the environment-variable entries below; the other two have to be added by hand (note: if you chose a custom installation path, adjust the paths accordingly):

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\libnvvp

After completing the configuration above, you can verify that it worked.
In cmd, run the following:

echo %path%

The output should list the CUDA directories added above.
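
If you prefer a scripted check, here is a minimal Python sketch (it assumes the default install path listed above; change cuda_home for a custom install) that looks for the CUDA directories on PATH and for the nvcc compiler:

import os
import shutil

# Default CUDA 8.0 install location (an assumption); change it if you used a custom path.
cuda_home = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"

required_dirs = [
    cuda_home,
    os.path.join(cuda_home, "bin"),
    os.path.join(cuda_home, "lib", "x64"),
    os.path.join(cuda_home, "libnvvp"),
]

# Entries currently on PATH, normalised for a case-insensitive comparison.
path_entries = {p.rstrip("\\").lower() for p in os.environ.get("PATH", "").split(os.pathsep)}
for d in required_dirs:
    status = "OK" if d.rstrip("\\").lower() in path_entries else "MISSING"
    print("%-7s %s" % (status, d))

# nvcc ships with the CUDA toolkit, so finding it on PATH is another quick sanity check.
print("nvcc found at:", shutil.which("nvcc"))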

4. Configure cuDNN:
The download bundle shared above contains a compressed archive; unpacking it produces three folders (typically bin, include and lib):

Copy the files inside each of these folders into the corresponding folder under the CUDA installation directory.
(Note: copy the files into the matching CUDA folders; do not replace the CUDA folders themselves with the cuDNN folders. This step is particularly important.)
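
As a sketch of that copy step (the paths below are assumptions; point cudnn_root at the folder you unpacked and cuda_home at your CUDA install, and run it from an elevated prompt, since Program Files normally requires administrator rights):

import os
import shutil

# Assumed paths for illustration only; adjust both to your own machine.
cudnn_root = r"C:\Downloads\cudnn\cuda"   # the 'cuda' folder inside the unpacked cuDNN archive
cuda_home = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"

# Copy file by file so the existing CUDA folders are kept rather than replaced.
for sub in ("bin", "include", os.path.join("lib", "x64")):
    src_dir = os.path.join(cudnn_root, sub)
    dst_dir = os.path.join(cuda_home, sub)
    for name in os.listdir(src_dir):
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        print("copied", name, "->", dst_dir)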


5. Verification:
Once all the steps above are done, most of the work is finished!
To verify that the setup works, open PyCharm and run the test script below. (This assumes the matching tensorflow_gpu version is already installed; for 1.4.0 it can be installed from cmd with:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.4.0)

import ctypes  
import imp  
import sys  
def main():  
    try:  
        import tensorflow as tf  
        print("TensorFlow successfully installed.")  
        if tf.test.is_built_with_cuda():  
            print("The installed version of TensorFlow includes GPU support.")  
        else:  
            print("The installed version of TensorFlow does not include GPU support.")  
        sys.exit(0)  
    except ImportError:  
        print("ERROR: Failed to import the TensorFlow module.")  
  
    candidate_explanation = False  
  
    python_version = sys.version_info.major, sys.version_info.minor  
    print("\n- Python version is %d.%d." % python_version)  
    if not (python_version == (3, 5) or python_version == (3, 6)):  
        candidate_explanation = True  
        print("- The official distribution of TensorFlow for Windows requires "  
              "Python version 3.5 or 3.6.")  
  
    try:  
        _, pathname, _ = imp.find_module("tensorflow")  
        print("\n- TensorFlow is installed at: %s" % pathname)  
    except ImportError:  
        candidate_explanation = False  
        print(""" 
- No module named TensorFlow is installed in this Python environment. You may 
  install it using the command `pip install tensorflow`.""")  
  
    try:  
        msvcp140 = ctypes.WinDLL("msvcp140.dll")  
    except OSError:  
        candidate_explanation = True  
        print(""" 
- Could not load 'msvcp140.dll'. TensorFlow requires that this DLL be 
  installed in a directory that is named in your %PATH% environment 
  variable. You may install this DLL by downloading Microsoft Visual 
  C++ 2015 Redistributable Update 3 from this URL: 
  https://www.microsoft.com/en-us/download/details.aspx?id=53587""")  
  
    try:  
        cudart64_80 = ctypes.WinDLL("cudart64_80.dll")  
    except OSError:  
        candidate_explanation = True  
        print(""" 
- Could not load 'cudart64_80.dll'. The GPU version of TensorFlow 
  requires that this DLL be installed in a directory that is named in 
  your %PATH% environment variable. Download and install CUDA 8.0 from 
  this URL: https://developer.nvidia.com/cuda-toolkit""")  
  
    try:  
        nvcuda = ctypes.WinDLL("nvcuda.dll")  
    except OSError:  
        candidate_explanation = True  
        print(""" 
- Could not load 'nvcuda.dll'. The GPU version of TensorFlow requires that 
  this DLL be installed in a directory that is named in your %PATH% 
  environment variable. Typically it is installed in 'C:\Windows\System32'. 
  If it is not present, ensure that you have a CUDA-capable GPU with the 
  correct driver installed.""")  
  
    cudnn5_found = False  
    try:  
        cudnn5 = ctypes.WinDLL("cudnn64_5.dll")  
        cudnn5_found = True  
    except OSError:  
        candidate_explanation = True  
        print(""" 
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow 
  requires that this DLL be installed in a directory that is named in 
  your %PATH% environment variable. Note that installing cuDNN is a 
  separate step from installing CUDA, and it is often found in a 
  different directory from the CUDA DLLs. You may install the 
  necessary DLL by downloading cuDNN 5.1 from this URL: 
  https://developer.nvidia.com/cudnn""")  
  
    cudnn6_found = False  
    try:  
        cudnn = ctypes.WinDLL("cudnn64_6.dll")  
        cudnn6_found = True  
    except OSError:  
        candidate_explanation = True  
  
    if not cudnn5_found or not cudnn6_found:  
        print()  
        if not cudnn5_found and not cudnn6_found:  
            print("- Could not find cuDNN.")  
        elif not cudnn5_found:  
            print("- Could not find cuDNN 5.1.")  
        else:  
            print("- Could not find cuDNN 6.")  
            print(""" 
  The GPU version of TensorFlow requires that the correct cuDNN DLL be installed 
  in a directory that is named in your %PATH% environment variable. Note that 
  installing cuDNN is a separate step from installing CUDA, and it is often 
  found in a different directory from the CUDA DLLs. The correct version of 
  cuDNN depends on your version of TensorFlow: 
 
  * TensorFlow 1.2.1 or earlier requires cuDNN 5.1. ('cudnn64_5.dll') 
  * TensorFlow 1.3 or later requires cuDNN 6. ('cudnn64_6.dll') 
 
  You may install the necessary DLL by downloading cuDNN from this URL: 
  https://developer.nvidia.com/cudnn""")  
  
    if not candidate_explanation:  
        print(""" 
- All required DLLs appear to be present. Please open an issue on the 
  TensorFlow GitHub page: https://github.com/tensorflow/tensorflow/issues""")  
    sys.exit(-1)  
if __name__ == "__main__":  
    main()  

If you see the following output, the configuration succeeded:

TensorFlow successfully installed.
The installed version of TensorFlow includes GPU support.

If instead you see a message like the following, the environment is misconfigured:

Could not load 'cudart64_80.dll'. The GPU version of TensorFlow
requires that this DLL be installed in a directory that is named in
your %PATH% environment variable. Download and install CUDA 8.0 from
this URL: https://developer.nvidia.com/cuda-toolkit

6. GPU-accelerated model training:

# Benchmark script to measure the speed-up from the GPU build of TensorFlow
from datetime import datetime
import math
import time
import tensorflow as tf
import os
# Uncomment the two lines below to hide the GPU and force CPU-only execution:
#os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
batch_size = 32
num_batches = 100
# Print the structure of each layer of the network: tensor name and shape

def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())

# with tf.name_scope('conv1') as scope  # names the variables inside the scope as conv1/xxx, which makes the different components easier to tell apart

def inference(images):
    parameters = []
    # First convolutional layer
    with tf.name_scope('conv1') as scope:
        # Convolution kernel, initialized from a truncated normal distribution
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        # Trainable biases
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]
    # LRN followed by max pooling; outside of AlexNet, LRN has largely been abandoned (little benefit, and it slows things down)
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
    print_activations(pool1)
    # Second convolutional layer; only some parameters differ
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)
    # LRN and pooling again
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
    print_activations(pool2)
    # Third convolutional layer
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
    # Fourth convolutional layer
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)
    # Fifth convolutional layer
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
    # A final max-pooling layer
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
    print_activations(pool5)
    # (The fully connected layers are omitted here.)
    return pool5, parameters

# Measure the time per round of computation. Arguments: the TensorFlow Session, the op to run, and a label for the test.
# The first few rounds suffer from GPU memory loading, cache warm-up, etc., so only iterations after the 10th are counted.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    # Run num_batches + num_steps_burn_in iterations,
    # timing each with time.time(); reporting starts after the warm-up.
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    # Compute the mean time per iteration and the standard deviation
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))

def run_benchmark():
    # Define the default Graph
    with tf.Graph().as_default():
        # No ImageNet data is used for training; random inputs are enough for timing
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        pool5, parameters = inference(images)
        init = tf.global_variables_initializer()
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
        sess.run(init)
        # Time the forward pass using pool5 directly (no fully connected layers);
        # this is only for benchmarking, not real training.
        time_tensorflow_run(sess, pool5, "Forward")
        # A dummy objective, just so gradients can be computed for the backward pass
        objective = tf.nn.l2_loss(pool5)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")

run_benchmark()
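
To actually see the speed-up (a suggestion, not part of the original script), run the benchmark twice and compare the reported sec / batch figures: once as-is on the GPU, and once with the GPU hidden so the same graph runs on the CPU, by uncommenting the two os.environ lines near the top:

# Hide all GPUs from TensorFlow; this must run before the graph is built.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"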

And that's it, all done!
Now you can enjoy the GPU speed-up during training. Personally, I think a good graphics card is worth it: the better the card, the bigger the speed-up.

7. Wrapping up:
If you have any questions or suggestions, feel free to email me: [email protected]
or contact me directly: 1017190168