The Complete Fix for RuntimeError: CUDA error: device-side assert triggered

阿新 · Published: 2020-07-27

The solutions posted online have the right idea, but none of them show an actual, working fix.

Problem description:

When building a dataset with ImageFolder:

  import torchvision
  from torch.utils.data import DataLoader
  train_data = torchvision.datasets.ImageFolder(train_path, transform=train_transform)
  train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=6)

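PyTorch scans every subfolder of train_path (the images of each class sit in a folder named after that class) and maps each class to an integer label, numbering the folders from 0 in sorted name order. With 4 classes, the labels are [0, 1, 2, 3].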
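The folder-to-label rule can be sketched in plain Python. This mirrors what torchvision's ImageFolder does (class names are the sorted subfolder names, numbered from 0); the function name `folder_class_to_idx` is hypothetical and not part of torchvision, which exposes the real mapping as the dataset's `class_to_idx` attribute.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each class subfolder of `root` to an integer label.

    Hypothetical helper that mimics ImageFolder's rule: class names are
    the subfolder names, sorted, and numbered starting from 0.
    Plain files in `root` are ignored.
    """
    with os.scandir(root) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


if __name__ == "__main__":
    # Build a throwaway directory tree with four class folders.
    with tempfile.TemporaryDirectory() as root:
        for name in ["dog", "cat", "fox", "bird"]:
            os.mkdir(os.path.join(root, name))
        print(folder_class_to_idx(root))
        # {'bird': 0, 'cat': 1, 'dog': 2, 'fox': 3}
```

With a real dataset, printing `train_data.class_to_idx` should show the same kind of 0-based mapping.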

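For binary classification the labels were indeed mapped to [0, 1], but in the author's 4-class setup they came out as [1, 2, 3, 4]. (ImageFolder itself always numbers classes from 0, so a shift like this most likely comes from custom loading code or a target_transform.) The loss function, for example nn.CrossEntropyLoss, requires every target to be a class index in [0, C-1]; with C = 4, label 4 is out of range. On the GPU this bounds check fails inside a CUDA kernel and is reported as: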

RuntimeError: CUDA error: device-side assert triggered
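The condition that fails can be written out explicitly so it is checked on the CPU, where a readable Python error is easier to act on than a generic device-side assert. The helpers `out_of_range_targets` and `check_targets` below are hypothetical; they test `0 <= label < num_classes`, the requirement that class-index losses such as nn.CrossEntropyLoss place on targets (this sketch does not handle the loss's optional `ignore_index`). They accept a plain list or a 1-D label tensor, since `int()` unwraps each element.

```python
def out_of_range_targets(labels, num_classes):
    """Return the distinct labels that fall outside [0, num_classes - 1].

    Hypothetical helper: works on a list of ints or on a 1-D tensor.
    """
    return sorted({int(t) for t in labels if not 0 <= int(t) < num_classes})


def check_targets(labels, num_classes):
    """Raise a readable ValueError instead of letting the GPU assert fire."""
    bad = out_of_range_targets(labels, num_classes)
    if bad:
        raise ValueError(
            f"labels {bad} are outside [0, {num_classes - 1}] for {num_classes} classes"
        )


if __name__ == "__main__":
    # First batch from the failing run: four classes, but labels go up to 4.
    batch = [4, 2, 4, 4, 3, 4, 3, 1]
    print(out_of_range_targets(batch, num_classes=4))  # [4]
    try:
        check_targets(batch, num_classes=4)
    except ValueError as err:
        print(err)  # labels [4] are outside [0, 3] for 4 classes
```

Calling `check_targets(labels, num_classes)` just before the loss, while the labels are still on the CPU, turns the opaque CUDA failure into a message that names the offending values.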

To confirm this, print the labels of each batch:

# load_fzdataset is the author's own data-loading function; it returns DataLoader objects
train_data, test_data = load_fzdataset(8)
for epoch in range(2):
    for i, (inputs, labels) in enumerate(train_data):
        # Each iteration reads one batch of 8 samples from the training DataLoader.
        # Since PyTorch 0.4, tensors no longer need to be wrapped in Variable.
        # The model would run here; print stands in for it.
        print("epoch:", epoch, "batch", i, "inputs", inputs.size(), "labels", labels)

Output from the failing run:

epoch: 0 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 4, 4, 3, 4, 3, 1])
epoch: 0 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 1, 3, 4, 4, 4, 2])
epoch: 0 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 2, 4, 4, 4, 3, 3])
epoch: 0 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 3, 4, 1, 2, 1, 2, 1])
epoch: 0 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 1, 1, 1, 4, 4, 3, 1])
epoch: 0 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 3, 4, 4, 4, 4, 1, 4])
epoch: 0 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 1, 1, 4, 2, 4, 1])
epoch: 0 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 4, 3, 4, 3, 4, 4])
epoch: 0 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([1, 4, 4, 1, 2, 1])
epoch: 1 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 3, 4, 4, 4, 4, 4])
epoch: 1 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([2, 4, 1, 1, 4, 4, 2, 4])
epoch: 1 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 2, 1, 1, 4, 4, 3])
epoch: 1 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 1, 1, 1, 3, 4, 1])
epoch: 1 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 4, 2, 4, 1, 1, 4, 1])
epoch: 1 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 4, 1, 2, 4, 3, 4, 1])
epoch: 1 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 4, 1, 3, 4, 4, 4])
epoch: 1 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 1, 2, 4, 1, 4, 4, 4])
epoch: 1 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([2, 1, 3, 3, 4, 4])
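Rather than reading through every printed batch, the labels of a whole loader can be summarized in one pass. `label_summary` and `looks_zero_based` are hypothetical helpers; they assume the loader yields `(inputs, labels)` pairs, as the loop above does, and they work with plain lists as well as label tensors.

```python
def label_summary(loader):
    """Collect the distinct label values produced by a loader.

    Hypothetical helper: `loader` is any iterable of (inputs, labels)
    batches, e.g. a torch DataLoader. Labels may be lists or 1-D tensors.
    """
    seen = set()
    for _inputs, labels in loader:
        seen.update(int(t) for t in labels)
    return sorted(seen)


def looks_zero_based(values, num_classes):
    """True if the observed labels are exactly 0..num_classes-1.

    Every class must appear at least once for this to return True.
    """
    return values == list(range(num_classes))


if __name__ == "__main__":
    # Two stand-in batches copied from the failing run; inputs are omitted.
    fake_loader = [(None, [4, 2, 4, 4, 3, 4, 3, 1]), (None, [1, 4, 4, 1, 2, 1])]
    values = label_summary(fake_loader)
    print(values)                       # [1, 2, 3, 4]
    print(looks_zero_based(values, 4))  # False
```

Run on the real `train_data`, a result of `[1, 2, 3, 4]` for a 4-class problem points straight at the off-by-one labels.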

All that is needed is to shift every label down by one:

# load_fzdataset is the author's own data-loading function; it returns DataLoader objects
train_data, test_data = load_fzdataset(8)
for epoch in range(2):
    for i, (inputs, labels) in enumerate(train_data):
        # Each iteration reads one batch of 8 samples from the training DataLoader.
        # Shift the labels from 1..4 down to 0..3 so they are valid class indices.
        labels = labels - 1
        # The model would run here; print stands in for it.
        print("epoch:", epoch, "batch", i, "inputs", inputs.size(), "labels", labels)

Output (the labels now range from 0 to 3):

epoch: 0 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 0, 3, 2, 1, 3, 2])
epoch: 0 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 3, 3, 3, 3, 3, 2, 2])
epoch: 0 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 0, 3, 2, 1, 3])
epoch: 0 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 0, 0, 3, 2, 1])
epoch: 0 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([2, 0, 1, 0, 3, 0, 0, 2])
epoch: 0 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 0, 0, 3, 3, 3])
epoch: 0 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 3, 3, 3, 0, 2])
epoch: 0 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 2, 3, 3, 0, 0])
epoch: 0 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([3, 3, 3, 1, 2, 1])
epoch: 1 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 0, 3, 2, 1, 3, 3])
epoch: 1 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 2, 1, 0, 3, 1, 0])
epoch: 1 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 0, 0, 1, 2, 2])
epoch: 1 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 2, 3, 3, 0, 2])
epoch: 1 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 3, 2, 3, 2, 3, 3, 3])
epoch: 1 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 0, 3, 3, 0, 3, 0, 3])
epoch: 1 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 0, 3, 0, 3, 2, 0, 3])
epoch: 1 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 0, 3, 3, 3, 3, 3])
epoch: 1 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([2, 1, 0, 3, 2, 0])
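Subtracting 1 works here because the raw labels are exactly 1 to 4. If the raw labels are not contiguous or do not start at 1, a lookup table is more robust. The sketch below uses hypothetical helpers `build_label_map` and `remap`: it numbers the distinct raw labels from 0 in sorted order (the same rule ImageFolder applies to folder names). The assumption is that the map is built once over all training labels and reused for every batch and for the test set, so the same raw label always gets the same index.

```python
def build_label_map(raw_labels):
    """Map each distinct raw label to 0..K-1 in sorted order (hypothetical helper).

    int() unwraps tensor elements so the keys are plain, hashable ints.
    """
    return {raw: idx for idx, raw in enumerate(sorted({int(t) for t in raw_labels}))}


def remap(labels, label_map):
    """Translate one batch of raw labels with a prebuilt map."""
    return [label_map[int(t)] for t in labels]


if __name__ == "__main__":
    # Build the map once from all training labels, then reuse it for every batch.
    label_map = build_label_map([1, 2, 3, 4])
    print(label_map)                                   # {1: 0, 2: 1, 3: 2, 4: 3}
    print(remap([4, 2, 4, 4, 3, 4, 3, 1], label_map))  # [3, 1, 3, 3, 2, 3, 2, 0]
```

`remap` returns a Python list; wrapping the result in `torch.tensor(...)` turns it back into a label tensor before it is passed to the loss.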
