PyTorch Daily Debugging Notes

阿新 • Published: 2021-01-13

(I was halfway through debugging before it occurred to me to keep a record like this...)

2021/1/12

1. ValueError: At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported. (You can probably work around this by making a copy of your array with array.copy().)

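The cause is that the code applies a negative-step slice to a NumPy array: image[..., ::-1]. Such a slice returns a view with a negative stride, which cannot be converted to a tensor. The fix is simple: whichever NumPy variable triggers the error (image here), use image.copy() instead, because copy() creates a new array with ordinary positive strides.

So I just appended .copy() to image[..., ::-1] and that was enough.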

import cv2

image = cv2.imread(path)
# image = image[:, :, ::-1]  # this is what I had written when the error occurred
image = image[:, :, ::-1].copy()  # with .copy() added, the error no longer appears
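To see why .copy() helps, it can be useful to look at the array's strides directly. The following is a minimal sketch that uses a small synthetic array as a stand-in for the result of cv2.imread (the array contents are made up for illustration):

```python
import numpy as np

# Stand-in for cv2.imread(path): a tiny H x W x 3 uint8 image in BGR order.
bgr = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)

rgb_view = bgr[:, :, ::-1]         # a view on the same memory, read backwards
rgb_copy = bgr[:, :, ::-1].copy()  # a fresh, C-contiguous array

print(rgb_view.strides)  # the last stride is negative
print(rgb_copy.strides)  # all strides are positive
```

Both arrays hold the same pixel values; only the memory layout differs. np.ascontiguousarray(image[:, :, ::-1]) should work equally well in place of .copy().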

2. RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[64, 100, 100, 3] to have 3 channels, but got 100 channels instead

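Solution: https://blog.csdn.net/qazwsxrx/article/details/103755834

This is clearly a channel mismatch on the input, and after thinking it over, the problem again traced back to how the image was read above.

cv2.imread() loads images in BGR channel order, so we used image[:,:,::-1] to convert BGR to RGB and then added .copy() (see item 1).

I did not know exactly why this error occurred, so I simply replaced the loader, using PIL's Image.open(path) in place of cv2.imread().

The code after the replacement: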

from PIL import Image
image = Image.open(path).convert('RGB')
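Looking back at the error message, the input shape [64, 100, 100, 3] is in (batch, height, width, channels) order, while a convolution with weight [64, 3, 7, 7] expects (batch, channels, height, width). So if one wanted to keep cv2, moving the channel axis would probably have been enough. A minimal sketch with NumPy, using a zero-filled batch of 4 images as a hypothetical stand-in for the real data:

```python
import numpy as np

# Hypothetical batch in the layout cv2 produces: (batch, height, width, channels).
batch_nhwc = np.zeros((4, 100, 100, 3), dtype=np.float32)

# Move the channel axis to position 1: (batch, channels, height, width).
batch_nchw = np.ascontiguousarray(batch_nhwc.transpose(0, 3, 1, 2))

print(batch_nhwc.shape)  # (4, 100, 100, 3)
print(batch_nchw.shape)  # (4, 3, 100, 100)
```

On a tensor, the equivalent call is tensor.permute(0, 3, 1, 2). transforms.ToTensor() performs the same H x W x C to C x H x W conversion for a single image, which may be why the PIL-based pipeline did not hit this error.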

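3. Then I hit another problem: RuntimeError: Given input size: (512x4x4). Calculated output size: (512x0x0). Output size is too small

Solution: https://blog.csdn.net/jsk_learner/article/details/103833034

This one is really a torchvision version problem. A related version problem that had also been bothering me: the transforms module in my torchvision had neither Resize nor ToPILImage.

(That is, inside transforms.Compose([]), these two could not be used: #transforms.ToPILImage(), #transforms.Resize((224, 224)).)

So I moved the code to the server and ran it there, and the problem no longer appeared.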
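The sizes in the error message can be reproduced with a little arithmetic. The sketch below assumes a ResNet-18-style layer stack (suggested by the [64, 3, 7, 7] weight in item 2 and the 512 channels here, but not confirmed by the original code) followed by a hypothetical fixed 7x7 pooling layer. It compares an input side of 100, as in item 2, with 224, the size passed to Resize:

```python
def out_size(size, kernel, stride, padding=0):
    """Output length of a conv/pool layer along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def final_feature_size(size):
    """Spatial size after a ResNet-18-style stack (an assumption about the model)."""
    size = out_size(size, kernel=7, stride=2, padding=3)  # 7x7 stem convolution
    size = out_size(size, kernel=3, stride=2, padding=1)  # 3x3 max pooling
    for _ in range(3):                                    # three stride-2 stages
        size = out_size(size, kernel=3, stride=2, padding=1)
    return size


for side in (100, 224):
    feat = final_feature_size(side)
    pooled = out_size(feat, kernel=7, stride=7)  # hypothetical fixed 7x7 pooling
    print(f"input {side}: feature map {feat}x{feat}, pooled {pooled}x{pooled}")
# input 100: feature map 4x4, pooled 0x0
# input 224: feature map 7x7, pooled 1x1
```

Under these assumptions a 100x100 input shrinks to 4x4, too small for a 7x7 window, while a 224x224 input gives exactly 7x7. That would be consistent with the error disappearing once Resize((224, 224)) became available on the server.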

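4. RuntimeError: CUDA error: out of memory

When running on the server, the code did not specify which CUDA device to use, so it fell back to the default GPU (number 0 in my case). That GPU happened to be busy with other programs, so I simply pointed the job at a different GPU. I did not set this in the code; I set it in the terminal instead.

export CUDA_VISIBLE_DEVICES=1

This selects GPU 1 (the second card, since numbering starts at 0) for training; then run python train.py.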
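The same selection can also be made from inside the script instead of the terminal. A minimal sketch, assuming it is placed at the very top of train.py before torch is imported:

```python
import os

# Must be set before CUDA is initialised, so do it before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import torch  # import torch only after the variable is set
# Inside this process the selected card then appears as device 0 ("cuda:0").
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Setting the variable in the terminal keeps the script unchanged between machines, which is a reasonable argument for the approach used above.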

5. TypeError: pic should be Tensor or ndarray. Got <class 'PIL.Image.Image'>

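After going through the code, I found that some of the code left over from the cv2 version had not been updated.

I was still using transforms.ToPILImage() inside transforms.Compose(). It expects a Tensor or ndarray, but the input was now already a PIL image, so commenting out that line fixed it.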

6. RuntimeError: CUDA error: device-side assert triggered

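The usual explanation is that a label is out of range.

Following the article "PyTorch error: RuntimeError: CUDA error: device-side assert triggered", I ran CUDA_LAUNCH_BLOCKING=1 python train.py, and the error it printed was RuntimeError: cuda runtime error (59) : device-side assert triggered

The fix I then found was in the article "RuntimeError: cuda runtime error (59) : device-side assert triggered".

That article says the exception is probably related to computing the loss. While looking into it, I found that many people had run into this cuda runtime error (59), and in most cases it was an indexing error.

So I went and looked through the code files. (I was running someone else's code rather than something I had written from start to finish, while the dataset and labels were my own, so they might not match the original author's code exactly.)

The dataset labels fall into 7 classes, so my labels run from 0 to 6. But I found that the original author's code subtracts 1 from every label value. I assume the original author's labels ran from 1 to 7, so my label 0 became -1, and that is what triggered the error.

Once I changed that, it worked.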
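A cheap range check on the labels before training can surface this kind of mismatch with a readable message instead of a device-side assert. A minimal sketch in plain Python; check_labels is a hypothetical helper, not part of PyTorch or of the original code:

```python
def check_labels(labels, num_classes):
    """Raise ValueError if any label falls outside [0, num_classes)."""
    bad = [y for y in labels if not 0 <= y < num_classes]
    if bad:
        raise ValueError(f"labels out of range [0, {num_classes}): {sorted(set(bad))}")


labels = [0, 1, 2, 3, 4, 5, 6]     # labels that already start at 0
shifted = [y - 1 for y in labels]  # what the original "- 1" did to them

check_labels(labels, num_classes=7)  # passes silently
try:
    check_labels(shifted, num_classes=7)
except ValueError as err:
    print(err)  # labels out of range [0, 7): [-1]
```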

Finally! It is running at last!!!
