RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

阿新 • Published: 2020-11-04

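Problem
Environment
Debugging process
Summary

Problem

While training a generative adversarial network (GAN) with PyTorch, I hit the error RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. This post records how I tracked it down.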

Environment

Windows 10 2004
Python 3.7.4
PyTorch 1.7.0 + cpu

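Debugging process

Attempt 1

The error message itself is not hard to understand: a variable needed to compute gradients has been modified by an in-place operation. What is an in-place operation? x += 1 is a typical example, and it can be rewritten as the out-of-place y = x + 1. Unfortunately, that did not solve my problem; the approach is summarized below.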
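To make the in-place versus out-of-place distinction concrete, here is a minimal sketch that is not taken from the original training code. The helper name backward_succeeds and the choice of exp() are assumptions for illustration; exp() is used because autograd keeps its output for the backward pass, so overwriting that output in place triggers the error.

```python
import torch


def backward_succeeds(use_inplace: bool) -> bool:
    """Return True if backward() runs, False if it raises RuntimeError."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()  # exp() saves its output for use in the backward pass
    if use_inplace:
        y += 1  # in-place: overwrites the tensor that exp() saved
    else:
        y = y + 1  # out-of-place: new tensor, the saved output is untouched
    try:
        y.sum().backward()
        return True
    except RuntimeError as err:
        print(f"backward failed: {err}")
        return False


print(backward_succeeds(use_inplace=True))
print(backward_succeeds(use_inplace=False))
```

The in-place variant should fail with the error from the title, while the out-of-place variant should run normally.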
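I searched a lot of related blog posts, and most of them give the following cause:

Since version 0.4.0 merged Variable and Tensor into a single Tensor class, in-place operations that used to work on a Variable now raise an error on a Tensor.

So the fix is simple: convert every in-place operation into an out-of-place one, for example replace x += 1 with y = x + 1.
One problem remains: how to find the in-place operation. A small trick: call y.backward() at successive stages of the code. If a call raises the error, the problem lies before that point; otherwise it lies after it.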
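The staged backward() trick could look roughly like the sketch below. The helper backward_ok and the toy tensors are hypothetical; retain_graph=True is passed so that each probe leaves the graph usable for the next one.

```python
import torch


def backward_ok(tensor: torch.Tensor) -> bool:
    """Probe: try a backward pass from `tensor` while keeping the graph alive."""
    try:
        tensor.sum().backward(retain_graph=True)
        return True
    except RuntimeError:
        return False


x = torch.ones(3, requires_grad=True)
h = x.exp()
ok_stage_1 = backward_ok(h)  # probe before the suspicious line
h += 1                       # suspicious in-place update
ok_stage_2 = backward_ok(h)  # probe after the suspicious line
print(ok_stage_1, ok_stage_2)
```

The first probe passes and the second fails, which places the offending operation between the two. Note that every probe accumulates into .grad, so this is a debugging aid only. PyTorch also provides torch.autograd.set_detect_anomaly(True), which reports the forward operation whose backward failed; the original post does not use it.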

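Attempt 2

I could not find a single in-place operation anywhere in my code, so the approach above was a dead end. I stared at the code and stepped through it in the debugger for a long time without spotting anything...
Then I had a new idea. My training loop looked like this: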

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        d_loss.backward(retain_graph=True)
        optimizerD.step()

        # update the generator
        netG.zero_grad()
        # !!! the line that raises the error
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        g_loss.backward()        
        optimizerG.step()

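The backward pass of the discriminator loss works; the backward pass of the generator loss fails. g_loss is the sum of two terms, so the natural move is to remove one term and see whether the program runs. Result: with only the first term the program runs fine; whenever g_loss contains the second term it fails.
So I looked at the code of adversarialLoss: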

class AdversarialLoss(nn.Module):
    def __init__(self):
        super(AdversarialLoss, self).__init__()
        self.bec_loss = nn.BCELoss()

    def forward(self, logits_fake):
        # Adversarial Loss
        # !!! the problem is here: it runs once logits_fake is detached
        adversarial_loss = self.bec_loss(logits_fake, torch.ones_like(logits_fake))
        return 0.001 * adversarial_loss

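Nothing looked wrong, so I tried things one by one. There are only two variables here: logits_fake and torch.ones_like(logits_fake). The latter is a constant, so I tried freezing logits_fake so that it takes no part in training, and the program actually ran!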

class AdversarialLoss(nn.Module):
    def __init__(self):
        super(AdversarialLoss, self).__init__()
        self.bec_loss = nn.BCELoss()

    def forward(self, logits_fake):
        # Adversarial Loss
        # !!! the problem is here: it runs once logits_fake is detached
        adversarial_loss = self.bec_loss(logits_fake.detach(), torch.ones_like(logits_fake))
        return 0.001 * adversarial_loss

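This showed that the modified variable lies on the logits_fake path. The program now ran, but detaching is not necessarily the right thing to do. Nothing inside the AdversarialLoss class modifies logits_fake, so I went back to the training loop.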

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        d_loss.backward(retain_graph=True)
        # the update happens here
        optimizerD.step()

        # update the generator
        netG.zero_grad()
        # !!! the line that raises the error
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        g_loss.backward()        
        optimizerG.step()

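Notice that the discriminator is updated before the failing line, so the answer is close at hand: optimizerD.step() updates the discriminator's weights in place, and the graph that produced logits_fake still needs the old weights for its backward pass. Moving optimizerD.step() to the second-to-last line fixes it. The modified code is: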

for epoch in range(1, epochs + 1):
    for idx, (lr, hr) in enumerate(traindata_loader):
        lrs = lr.to(device)
        hrs = hr.to(device)

        # update the discriminator
        netD.zero_grad()
        logits_fake = netD(netG(lrs).detach())
        logits_real = netD(hrs)
        # Label smoothing
        real = (torch.rand(logits_real.size()) * 0.25 + 0.85).clone().detach().to(device)
        fake = (torch.rand(logits_fake.size()) * 0.15).clone().detach().to(device)
        d_loss = bce(logits_real, real) + bce(logits_fake, fake)
        d_loss.backward(retain_graph=True)
        

        # update the generator
        netG.zero_grad()
        g_loss = contentLoss(netG(lrs), hrs) + adversarialLoss(logits_fake)
        g_loss.backward()   
        optimizerD.step()     
        optimizerG.step()

The program finally runs, yay ( •̀ ω •́ )y!
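The diagnosis can be reproduced in isolation. The sketch below uses a tiny stand-in discriminator (two linear layers, layer sizes chosen arbitrarily) and a hypothetical helper second_backward_succeeds; none of it is the original model. It runs one backward pass with retain_graph=True, optionally steps the optimizer, and then runs a second backward pass through the same logits_fake.

```python
import torch
import torch.nn as nn

# Stand-in discriminator; two layers so that autograd saves a weight tensor.
netD = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
optimizerD = torch.optim.SGD(netD.parameters(), lr=0.1)
bce = nn.BCELoss()


def second_backward_succeeds(step_in_between: bool) -> bool:
    """Reuse logits_fake for a second backward, with or without a step first."""
    fake_images = torch.randn(2, 4)  # plays the role of netG(lrs).detach()
    logits_fake = netD(fake_images)
    netD.zero_grad()
    d_loss = bce(logits_fake, torch.zeros_like(logits_fake))
    d_loss.backward(retain_graph=True)
    if step_in_between:
        optimizerD.step()  # in-place update of the weights saved in the graph
    adversarial = bce(logits_fake, torch.ones_like(logits_fake))
    try:
        adversarial.backward()
        return True
    except RuntimeError:
        return False


print(second_backward_succeeds(step_in_between=False))
print(second_backward_succeeds(step_in_between=True))
```

Without the step in between, the second backward pass runs; with it, the same RuntimeError appears even though the script contains no explicit in-place operator.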

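Summary

Cause: the discriminator was updated before the generator's gradients were computed. The update modified the discriminator's weights in place, so the generator's gradient computation failed.
Fix: move the discriminator's update step after the generator's gradient computation.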
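One caveat about the reordered loop: logits_fake was computed from netG(lrs).detach(), so the adversarial term passes no gradient back to the generator, and g_loss.backward() adds that term's gradient to the discriminator's .grad before optimizerD.step() runs. A commonly used alternative, which is not the fix used in this post, is to run the updated discriminator again on the non-detached generator output. Below is a minimal sketch with toy networks; train_step is a hypothetical helper, MSELoss stands in for contentLoss, and label smoothing is left out.

```python
import torch
import torch.nn as nn

netG = nn.Linear(4, 4)  # toy generator (assumption)
netD = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
optimizerG = torch.optim.SGD(netG.parameters(), lr=0.1)
optimizerD = torch.optim.SGD(netD.parameters(), lr=0.1)
bce = nn.BCELoss()
content_loss = nn.MSELoss()  # stand-in for contentLoss


def train_step(lrs: torch.Tensor, hrs: torch.Tensor):
    fake = netG(lrs)

    # Update the discriminator on detached fakes; no retain_graph needed.
    netD.zero_grad()
    logits_real = netD(hrs)
    logits_fake = netD(fake.detach())
    d_loss = (bce(logits_real, torch.ones_like(logits_real))
              + bce(logits_fake, torch.zeros_like(logits_fake)))
    d_loss.backward()
    optimizerD.step()

    # Update the generator: fresh forward pass through the updated netD,
    # without detach, so the adversarial term reaches netG.
    netG.zero_grad()
    logits_for_g = netD(fake)
    g_loss = (content_loss(fake, hrs)
              + 0.001 * bce(logits_for_g, torch.ones_like(logits_for_g)))
    g_loss.backward()
    optimizerG.step()
    return d_loss.item(), g_loss.item()


print(train_step(torch.randn(2, 4), torch.randn(2, 4)))
```

The generator's backward pass here also leaves gradients on netD's parameters, but netD.zero_grad() clears them at the start of the next step.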
