PyTorch Learning (10) --- Reading the Neural Style Code
Overview
I have written a walkthrough of the Torch version of the neural style code before; see Torch7 Learning (7): Neural-Style Code Analysis. That framework, however, was built on the traditional layer paradigm, whereas everything now follows the computation-graph paradigm. The PyTorch version is written quite differently, and above all it is much simpler! Compare the two and you will be amazed.
The neural style code on the official PyTorch website
Most of it is not particularly interesting; the core code is what matters:
class ContentLoss(nn.Module):

    def __init__(self, target, weight):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach() * weight
        self.weight = weight
        self.criterion = nn.MSELoss()

    def forward(self, input):
        self.loss = self.criterion(input * self.weight, self.target)
        self.output = input
        return self.output

    def backward(self, retain_graph=True):
        self.loss.backward(retain_graph=retain_graph)
        return self.loss
class GramMatrix(nn.Module):

    def forward(self, input):
        a, b, c, d = input.size()  # a=batch size(=1)
        # b=number of feature maps
        # (c,d)=dimensions of a feature map (N=c*d)

        features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

        G = torch.mm(features, features.t())  # compute the gram product

        # we 'normalize' the values of the gram matrix
        # by dividing by the number of elements in each feature map.
        return G.div(a * b * c * d)
class StyleLoss(nn.Module):

    def __init__(self, target, weight):
        super(StyleLoss, self).__init__()
        self.target = target.detach() * weight
        self.weight = weight
        self.gram = GramMatrix()
        self.criterion = nn.MSELoss()

    def forward(self, input):
        self.output = input.clone()
        self.G = self.gram(input)
        self.G.mul_(self.weight)
        self.loss = self.criterion(self.G, self.target)
        return self.output

    def backward(self, retain_graph=True):
        self.loss.backward(retain_graph=retain_graph)
        return self.loss
In the code above, a few things look rather odd:
1. In PyTorch Introductory Learning (8): Implementing Custom Layers (even writing backward for non-differentiable operations), we saw that to extend the framework with a custom layer you only need to override nn.Module's forward; inside forward you call xxxFunction.apply to invoke a custom autograd class, and it is on that autograd class that forward and backward are overridden (see the sketch after this list). So why does this custom module define a backward of its own?
2. What is retain_graph for?
3. The Gram matrix only needs a forward; no backpropagation code has to be written at all. That is genuinely impressive!
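For reference, here is a minimal sketch of that standard route from point 1 (my own toy example, not from the tutorial; DoubleOp and DoubleModule are made-up names): the module's forward merely dispatches to the autograd Function, and it is the Function that overrides forward and backward.

import torch
import torch.nn as nn
from torch.autograd import Function, Variable

class DoubleOp(Function):
    # the "real" forward/backward live on the autograd Function
    @staticmethod
    def forward(ctx, input):
        return input * 2

    @staticmethod
    def backward(ctx, grad_output):
        # d(2x)/dx = 2
        return grad_output * 2

class DoubleModule(nn.Module):
    # the module only calls xxxFunction.apply
    def forward(self, input):
        return DoubleOp.apply(input)

x = Variable(torch.ones(3), requires_grad=True)
DoubleModule()(x).sum().backward()
print(x.grad)  # a Variable of twos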
Compare this with the Torch version of Gram:
local Gram, parent = torch.class('nn.GramMatrix', 'nn.Module')

function Gram:__init()
    parent.__init(self)
end

function Gram:updateOutput(input)
    assert(input:dim() == 3)
    local C, H, W = input:size(1), input:size(2), input:size(3)
    local x_flat = input:view(C, H * W)
    self.output:resize(C, C)
    self.output:mm(x_flat, x_flat:t())
    return self.output
end

function Gram:updateGradInput(input, gradOutput)
    assert(input:dim() == 3 and input:size(1))
    local C, H, W = input:size(1), input:size(2), input:size(3)
    local x_flat = input:view(C, H * W)
    self.gradInput:resize(C, H * W):mm(gradOutput, x_flat)
    self.gradInput:addmm(gradOutput:t(), x_flat)
    self.gradInput = self.gradInput:view(C, H, W)
    return self.gradInput
end
-- Define an nn Module to compute style loss in-place
local StyleLoss, parent = torch.class('nn.StyleLoss', 'nn.Module')

function StyleLoss:__init(strength, normalize)
    parent.__init(self)
    self.normalize = normalize or false
    self.strength = strength
    self.target = torch.Tensor()
    self.mode = 'none'
    self.loss = 0

    self.gram = nn.GramMatrix()
    self.blend_weight = nil
    self.G = nil
    self.crit = nn.MSECriterion()
end

function StyleLoss:updateOutput(input)
    self.G = self.gram:forward(input)
    self.G:div(input:nElement())
    if self.mode == 'capture' then
        if self.blend_weight == nil then
            self.target:resizeAs(self.G):copy(self.G)
        elseif self.target:nElement() == 0 then
            self.target:resizeAs(self.G):copy(self.G):mul(self.blend_weight)
        else
            self.target:add(self.blend_weight, self.G)
        end
    elseif self.mode == 'loss' then
        self.loss = self.strength * self.crit:forward(self.G, self.target)
    end
    self.output = input
    return self.output
end

function StyleLoss:updateGradInput(input, gradOutput)
    if self.mode == 'loss' then
        local dG = self.crit:backward(self.G, self.target)
        dG:div(input:nElement())
        self.gradInput = self.gram:backward(input, dG)
        if self.normalize then
            self.gradInput:div(torch.norm(self.gradInput, 1) + 1e-8)
        end
        self.gradInput:mul(self.strength)
        self.gradInput:add(gradOutput)
    else
        self.gradInput = gradOutput
    end
    return self.gradInput
end
The Torch version is at least twice as much code, and considerably harder to write! The main difference is that for a custom layer in Torch you have to write updateGradInput yourself, and the backward pass of Gram in particular is not easy to derive by hand. Another point is the self.gradInput:add(gradOutput) in StyleLoss's backward, which also takes some effort to understand; it is explained in the post on writing code that adds supervision on hidden layers (feature matching) together with the optim package. With an autodiff framework, by contrast, as long as your forward computes entirely with Variables, a correct graph is built and the backward pass comes out correct. It is that crazy: you do not write any backpropagation at all!
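To see that autograd really derives Gram's backward for us, here is a quick numerical check (my own sketch, not part of the tutorial): gradcheck compares autograd's gradients against finite differences, and needs double precision to pass.

import torch
from torch.autograd import Variable, gradcheck

gram = GramMatrix()  # the forward-only module defined above
x = Variable(torch.randn(1, 2, 3, 3).double(), requires_grad=True)
# succeeds iff the automatically derived backward matches finite differences
print(gradcheck(gram, (x,), eps=1e-6, atol=1e-4))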
What is the backward of Neural Style's custom modules?
This "backward" is actually just an ordinary method! It does not override the framework's internal backward. All it really does is call the criterion's backward and then return the loss, so that the caller can read off the corresponding loss value. You can see this further down in the code.
Remarkably, the custom layers can simply be collected into a list!
# desired depth layers to compute style/content losses :
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, style_img, content_img,
                               style_weight=1000, content_weight=1,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    # just in order to have an iterable access to or list of content/style
    # losses (these lists hold the custom loss layers)
    content_losses = []
    style_losses = []

    model = nn.Sequential()  # the new Sequential module network
    gram = GramMatrix()  # we need a gram module in order to compute style targets

    # move these modules to the GPU if possible:
    if use_cuda:
        model = model.cuda()
        gram = gram.cuda()

    i = 1
    for layer in list(cnn):
        if isinstance(layer, nn.Conv2d):
            name = "conv_" + str(i)
            model.add_module(name, layer)

            if name in content_layers:
                # add content loss:
                target = model(content_img).clone()
                content_loss = ContentLoss(target, content_weight)
                model.add_module("content_loss_" + str(i), content_loss)
                content_losses.append(content_loss)

            if name in style_layers:
                # add style loss:
                target_feature = model(style_img).clone()
                target_feature_gram = gram(target_feature)
                style_loss = StyleLoss(target_feature_gram, style_weight)
                model.add_module("style_loss_" + str(i), style_loss)
                style_losses.append(style_loss)

        if isinstance(layer, nn.ReLU):
            name = "relu_" + str(i)
            model.add_module(name, layer)

            if name in content_layers:
                # add content loss:
                target = model(content_img).clone()
                content_loss = ContentLoss(target, content_weight)
                model.add_module("content_loss_" + str(i), content_loss)
                content_losses.append(content_loss)

            if name in style_layers:
                # add style loss:
                target_feature = model(style_img).clone()
                target_feature_gram = gram(target_feature)
                style_loss = StyleLoss(target_feature_gram, style_weight)
                model.add_module("style_loss_" + str(i), style_loss)
                style_losses.append(style_loss)

            i += 1

        if isinstance(layer, nn.MaxPool2d):
            name = "pool_" + str(i)
            model.add_module(name, layer)  # ***

    # return the whole model together with the lists of loss layers
    return model, style_losses, content_losses
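A minimal usage sketch (my own, assuming cnn, style_img, and content_img were prepared as in the earlier part of the tutorial): printing the returned Sequential shows the conv_/relu_/pool_ layers interleaved with the inserted loss modules, and the two lists are direct handles to those modules.

model, style_losses, content_losses = get_style_model_and_losses(
    cnn, style_img, content_img)
print(model)  # conv_1, style_loss_1, relu_1, ... in insertion order
print(len(style_losses), len(content_losses))  # 5 and 1 with the defaults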
After that:

# A Parameter is a Variable that requires grad!
# Here we iteratively optimize the input x itself, so we wrap x as a
# parameter and hand it to optim.
def get_input_param_optimizer(input_img):
    # this line to show that input is a parameter that requires a gradient
    input_param = nn.Parameter(input_img.data)
    optimizer = optim.LBFGS([input_param])
    return input_param, optimizer
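A one-line check of the comment above (my own sketch): nn.Parameter is just a Variable whose requires_grad defaults to True, which is what allows L-BFGS to update the image itself.

import torch
import torch.nn as nn
from torch.autograd import Variable

p = nn.Parameter(torch.zeros(1))
print(isinstance(p, Variable), p.requires_grad)  # True True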
def run_style_transfer(cnn, content_img, style_img, input_img, num_steps=300,
                       style_weight=1000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        style_img, content_img, style_weight, content_weight)
    input_param, optimizer = get_input_param_optimizer(input_img)

    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:

        def closure():
            # correct the values of updated input image
            input_param.data.clamp_(0, 1)

            # first, zero the gradients
            optimizer.zero_grad()
            model(input_param)
            style_score = 0
            content_score = 0

            # this is the interesting part: we call the "fake" backward directly.
            # it invokes the real backward of the loss attached to the hidden
            # layer, and its main job is to return the corresponding loss value.
            # a very clever way to write it.
            for sl in style_losses:
                style_score += sl.backward()
            # the content and style layers can even be backward-ed separately!
            for cl in content_losses:
                content_score += cl.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.data[0], content_score.data[0]))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # a last correction...
    input_param.data.clamp_(0, 1)

    return input_param.data
output = run_style_transfer(cnn, content_img, style_img, input_img)
plt.figure()
imshow(output, title='Output Image')
plt.ioff()
plt.show()
Very convenient! The following code shows that the custom layers can even run backward separately, without affecting each other at all!
for sl in style_losses:
    style_score += sl.backward()
for cl in content_losses:
    content_score += cl.backward()
The reason is actually simple: backward computes the gradients of the corresponding graph nodes, and as long as those gradients are not zeroed they keep accumulating, so separate backward calls simply add up. Note, however, that to save memory PyTorch clears the gradient of every node except the leaf nodes (the Variables you created yourself) as soon as it has been consumed, so you normally cannot inspect the gradient of an intermediate layer at all, unless you add a hook! And to be able to walk backward through the same graph more than once, you need retain_graph!
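Both effects are easy to verify in isolation (my own sketch, not from the tutorial): gradients on a leaf accumulate across backward calls, while an intermediate node's grad stays None and can only be observed through a hook.

import torch
from torch.autograd import Variable

x = Variable(torch.ones(1), requires_grad=True)

# 1) leaf gradients accumulate until you zero them
(x * 2).backward()
(x * 3).backward()
print(x.grad)  # 5 = 2 + 3, accumulated across the two calls

# 2) intermediate gradients are freed; a hook lets you peek at them
x.grad.data.zero_()
y = x * 2
y.register_hook(lambda g: print('grad at y:', g))  # prints 3
(y * 3).backward()
print(y.grad)  # None: non-leaf grads are not kept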
Using retain_graph and detach
retain_graph keeps the graph used for the backward computation alive. When retain_graph is true, you can run several separate backward passes through the same graph without the previous pass freeing the buffers out from under you. That is exactly why we have:
def backward(self, retain_graph=True):
    self.loss.backward(retain_graph=retain_graph)
    return self.loss
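The effect is easy to reproduce (my own sketch): without retain_graph, the graph's intermediate buffers are freed during the first backward, so a second pass through the same graph raises a RuntimeError. In the style-transfer model all the loss modules share the trunk of the network, so each per-loss backward has to keep the graph alive for the next one.

import torch
from torch.autograd import Variable

x = Variable(torch.ones(1), requires_grad=True)
y = x * x
y.backward(retain_graph=True)  # graph kept alive
y.backward()                   # fine: reuses the retained graph, then frees it
# y.backward()                 # a third call would raise a RuntimeError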
The other important point here is detach. The target is obtained by feeding the style image through the network, so it is a Variable with its own computation graph. detach() "cuts off" that node and turns it into a leaf, as if it were a Variable we created ourselves: target.grad_fn becomes None, and gradients that reach it do not propagate any further back.
def __init__(self, target, weight):
    super(StyleLoss, self).__init__()
    self.target = target.detach() * weight
    self.weight = weight
    self.gram = GramMatrix()
    self.criterion = nn.MSELoss()
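The effect of detach() can be seen directly (my own sketch): the result of a computation carries a grad_fn, while its detached version is a fresh leaf whose grad_fn is None, so gradients stop there.

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x * 2
print(y.grad_fn)           # a Mul backward node: y is part of a graph
print(y.detach().grad_fn)  # None: cut off, now a leaf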
Summary
- With an autodiff framework you do not write backpropagation, provided the custom module's forward operates entirely on Variables; otherwise a correct graph cannot be built!
- To save memory, PyTorch does not keep the backward graph around, which also means the grads of intermediate nodes cannot be retrieved, unless you set retain_graph to true. This is also what makes "backward-ing several losses separately" possible.
- Note the use of detach, which cuts the computation graph and turns the node into a leaf.
- This way of writing a backward method is worth borrowing.