
Computer Vision Tasks: Image Gradients and Image Generation

These notes follow the Python programming assignments of Stanford's cs231n course as a main thread, working through the course's key ideas and some of the mathematical derivations. The computer vision material is split across two articles; this is the second: it covers computing image gradients with a pretrained CNN and using different kinds of image gradients to generate different types of images.

Part 1: "Three Major Computer Vision Tasks: Classification, Localization and Detection"

04

Image Gradients

In this part we use a pretrained CNN model to compute image gradients, and then use those gradients to produce class saliency maps and fooling images. We work with the TinyImageNet dataset, a subset of the ILSVRC-2012 classification dataset containing 200 classes, each with 500 training images, 50 validation images, and 50 test images, all of size 64x64. TinyImageNet is split into two halves, TinyImageNet-100-A and TinyImageNet-100-B, each containing 100 classes; here we use TinyImageNet-100-A.

Examples of the TinyImageNet-100-A
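All of the snippets below assume the dataset and the pretrained model have been loaded. Here is a minimal setup sketch, assuming the assignment's load_tiny_imagenet and PretrainedCNN helpers; the module paths and dataset paths are assumptions based on the standard assignment layout:

import numpy as np
import matplotlib.pyplot as plt

# Helpers shipped with the cs231n assignment code (paths are assumptions;
# adjust them to your local checkout)
from cs231n.data_utils import load_tiny_imagenet
from cs231n.classifiers.pretrained_cnn import PretrainedCNN

# data is a dict holding 'X_train', 'y_train', 'X_val', 'y_val',
# 'class_names' and the 'mean_image' used by the snippets below
data = load_tiny_imagenet('cs231n/datasets/tiny-imagenet-100-A', subtract_mean=True)
model = PretrainedCNN(h5_file='cs231n/datasets/pretrained_model.h5')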

1. Saliency Maps

Given an image X, we want to know which parts of the image determine its final classification. Given a class, we can backpropagate to get the gradient of that class's score with respect to X; this gradient matrix is the image gradient, from which we compute the class saliency map (CSM). Section 3.1 of Karen Simonyan's paper (https://arxiv.org/pdf/1312.6034.pdf) gives the recipe: for a grayscale image, the CSM is the absolute value of the image gradient; for an RGB image, the CSM takes, at each pixel, the largest absolute value among the three channels of the image gradient. The magnitude of each CSM element indicates how strongly the corresponding image pixel influences the final classification.
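In symbols (restating Section 3.1 of the paper): let S_y(X) be the unnormalized score of class y, and let w = \partial S_y / \partial X be the image gradient obtained by backpropagation. For an RGB image the saliency map M is

M_{ij} = \max_{c} \left| w_{c,i,j} \right|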

The code is as follows:

def compute_saliency_maps(X, y, model):

    N, C, H, W = X.shape

    # Compute the class scores with a single forward pass
    scores, cache = model.forward(X, mode='test')    # scores has shape (N, 100)

    # The objective we differentiate is the unnormalized score of the
    # correct class: loss = scores[np.arange(N), y], of size (N,)
    dscores = np.zeros_like(scores)
    dscores[np.arange(N), y] = 1.0

    # Backpropagate to get the gradient of the correct-class score
    # with respect to the input image, of shape (N, C, H, W)
    dX, grads = model.backward(dscores, cache)

    # The saliency map is the channel-wise max of the absolute gradient
    saliency = np.max(np.abs(dX), axis=1)    # shape (N, H, W)

    return saliency
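A minimal usage sketch, assuming the data and model set up earlier and the assignment's deprocess_image utility; the choice of five validation images is arbitrary:

# Compute saliency maps for a handful of validation images
mask = np.arange(5)
X = data['X_val'][mask]
y = data['y_val'][mask]
saliency = compute_saliency_maps(X, y, model)

# Show each image above its saliency map
for i in range(mask.size):
    plt.subplot(2, mask.size, i + 1)
    plt.imshow(deprocess_image(X[i], data['mean_image']))
    plt.axis('off')
    plt.subplot(2, mask.size, mask.size + i + 1)
    plt.imshow(saliency[i], cmap=plt.cm.hot)
    plt.axis('off')
plt.show()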

Below are visualizations of some saliency maps:

Random images

Cherry-picked images

2. Fooling Images

We can also use image gradients to generate fooling images. Given an image and a target class, we run gradient ascent on the image (repeatedly adding the image gradient to it) to produce a fooling image. The fooling image is visually almost indistinguishable from the original, yet the CNN classifies it as the preset target class.

The code is as follows:

def make_fooling_image(X, target_y, model):
    X_fooling = X.copy()
    N, C, H, W = X_fooling.shape      # N = 1
    i = 0
    y_pred = -1
    lr = 200.0
    while (y_pred != target_y) and (i < 200):
        scores, cache = model.forward(X_fooling, mode='test')  # scores has shape (N, 100)

        # The objective we maximize is the unnormalized score of the
        # target class: loss = scores[np.arange(N), target_y], of size (N,)
        dscores = np.zeros_like(scores)
        dscores[np.arange(N), target_y] = 1.0

        # The gradient of this objective with respect to the input image
        dX, grads = model.backward(dscores, cache)

        # Gradient ascent step on the image
        X_fooling += lr * dX

        # Re-classify the updated image
        scores, _ = model.forward(X_fooling, mode='test')
        y_pred = scores.argmax(axis=1)[0]
        i += 1
        print('Iteration %d: current class: %d; target class: %d' % (i, y_pred, target_y))

    return X_fooling
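A usage sketch along the lines of the figure below; the image index, the target class, and the channel-summed difference plot are illustrative choices:

X = data['X_val'][[0]]     # one validation image, shape (1, 3, 64, 64)
target_y = 67              # an arbitrary target class
X_fooling = make_fooling_image(X, target_y, model)

plt.subplot(1, 3, 1)
plt.imshow(deprocess_image(X[0], data['mean_image']))
plt.title('Original')
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(deprocess_image(X_fooling[0], data['mean_image']))
plt.title('Fooling')
plt.axis('off')
plt.subplot(1, 3, 3)
# Visualize the perturbation as its per-pixel magnitude
plt.imshow(np.abs(X_fooling - X)[0].sum(axis=0), cmap=plt.cm.hot)
plt.title('Difference')
plt.axis('off')
plt.show()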

Left: original image, Middle: fooling image, Right: difference

The result above shows that CNNs still cannot escape the curse of dimensionality: adversarial examples exist that they cannot classify correctly.

05

Image Generation

In this part we continue to explore image gradients, using several different methods to generate images from them.

1. Class Visualization

Given a target class (say, spider), we start from a random-noise image and use gradient ascent to generate an image (of a spider) that the CNN will classify as the target class. For the implementation details, see the paper Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (https://arxiv.org/pdf/1312.6034.pdf).

The code is as follows:

def create_class_visualization(target_y, model, **kwargs):

    learning_rate = kwargs.pop('learning_rate', 10000)
    blur_every = kwargs.pop('blur_every', 1)
    l2_reg = kwargs.pop('l2_reg', 1e-6)
    max_jitter = kwargs.pop('max_jitter', 4)
    num_iterations = kwargs.pop('num_iterations', 200)
    show_every = kwargs.pop('show_every', 25)

    X = np.random.randn(1, 3, 64, 64)
    for t in range(num_iterations):
        # As a regularizer, add random jitter to the image
        ox, oy = np.random.randint(-max_jitter, max_jitter + 1, 2)
        X = np.roll(np.roll(X, ox, -1), oy, -2)

        # Compute the L2-regularized target-class score and its gradient
        scores, cache = model.forward(X, mode='test')
        loss = scores[0, target_y] - l2_reg * np.sum(X**2)
        dscores = np.zeros_like(scores)
        dscores[0, target_y] = 1.0
        dX, grads = model.backward(dscores, cache)
        dX -= 2 * l2_reg * X

        # Gradient ascent step on the image
        X += learning_rate * dX

        # Undo the jitter
        X = np.roll(np.roll(X, -ox, -1), -oy, -2)

        # As a regularizer, clip the image
        X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

        # As a regularizer, periodically blur the image
        if t % blur_every == 0:
            X = blur_image(X)

        # Periodically show the image
        if t % show_every == 0:
            print('The loss is %f' % loss)
            plt.imshow(deprocess_image(X, data['mean_image']))
            plt.gcf().set_size_inches(3, 3)
            plt.axis('off')
            plt.title('Iteration: %d' % t)
            plt.show()

    return X
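Calling it is a one-liner; the class index below is an arbitrary example (look up the index you want in data['class_names']), and the kwargs simply override the defaults:

target_y = 43   # some target class index, e.g. one that maps to a spider class
out = create_class_visualization(target_y, model,
                                 num_iterations=200, show_every=25)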

Below are the (spider) images generated over the course of the iterations:

Generated images

2. Feature Inversion

In this part we do something rather fun: starting from a random-noise image, we reconstruct the feature representation that a chosen CNN layer has learned for a given image. For implementation details, see the papers Understanding Deep Image Representations by Inverting Them (https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/mahendran15understanding.pdf) and Understanding Neural Networks Through Deep Visualization (http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf).

The code is as follows:

def invert_features(target_feats, layer, model, **kwargs):

    learning_rate = kwargs.pop('learning_rate', 10000)
    num_iterations = kwargs.pop('num_iterations', 500)
    l2_reg = kwargs.pop('l2_reg', 1e-7)
    blur_every = kwargs.pop('blur_every', 1)
    show_every = kwargs.pop('show_every', 50)

    X = np.random.randn(1, 3, 64, 64)
    for t in range(num_iterations):
        # Forward pass up to the target layer
        feats, cache = model.forward(X, end=layer, mode='test')

        # Squared feature-matching loss plus L2 regularization on the image
        loss = np.sum((feats - target_feats)**2) + l2_reg * np.sum(X**2)

        # Gradient of the loss with respect to the activations,
        # backpropagated to the input image
        dfeats = 2 * (feats - target_feats)
        dX, grads = model.backward(dfeats, cache)
        dX += 2 * l2_reg * X

        # Gradient descent step (here we minimize the loss)
        X -= learning_rate * dX

        # As a regularizer, clip the image
        X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

        # As a regularizer, periodically blur the image
        if (blur_every > 0) and t % blur_every == 0:
            X = blur_image(X)

        # Periodically show the reconstruction
        if (show_every > 0) and (t % show_every == 0 or t + 1 == num_iterations):
            print(loss)
            plt.imshow(deprocess_image(X, data['mean_image']))
            plt.gcf().set_size_inches(3, 3)
            plt.axis('off')
            plt.title('Iteration: %d' % t)
            plt.show()
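To invert the features of a real image we first need target_feats from a forward pass up to the chosen layer. A sketch, assuming the assignment's preprocess_image helper and scipy's (since-deprecated) imread/imresize; the file name and layer index are arbitrary:

from scipy.misc import imread, imresize

filename = 'kitten.jpg'   # any image file (an assumption)
layer = 3                 # which layer's features to invert (an assumption)

img = imresize(imread(filename), (64, 64))

# preprocess_image (assignment utility) subtracts the mean image and
# reshapes to the (1, 3, 64, 64) layout the model expects
X_img = preprocess_image(img, data['mean_image'])
target_feats, _ = model.forward(X_img, end=layer, mode='test')

invert_features(target_feats, layer, model, show_every=100)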

Below are feature reconstructions generated over the iterations (for a shallow and a deep layer):

Shallow feature reconstruction

Deep feature reconstruction

3. DeepDream

In this part we try out a simplified version of DeepDream. The idea is simple: pick a CNN layer (the layer we will "dream" at) and feed the image to be dreamed into a pretrained CNN. Forward-propagate up to the target layer and set that layer's gradient equal to its activations (features); then backpropagate to the input layer to obtain the image gradient, and repeatedly add it to the input image by gradient ascent, which amplifies whatever the layer already responds to.

The code is as follows:

from tqdm import tqdm

def deepdream(X, layer, model, **kwargs):
    X = X.copy()
    learning_rate = kwargs.pop('learning_rate', 5.0)
    max_jitter = kwargs.pop('max_jitter', 16)
    num_iterations = kwargs.pop('num_iterations', 200)
    show_every = kwargs.pop('show_every', 50)

    for t in tqdm(range(num_iterations)):
        # As a regularizer, add random jitter to the image
        ox, oy = np.random.randint(-max_jitter, max_jitter + 1, 2)
        X = np.roll(np.roll(X, ox, -1), oy, -2)

        # Forward pass up to the dreaming layer
        fea, cache = model.forward(X, end=layer, mode='test')

        # Set the gradient at that layer equal to its own activations,
        # then backpropagate and take a gradient ascent step
        dfea = fea
        dX, grads = model.backward(dfea, cache)
        X += learning_rate * dX

        # Undo the jitter
        X = np.roll(np.roll(X, -ox, -1), -oy, -2)

        # As a regularizer, clip the image
        mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)
        X = np.clip(X, -mean_pixel, 255.0 - mean_pixel)

        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0:
            img = deprocess_image(X, data['mean_image'], mean='pixel')
            plt.imshow(img)
            plt.title('Iteration: %d' % (t + 1))
            plt.gcf().set_size_inches(8, 8)
            plt.axis('off')
            plt.show()

    return X
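A usage sketch: since the network is convolutional up to the dreaming layer, it should accept images larger than 64x64. The file name, image size, and layer index are arbitrary choices; imread/imresize come from scipy.misc as above:

img = imresize(imread('tibidabo.jpg'), (256, 256)).astype(np.float32)

# Convert HxWx3 to (1, 3, H, W) and subtract the per-channel mean pixel
mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)
X = img.transpose(2, 0, 1)[None] - mean_pixel

X_dream = deepdream(X, layer=7, model=model, learning_rate=5.0)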

Now we can generate DeepDream images!

Tibidabo

Deeplayer

Leaning Tower