0- Background

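Style transfer takes a Content image and a Style image and merges them into a new generated image that keeps the content of the former and the style of the latter.
The required dependencies are: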

import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf

%matplotlib inline

1- Transfer Learning

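Transfer learning applies what was learned on one task to a new task. Neural Style Transfer (NST) builds on a convolutional network that has already been trained for a different task.
We use the VGG network (the 19-layer version, VGG-19). It was trained on the very large ImageNet database and has therefore learned many low-level and high-level features.
Loading the model: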

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
print(model)
# Note: the model can be downloaded from http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat ; it is fairly large, about 500 MB

Output:

{'conv5_1': <tf.Tensor 'Relu_12:0' shape=(1, 19, 25, 512) dtype=float32>, 'conv4_1': <tf.Tensor 'Relu_8:0' shape=(1, 38, 50, 512) dtype=float32>, 'avgpool1': <tf.Tensor 'AvgPool:0' shape=(1, 150, 200, 64) dtype=float32>, 'conv4_3': <tf.Tensor 'Relu_10:0' shape=(1, 38, 50, 512) dtype=float32>, 'conv2_1': <tf.Tensor 'Relu_2:0' shape=(1, 150, 200, 128) dtype=float32>, 'conv5_3': <tf.Tensor 'Relu_14:0' shape=(1, 19, 25, 512) dtype=float32>, 'input': <tf.Variable 'Variable:0' shape=(1, 300, 400, 3) dtype=float32_ref>, 'avgpool2': <tf.Tensor 'AvgPool_1:0' shape=(1, 75, 100, 128) dtype=float32>, 'conv3_4': <tf.Tensor 'Relu_7:0' shape=(1, 75, 100, 256) dtype=float32>, 'conv5_2': <tf.Tensor 'Relu_13:0' shape=(1, 19, 25, 512) dtype=float32>, 'conv3_1': <tf.Tensor 'Relu_4:0' shape=(1, 75, 100, 256) dtype=float32>, 'conv3_2': <tf.Tensor 'Relu_5:0' shape=(1, 75, 100, 256) dtype=float32>, 'avgpool3': <tf.Tensor 'AvgPool_2:0' shape=(1, 38, 50, 256) dtype=float32>, 'conv3_3': <tf.Tensor 'Relu_6:0' shape=(1, 75, 100, 256) dtype=float32>, 'conv5_4': <tf.Tensor 'Relu_15:0' shape=(1, 19, 25, 512) dtype=float32>, 'conv1_1': <tf.Tensor 'Relu:0' shape=(1, 300, 400, 64) dtype=float32>, 'conv4_2': <tf.Tensor 'Relu_9:0' shape=(1, 38, 50, 512) dtype=float32>, 'avgpool5': <tf.Tensor 'AvgPool_4:0' shape=(1, 10, 13, 512) dtype=float32>, 'conv4_4': <tf.Tensor 'Relu_11:0' shape=(1, 38, 50, 512) dtype=float32>, 'conv2_2': <tf.Tensor 'Relu_3:0' shape=(1, 150, 200, 128) dtype=float32>, 'conv1_2': <tf.Tensor 'Relu_1:0' shape=(1, 300, 400, 64) dtype=float32>, 'avgpool4': <tf.Tensor 'AvgPool_3:0' shape=(1, 19, 25, 512) dtype=float32>}
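
The spatial sizes in the printed shapes halve after every average-pooling layer, rounding up (300x400, then 150x200, 75x100, 38x50, 19x25, 10x13). The following is a minimal sketch that reproduces those numbers, assuming each pooling layer uses stride 2 with 'SAME' padding. That assumption is inferred from the printed shapes, not stated by the output, and `same_pool_output` is a hypothetical helper, not part of `nst_utils`.

```python
import math

def same_pool_output(size, stride=2):
    """Spatial size after pooling, assuming 'SAME' padding (hypothetical helper)."""
    return math.ceil(size / stride)

h, w = 300, 400  # height and width taken from the 'input' entry above
pooled_shapes = []
for _ in range(5):  # avgpool1 .. avgpool5
    h, w = same_pool_output(h), same_pool_output(w)
    pooled_shapes.append((h, w))

print(pooled_shapes)
# [(150, 200), (75, 100), (38, 50), (19, 25), (10, 13)]
```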

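The model is stored as a Python dictionary: each key is a variable (layer) name and each value is the corresponding tensor. To feed an image into the model, assign it to the input variable as follows (in TensorFlow 1.x graph mode, `assign` returns an op that only takes effect once it is run in a session):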

model["input"].assign(image)

To inspect the activations of a particular layer, run its tensor in the session:

sess.run(model["conv4_2"])

Here model["conv4_2"] is the tensor for that layer.

2- Neural Style Transfer

The style transfer algorithm is built in three steps:

Build the content cost function $J_{content}(C, G)$
Build the style cost function $J_{style}(S, G)$
Combine them into the overall cost function $J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$.
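
As a minimal sketch of the third step, the overall cost is just a weighted sum of two scalars. The function name `total_cost` and the default weights below are illustrative assumptions, not values given in this post; in practice $\alpha$ and $\beta$ are hyperparameters to tune.

```python
def total_cost(J_content, J_style, alpha=10.0, beta=40.0):
    """Weighted sum J(G) = alpha * J_content + beta * J_style.

    alpha and beta are hyperparameters; the defaults here are
    illustrative assumptions only.
    """
    return alpha * J_content + beta * J_style

# Example with made-up cost values: 10 * 1.0 + 40 * 2.0
print(total_cost(1.0, 2.0))  # 90.0
```

A larger `beta` relative to `alpha` pushes the generated image toward the style image, and a larger `alpha` pushes it toward the content image.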

2-1 - Computing the content cost

The content image C can be loaded and displayed like this:

content_image = scipy.misc.imread("images/louvre.jpg")
imshow(content_image)

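When choosing which layer to use, we generally pick one that is neither too deep nor too shallow. A layer that is too deep captures only high-level features, so the generated image matches the content loosely and the visual result suffers. A layer that is too shallow captures only low-level features, which does not work well either. It is worth trying several layers and comparing the results.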

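Suppose we pick layer $l$ for the analysis. Image C is fed into the pretrained VGG network and forward-propagated. Let $a^{(C)}$ be the activations of that layer, a tensor of size $n_H \times n_W \times n_C$. Image G is processed the same way: it is fed into the network and forward-propagated, and $a^{(G)}$ denotes the corresponding activations. The content cost function is defined as: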

$$J_{content}(C, G) = \frac{1}{4 \times n_H \times n_W \times n_C} \sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2 \tag{1}$$
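
Equation (1) can be checked by hand on a tiny example. The sketch below uses NumPy and not TensorFlow so that it runs on its own; `content_cost_np` is a hypothetical helper and the all-ones and all-zeros activations are made-up inputs chosen so the arithmetic is easy to follow.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """NumPy version of equation (1) for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

# Made-up activations: every entry differs by exactly 1.
a_C = np.ones((1, 2, 2, 3))
a_G = np.zeros((1, 2, 2, 3))

# 12 squared differences of 1, divided by 4 * 2 * 2 * 3 = 48.
J = content_cost_np(a_C, a_G)
print(J)  # 0.25
```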

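Here $a^{(C)}$ and $a^{(G)}$ are volumes, that is, 3D stacks of feature maps. When computing the cost $J_{content}(C, G)$ they can be unrolled into 2D matrices. The unrolling is not actually required for $J_{content}$, because the sum runs over all entries regardless of their layout, but it is required for the style cost $J_{style}$. The unrolling is illustrated below: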

[Figure: unrolling a 3D activation volume into a 2D matrix]
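
The unrolling can also be sketched in NumPy. This is an illustration under assumptions, not the post's TensorFlow code: the activations are random made-up values and `unroll` is a hypothetical helper. It reshapes a (1, n_H, n_W, n_C) volume into an (n_C, n_H * n_W) matrix with one row per channel and confirms that the sum of squared differences is unchanged by the unrolling.

```python
import numpy as np

rng = np.random.default_rng(0)
n_H, n_W, n_C = 4, 4, 3
a_C = rng.normal(size=(1, n_H, n_W, n_C))  # made-up activations
a_G = rng.normal(size=(1, n_H, n_W, n_C))

def unroll(a):
    """(1, n_H, n_W, n_C) -> (n_C, n_H * n_W): one row per channel."""
    return a.reshape(n_H * n_W, n_C).T

cost_3d = np.sum((a_C - a_G) ** 2)
cost_2d = np.sum((unroll(a_C) - unroll(a_G)) ** 2)

print(unroll(a_C).shape)             # (3, 16)
print(np.isclose(cost_3d, cost_2d))  # True
```

Each row of the unrolled matrix holds one channel's feature map flattened row by row, which is the layout the style cost needs.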

The content cost function is implemented as follows:

# GRADED FUNCTION: compute_content_cost

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns: 
    J_content -- scalar that you compute using equation 1 above.
    """

    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.transpose(tf.reshape(a_C, [n_H * n_W, n_C]))
    a_G_unrolled = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # compute the cost with tensorflow (≈1 line)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled,a_G_unrolled)))/(4*n_H*n_W*n_C)
    ### END CODE HERE ###

    return J_content

Test:

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_content = compute_content_cost(a_C, a_G)
    print("J_content = " + str(J_content.eval()))

Test result:

J_content = 6.76559

2-2 Computing the style cost

First, take a look at the style image:

style_image = scipy.misc.imread("images/monet_800600.jpg")
imshow(style_image)

2-2-1 Style matrix

The style matrix is also called a "Gram matrix". In linear algebra, the Gram matrix G of a set of vectors $(v_1, \dots, v_n)$ is the matrix of their pairwise dot products, i.e. $G_{ij} = v_i^T v_j = \text{np.dot}(v_i, v_j)$.
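
A small NumPy sketch of this definition follows. The three vectors are made-up values for illustration; stacking them as the rows of a matrix `V` gives the whole Gram matrix in one product, `V @ V.T`.

```python
import numpy as np

# Rows are the vectors v_1, v_2, v_3 (illustrative values).
V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# G[i, j] is the dot product of row i and row j.
G = V @ V.T
print(G)
# [[1. 1. 0.]
#  [1. 2. 2.]
#  [0. 2. 4.]]
```

The result is symmetric, and each diagonal entry is the squared length of the corresponding vector.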
