[Deep Learning] Semi-Supervised and Unsupervised Learning: the Variational Auto-Encoder (VAE), with code
阿新 · Published 2018-12-24
Full paper title: "Auto-Encoding Variational Bayes"
Paper link: https://arxiv.org/pdf/1312.6114.pdf
Code:
Keras version: https://github.com/bojone/vae
There are plenty of VAE tutorials online, but few explain the idea clearly, and even fewer pair the explanation with code.
"Talk is cheap, show me the code."
Here is a well-written VAE analysis by another blogger: https://spaces.ac.cn/archives/5253
That article lines up well with the code below, so I see no need to re-derive all the VAE equations here.
First, import the packages and set the hyperparameters:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F

# 2-d latent space, parameter count in same order of magnitude
# as in the original VAE paper (VAE paper has about 3x as many)
latent_dims = 2
num_epochs = 100
batch_size = 128
capacity = 64
learning_rate = 1e-3
variational_beta = 1
use_gpu = True
Load the MNIST dataset:
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST

img_transform = transforms.Compose([
    transforms.ToTensor()
])

train_dataset = MNIST(root='./data/MNIST', download=True, train=True, transform=img_transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = MNIST(root='./data/MNIST', download=True, train=False, transform=img_transform)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)
Now define the VAE itself. The overall structure is much like a plain autoencoder, except that the encoder learns the mean and (log-)variance of the latent variable, and the latent code is then sampled from that distribution. Note that the encoder ends in two separate fully connected layers, one for the mean and one for the log-variance.
The latent_sample function implements the reparameterization trick: sample ε from N(0, I) and set z = μ + σ × ε. Since the encoder outputs log σ², the code recovers σ as exp(logvar / 2).
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        c = capacity
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=c, kernel_size=4, stride=2, padding=1) # out: c x 14 x 14
        self.conv2 = nn.Conv2d(in_channels=c, out_channels=c*2, kernel_size=4, stride=2, padding=1) # out: c*2 x 7 x 7
        self.fc_mu = nn.Linear(in_features=c*2*7*7, out_features=latent_dims)
        self.fc_logvar = nn.Linear(in_features=c*2*7*7, out_features=latent_dims)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(x.size(0), -1) # flatten batch of multi-channel feature maps to a batch of feature vectors
        x_mu = self.fc_mu(x)
        x_logvar = self.fc_logvar(x)
        return x_mu, x_logvar

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        c = capacity
        self.fc = nn.Linear(in_features=latent_dims, out_features=c*2*7*7)
        self.conv2 = nn.ConvTranspose2d(in_channels=c*2, out_channels=c, kernel_size=4, stride=2, padding=1)
        self.conv1 = nn.ConvTranspose2d(in_channels=c, out_channels=1, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        x = self.fc(x)
        x = x.view(x.size(0), capacity*2, 7, 7) # unflatten batch of feature vectors to a batch of multi-channel feature maps
        x = F.relu(self.conv2(x))
        x = torch.sigmoid(self.conv1(x)) # last layer before output is sigmoid, since we are using BCE as reconstruction loss
        return x

class VariationalAutoencoder(nn.Module):
    def __init__(self):
        super(VariationalAutoencoder, self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        latent_mu, latent_logvar = self.encoder(x)
        latent = self.latent_sample(latent_mu, latent_logvar)
        x_recon = self.decoder(latent)
        return x_recon, latent_mu, latent_logvar

    def latent_sample(self, mu, logvar):
        if self.training:
            # the reparameterization trick: z = mu + eps * std, with eps ~ N(0, I)
            std = logvar.mul(0.5).exp_()
            eps = torch.empty_like(std).normal_()
            return eps.mul(std).add_(mu)
        else:
            return mu
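As a quick sanity check (a minimal sketch of mine, not part of the original post), you can push a dummy batch through the untrained model and confirm the tensor shapes; the names dummy and model are just illustrative:

# hypothetical shape check, assuming the classes above have been defined
dummy = torch.randn(4, 1, 28, 28)           # a fake batch of four MNIST-sized images
model = VariationalAutoencoder()
recon, mu, logvar = model(dummy)
print(recon.shape)   # torch.Size([4, 1, 28, 28])
print(mu.shape)      # torch.Size([4, 2])
print(logvar.shape)  # torch.Size([4, 2])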
Define the loss function. The first term measures how closely the reconstruction matches the input; the second uses the KL divergence to measure how far the latent distribution learned for each image is from the prior.
def vae_loss(recon_x, x, mu, logvar):
    # recon_x is the probability of a multivariate Bernoulli distribution p.
    # -log(p(x)) is then the pixel-wise binary cross-entropy.
    # Averaging or not averaging the binary cross-entropy over all pixels here
    # is a subtle detail with big effect on training, since it changes the weight
    # we need to pick for the other loss term by several orders of magnitude.
    # Not averaging is the direct implementation of the negative log likelihood,
    # but averaging makes the weight of the other loss term independent of the image resolution.
    recon_loss = F.binary_cross_entropy(recon_x.view(-1, 784), x.view(-1, 784), reduction='sum')

    # KL-divergence between the prior distribution over latent vectors
    # (the one we are going to sample from when generating new images)
    # and the distribution estimated by the encoder for the given image.
    kldivergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    return recon_loss + variational_beta * kldivergence
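For reference, the kldivergence line above is the closed-form KL divergence between the Gaussian posterior N(μ, σ²) produced by the encoder and the standard normal prior, as derived in Appendix B of the paper (here logvar = log σ²):

$$ D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\right) = -\frac{1}{2} \sum_{i=1}^{d} \left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right) $$

Summing over the d latent dimensions (and over the batch, as torch.sum does) gives exactly the expression in the code.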
Instantiate a VAE with the structure above and count how many trainable parameters there are to learn.
vae = VariationalAutoencoder()
device = torch.device("cuda:0" if use_gpu and torch.cuda.is_available() else "cpu")
vae = vae.to(device)
num_params = sum(p.numel() for p in vae.parameters() if p.requires_grad)
print('Number of parameters: %d' % num_params)
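For reference, summing the weights and biases of the layers above by hand, with capacity = 64 and latent_dims = 2 this should come out to 308,357 trainable parameters; worth double-checking against your own printout if you change the hyperparameters.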
Train the VAE:
optimizer = torch.optim.Adam(params=vae.parameters(), lr=learning_rate, weight_decay=1e-5)
# set to training mode
vae.train()
train_loss_avg = []
print('Training ...')
for epoch in range(num_epochs):
    train_loss_avg.append(0)
    num_batches = 0

    for image_batch, _ in train_dataloader:
        image_batch = image_batch.to(device)

        # vae reconstruction
        image_batch_recon, latent_mu, latent_logvar = vae(image_batch)

        # reconstruction + KL loss
        loss = vae_loss(image_batch_recon, image_batch, latent_mu, latent_logvar)

        # backpropagation
        optimizer.zero_grad()
        loss.backward()

        # one step of the optimizer (using the gradients from backpropagation)
        optimizer.step()

        train_loss_avg[-1] += loss.item()
        num_batches += 1

    train_loss_avg[-1] /= num_batches
    print('Epoch [%d / %d] average loss: %f' % (epoch+1, num_epochs, train_loss_avg[-1]))
Plot the loss curve:
import matplotlib.pyplot as plt
plt.ion()
fig = plt.figure()
plt.plot(train_loss_avg)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
If you don't want to train from scratch, you can load a pretrained model instead.
filename = 'vae_2d.pth'
# filename = 'vae_10d.pth'
import urllib.request

if not os.path.isdir('./pretrained'):
    os.makedirs('./pretrained')
    print('downloading ...')
    urllib.request.urlretrieve("http://geometry.cs.ucl.ac.uk/creativeai/pretrained/"+filename, "./pretrained/"+filename)

vae.load_state_dict(torch.load('./pretrained/'+filename, map_location=device)) # map_location lets a GPU-trained checkpoint load on a CPU-only machine
print('done')

# this is how the VAE parameters can be saved:
# torch.save(vae.state_dict(), './pretrained/my_vae.pth')
Let's check the results on the test set.
# set to evaluation mode
vae.eval()

test_loss_avg, num_batches = 0, 0
for image_batch, _ in test_dataloader:
    with torch.no_grad():
        image_batch = image_batch.to(device)

        # vae reconstruction
        image_batch_recon, latent_mu, latent_logvar = vae(image_batch)

        # reconstruction + KL loss
        loss = vae_loss(image_batch_recon, image_batch, latent_mu, latent_logvar)

        test_loss_avg += loss.item()
        num_batches += 1

test_loss_avg /= num_batches
print('average loss: %f' % (test_loss_avg))
Visualize the results:
import numpy as np
import matplotlib.pyplot as plt
plt.ion()

import torchvision.utils

vae.eval()

def to_img(x):
    x = x.clamp(0, 1)
    return x

def show_image(img):
    img = to_img(img)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

# this function takes the images to reconstruct and the model
# that performs the reconstructions as inputs
def visualise_output(images, model):
    with torch.no_grad():
        images = images.to(device)
        images, _, _ = model(images)
        images = images.cpu()
        images = to_img(images)
        np_imagegrid = torchvision.utils.make_grid(images[1:50], 10, 5).numpy()
        plt.imshow(np.transpose(np_imagegrid, (1, 2, 0)))
        plt.show()
images, labels = next(iter(test_dataloader))
# First visualise the original images
print('Original images')
show_image(torchvision.utils.make_grid(images[1:50],10,5))
plt.show()
# Reconstruct and visualise the images using the vae
print('VAE reconstruction:')
visualise_output(images, vae)
Visualize the 2D latent space:
# load a network that was trained with a 2d latent space
if latent_dims != 2:
    print('Please change the parameters to two latent dimensions.')

with torch.no_grad():
    # create a sample grid in 2d latent space
    latent_x = np.linspace(-1.5, 1.5, 20)
    latent_y = np.linspace(-1.5, 1.5, 20)
    latents = torch.FloatTensor(len(latent_y), len(latent_x), 2)
    for i, lx in enumerate(latent_x):
        for j, ly in enumerate(latent_y):
            latents[j, i, 0] = lx
            latents[j, i, 1] = ly
    latents = latents.view(-1, 2) # flatten grid into a batch

    # reconstruct images from the latent vectors
    latents = latents.to(device)
    image_recon = vae.decoder(latents)
    image_recon = image_recon.cpu()

    fig, ax = plt.subplots(figsize=(10, 10))
    show_image(torchvision.utils.make_grid(image_recon.data[:400], 20, 5))
    plt.show()
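Finally, since the decoder maps any latent vector back to image space, you can also generate entirely new digits by sampling latent vectors from the prior N(0, I). This is a minimal sketch under the definitions above, not part of the original post:

# hypothetical sampling example: decode random draws from the prior
vae.eval()
with torch.no_grad():
    z = torch.randn(64, latent_dims, device=device) # 64 latent vectors sampled from N(0, I)
    samples = vae.decoder(z).cpu()

show_image(torchvision.utils.make_grid(samples, 8, 5))
plt.show()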