A Brief Introduction to the Principles of FV (Fisher Vector) and VAE (Variational Autoencoder)

阿新 · Published: 2019-01-20

1. FV (Fisher Vector)

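The idea behind FV fits in one sentence: describe each feature point with a (weighted) combination of all the cluster centers, not just the nearest one.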

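Concretely, assume the local features x_1, …, x_T of a sample are independent and identically distributed (i.i.d.). Then the probability of the sample is the product of the probabilities of its individual features, and taking the logarithm turns this product into a sum:

$$\mathcal{L}(X\mid\lambda) = \log p(X\mid\lambda) = \log\prod_{t=1}^{T} p(x_t\mid\lambda) = \sum_{t=1}^{T}\log p(x_t\mid\lambda)$$

where λ denotes the parameters of the model (for FV, usually a Gaussian mixture model whose component means play the role of the cluster centers).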
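Why the logarithm matters in practice: a short NumPy check, assuming (purely for illustration) a single one-dimensional standard normal in place of the real feature model. Multiplying thousands of densities underflows to 0.0 in double precision, while the equivalent sum of log-densities stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: T = 2000 one-dimensional features drawn i.i.d. from N(0, 1)
x = rng.normal(size=2000)

def log_gauss(x, mu=0.0, sigma=1.0):
    """Log-density of N(mu, sigma^2), evaluated elementwise."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

# Product of densities: mathematically positive, but underflows in float64
likelihood = np.prod(np.exp(log_gauss(x)))
# Sum of log-densities: the log of the same quantity, computed stably
log_likelihood = np.sum(log_gauss(x))
print(likelihood, log_likelihood)
```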

The Fisher vector is obtained by taking the partial derivatives of this log-likelihood with respect to the model parameters λ and normalizing the result.
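As a rough illustration, here is a minimal NumPy sketch of one common FV variant: the "improved" Fisher vector, built from the gradients with respect to the means and standard deviations of a diagonal-covariance GMM, followed by power and L2 normalization. It assumes the GMM has already been fitted; the function name `fisher_vector` and the random parameters in the usage lines are hypothetical, and the gradients with respect to the mixture weights are left out, as is common.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Improved Fisher vector of local descriptors under a diagonal GMM (sketch).

    X: (T, D) descriptors; weights: (K,) mixture weights summing to 1;
    means: (K, D); sigmas: (K, D) per-dimension standard deviations.
    Returns a vector of length 2 * K * D.
    """
    T, D = X.shape
    # Standardized differences (x_t - mu_k) / sigma_k, shape (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # log w_k + log N(x_t | mu_k, diag(sigma_k^2)), shape (T, K)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2 * np.pi))
    # Soft assignments gamma_t(k) (posterior of component k), via a stable softmax
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradients of the log-likelihood w.r.t. means and standard deviations
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power ("signed square root") normalization, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Usage with made-up GMM parameters and descriptors
rng = np.random.default_rng(0)
K, D = 3, 4
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
X = rng.normal(size=(50, D))
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)  # (24,)
```

Each descriptor contributes to every component k in proportion to its soft assignment γ_t(k), which is how every cluster center takes part in representing each feature point.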

Next comes the main part of this post: the variational autoencoder.

2. VAE (Variational Autoencoder)

The basic idea of VAE: construct a dedicated normal distribution for each sample, then sample from it and reconstruct the sample.

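If we knew the probability distribution of a dataset, we could sample from it directly and obtain every possible data point; VAE pursues this goal in a roundabout way. For each sample X, a neural network extracts the parameters (mean and variance) of a normal distribution, which gives each sample its own normal distribution p(Z|X); we then sample from it and reconstruct. Modeling and sampling X directly is hard, so we assume a latent variable Z that follows the standard normal distribution, p(Z) = N(0, I), and write

$$p(X) = \int p(X\mid Z)\,p(Z)\,dZ$$

That is, we first sample Z from the standard normal distribution and then compute X from Z: for each sampled Z, a generator produces $\hat{X}$, the reconstruction of X.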
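To make p(X) = ∫ p(X|Z) p(Z) dZ concrete, here is a NumPy sketch with a toy one-dimensional linear-Gaussian "generator" p(x|z) = N(x; a·z + b, s²). This generator and its parameters a, b, s are assumptions for illustration, not the neural generator of a VAE; they are chosen because the integral then has a closed form, x ~ N(b, a² + s²), which the Monte Carlo estimate can be checked against. The last step shows generation as described above: sample z from the prior first, then x given z.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy generator: p(x | z) = N(x; a*z + b, s^2), prior z ~ N(0, 1)
a, b, s = 2.0, 1.0, 0.5

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Monte Carlo estimate of p(x0) = E_{z ~ N(0,1)}[p(x0 | z)]
x0 = 1.5
z = rng.standard_normal(200_000)
p_mc = normal_pdf(x0, a * z + b, s).mean()
# Exact marginal for this linear-Gaussian model: x ~ N(b, a^2 + s^2)
p_exact = normal_pdf(x0, b, np.sqrt(a ** 2 + s ** 2))

# Generation by ancestral sampling: z from the prior, then x from p(x | z)
x_samples = a * z + b + s * rng.standard_normal(z.shape)
print(p_mc, p_exact, x_samples.mean(), x_samples.std())
```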

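Trained on reconstruction error alone, the network would shrink the variance toward 0 to remove the noise that makes reconstruction harder, and the model would lose its generative ability. To prevent this, every p(Z|X) is pushed toward the standard normal distribution, because then

$$p(Z) = \int p(Z\mid X)\,p(X)\,dX = \int \mathcal{N}(0, I)\,p(X)\,dX = \mathcal{N}(0, I)\int p(X)\,dX = \mathcal{N}(0, I)$$

so Z satisfies the standard normal prior, and we can sample Z from the standard normal distribution to generate X. To achieve this, VAE adds an extra term to the loss after the reconstruction error. When p(Z|X) approaches the standard normal distribution, its mean μ should approach 0 and its log-variance log σ² should approach 0 (variance approaching 1), which would suggest the squared norms of μ and log σ² as two separate losses. VAE instead combines both into a single term, the KL divergence between N(μ, σ²) and N(0, I):

$$\mathcal{L}_{\mu,\sigma^2} = KL\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0, I)\big) = \frac{1}{2}\sum_{i=1}^{d}\left(\mu_{(i)}^2 + \sigma_{(i)}^2 - \log\sigma_{(i)}^2 - 1\right)$$

where d is the dimension of Z.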
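A small NumPy check of this KL term, parameterizing the variance by its logarithm as the text does. The helper name `kl_to_standard_normal` and the test values are hypothetical. A Monte Carlo estimate of E_q[log q(z) − log p(z)] should agree with the closed form up to sampling noise, and the loss is exactly 0 when μ = 0 and log σ² = 0.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_var = np.array([0.2, -0.3])

# Closed form
kl_closed = kl_to_standard_normal(mu, log_var)

# Monte Carlo check: E_q[log q(z) - log p(z)] with z ~ q
sigma = np.exp(0.5 * log_var)
z = mu + sigma * rng.standard_normal((200_000, 2))
log_q = np.sum(-0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=1)
log_p = np.sum(-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi), axis=1)
kl_mc = np.mean(log_q - log_p)

# When q is already N(0, I) (mu = 0, log_var = 0) the extra loss vanishes
kl_zero = kl_to_standard_normal(np.zeros(2), np.zeros(2))
print(kl_closed, kl_mc, kl_zero)
```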

That is the overall idea of VAE.

Next comes another very important idea (trick) in VAE: the reparameterization trick.

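Sampling is not differentiable, so gradients cannot flow back through it into the network that outputs the mean and variance. The trick rests on the identity

$$\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)dz = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{z-\mu}{\sigma}\right)^2\right)d\left(\frac{z-\mu}{\sigma}\right)$$

which says that ε = (z − μ)/σ follows N(0, 1). We can therefore sample through a change of variables:

$$\varepsilon \sim \mathcal{N}(0, I), \qquad Z = \mu + \varepsilon \odot \sigma$$

The random draw of ε no longer takes part in gradient descent; only the sampled result enters the computation, as a deterministic, differentiable function of μ and σ, which makes the model trainable.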
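A minimal NumPy sketch of why the trick makes gradients available, assuming a toy objective f(z) = z² (not part of VAE) whose expectation E[f(Z)] = μ² + σ² has the known gradients 2μ and 2σ. Because Z = μ + σε with ε drawn independently of μ and σ, the derivative can be moved inside the expectation and estimated by averaging over samples of ε.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
n = 500_000

# Reparameterization: z = mu + sigma * eps, eps ~ N(0, 1) independent of (mu, sigma)
eps = rng.standard_normal(n)
z = mu + sigma * eps

# Toy objective f(z) = z^2, so E[f(z)] = mu^2 + sigma^2 with exact gradients (2*mu, 2*sigma)
# Pathwise gradients: differentiate f(mu + sigma * eps) w.r.t. mu and sigma with eps held fixed
df_dz = 2 * z
grad_mu = np.mean(df_dz * 1.0)     # dz/dmu = 1
grad_sigma = np.mean(df_dz * eps)  # dz/dsigma = eps

print(grad_mu, 2 * mu)
print(grad_sigma, 2 * sigma)
```

In TensorFlow the same thing happens automatically: in the code below, `q_z.sample()` draws from a `Normal` whose `reparameterization_type` is `FULLY_REPARAMETERIZED`, so gradients of the loss flow back into `q_mu` and `q_sigma`.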

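The complete VAE procedure is thus: the encoder maps each X to μ and log σ²; Z is sampled with the reparameterization trick; the decoder (generator) reconstructs $\hat{X}$ from Z; and the loss is the reconstruction error plus the KL term, with encoder and decoder trained jointly by gradient descent.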
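The procedure can be sketched as a single forward pass in NumPy. This is an illustration under stated assumptions, not the author's pseudocode: the encoder and decoder are hypothetical single linear layers with random, untrained weights, the data are random binary vectors standing in for flattened 28×28 MNIST images, and no training step is shown, because updating the weights needs gradients (in practice supplied by an autodiff framework, as in the TensorFlow code below).

```python
import numpy as np

rng = np.random.default_rng(0)
batch, x_dim, z_dim = 8, 784, 2

# Hypothetical single-layer linear encoder/decoder weights (untrained, illustration only)
W_mu = rng.normal(scale=0.01, size=(x_dim, z_dim))
W_logvar = rng.normal(scale=0.01, size=(x_dim, z_dim))
W_dec = rng.normal(scale=0.01, size=(z_dim, x_dim))
b_dec = np.zeros(x_dim)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def vae_loss(x):
    """Negative ELBO per example: Bernoulli reconstruction loss + KL(q(z|x) || N(0, I))."""
    # 1. Encoder: per-sample mean and log-variance of q(z|x)
    mu = x @ W_mu
    log_var = x @ W_logvar
    # 2. Reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # 3. Decoder: Bernoulli probability for every pixel
    p = np.clip(sigmoid(z @ W_dec + b_dec), 1e-7, 1 - 1e-7)
    # 4. Reconstruction error: binary cross-entropy summed over pixels
    recon = -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=1)
    # 5. KL term in closed form
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=1)
    return recon + kl

x = (rng.random((batch, x_dim)) > 0.5).astype(np.float64)  # fake binary "images"
loss = vae_loss(x)
print(loss.shape, loss.mean())
```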

Finally, a few additional remarks:

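1. When the noise is 0, the VAE degenerates into an ordinary autoencoder; the KL term is what keeps the noise in place, so it should not be dropped. At the other extreme, the KL divergence should not reach exactly 0 either, because then every p(Z|X) equals N(0, I) and Z carries no information about X.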

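2. Using normal distributions keeps the KL term well-behaved in the loss: it has a closed form and is always finite. With some other distributions, for example uniform distributions whose supports do not match, the KL divergence can become infinite.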

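3. The word "variational" refers to the use of the KL divergence, which is a functional (a function of distributions), in the objective.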

4. Adding the label to guide generation lets you generate images of a chosen class by controlling the mean; this is called Conditional VAE (CVAE).
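One way to "control the mean" with labels, stated here as an assumption rather than the only CVAE formulation: give each class y its own prior N(μ_y, I) and replace the KL term with KL(N(μ, σ²) ‖ N(μ_y, I)) = ½ Σ((μ − μ_y)² + σ² − log σ² − 1). In that variant the class means μ_y are trained along with the networks; in this NumPy sketch they are random, and the helper name `kl_to_class_prior` is hypothetical. To generate a given class, sample z around that class's mean and decode it (decoder not shown).

```python
import numpy as np

def kl_to_class_prior(mu, log_var, y, class_means):
    """KL( N(mu, diag(exp(log_var))) || N(class_means[y], I) ), summed over latent dims.

    mu, log_var: (batch, d); y: (batch,) integer labels; class_means: (n_classes, d).
    """
    m = class_means[y]  # prior mean selected by each sample's label
    return 0.5 * np.sum((mu - m) ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)

rng = np.random.default_rng(0)
n_classes, d = 10, 2
class_means = rng.normal(scale=3.0, size=(n_classes, d))  # hypothetical per-class centers

# Generating class 7: draw latent codes from that class's prior N(class_means[7], I)
z = class_means[7] + rng.standard_normal((5, d))

# The KL term vanishes when the posterior equals the class prior exactly
y = np.array([3, 7])
kl_zero = kl_to_class_prior(class_means[y], np.zeros((2, d)), y, class_means)
print(kl_zero)
```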

Below is a VAE implementation I found online (from GitHub). It is fairly simple and easy to follow:

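# Requires TensorFlow 1.x (uses tf.contrib, tf.placeholder and tf.app.flags)
import matplotlib as mpl
import numpy as np
import os
import tensorflow as tf
import tensorflow.contrib.slim as slim
import time
import seaborn as sns

from matplotlib import pyplot as plt
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets


# scipy.misc.imsave was removed in SciPy 1.2; save grayscale images with matplotlib instead
def imsave(fname, arr):
  plt.imsave(fname, arr, cmap='gray')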

# sns.set_style('whitegrid')

distributions = tf.distributions

flags = tf.app.flags
flags.DEFINE_string('data_dir', '/tmp/dat/', 'Directory for data')
flags.DEFINE_string('logdir', '/tmp/log/', 'Directory for logs')


flags.DEFINE_integer('latent_dim', 100, 'Latent dimensionality of model')
flags.DEFINE_integer('batch_size', 64, 'Minibatch size')
flags.DEFINE_integer('n_samples', 1, 'Number of samples to save')
flags.DEFINE_integer('print_every', 1000, 'Print every n iterations')
flags.DEFINE_integer('hidden_size', 200, 'Hidden size for neural networks')
flags.DEFINE_integer('n_iterations', 100000, 'number of iterations')

FLAGS = flags.FLAGS

# Inference (encoder) network that outputs mu and sigma
def inference_network(x, latent_dim, hidden_size):
  """Construct an inference network parametrizing a Gaussian.
  Args:
    x: A batch of MNIST digits.
    latent_dim: The latent dimensionality.
    hidden_size: The size of the neural net hidden layers.
  Returns:
    mu: Mean parameters for the variational family Normal
    sigma: Standard deviation parameters for the variational family Normal
  """
  with slim.arg_scope([slim.fully_connected], activation_fn=tf.nn.relu):
    net = slim.flatten(x)
    net = slim.fully_connected(net, hidden_size)
    net = slim.fully_connected(net, hidden_size)
    gaussian_params = slim.fully_connected(
        net, latent_dim * 2, activation_fn=None)
  # The mean parameter is unconstrained
  mu = gaussian_params[:, :latent_dim]
  # The standard deviation must be positive. Parametrize with a softplus
  sigma = tf.nn.softplus(gaussian_params[:, latent_dim:])
  return mu, sigma

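# Generator (decoder) that reconstructs x from the sampled z.
# The data are binary images, so x|z is modeled as a Bernoulli distribution whose
# logits come from a fully connected network, giving the generative model p(x|z).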
def generative_network(z, hidden_size):
  """Build a generative network parametrizing the likelihood of the data
  Args:
    z: Samples of latent variables
    hidden_size: Size of the hidden state of the neural net
  Returns:
    bernoulli_logits: logits for the Bernoulli likelihood of the data
  """
  with slim.arg_scope([slim.fully_connected], activation_fn=tf.nn.relu):
    net = slim.fully_connected(z, hidden_size)
    net = slim.fully_connected(net, hidden_size)
    bernoulli_logits = slim.fully_connected(net, 784, activation_fn=None)
    bernoulli_logits = tf.reshape(bernoulli_logits, [-1, 28, 28, 1])
  return bernoulli_logits


def train():
  # Train a Variational Autoencoder on MNIST

  # Input placeholders
  with tf.name_scope('data'):
    x = tf.placeholder(tf.float32, [None, 28, 28, 1])
    tf.summary.image('data', x)

  with tf.variable_scope('variational'):
    q_mu, q_sigma = inference_network(x=x,
                                      latent_dim=FLAGS.latent_dim,
                                      hidden_size=FLAGS.hidden_size)
    # The variational distribution is a Normal with mean and standard
    # deviation given by the inference network
    q_z = distributions.Normal(loc=q_mu, scale=q_sigma)
    assert q_z.reparameterization_type == distributions.FULLY_REPARAMETERIZED

  with tf.variable_scope('model'):
    # The likelihood is Bernoulli-distributed with logits given by the
    # generative network
    p_x_given_z_logits = generative_network(z=q_z.sample(),
                                            hidden_size=FLAGS.hidden_size)
    p_x_given_z = distributions.Bernoulli(logits=p_x_given_z_logits)
    posterior_predictive_samples = p_x_given_z.sample()
    tf.summary.image('posterior_predictive',
                     tf.cast(posterior_predictive_samples, tf.float32))

  # Take samples from the prior p(z) = N(0, I)
  with tf.variable_scope('model', reuse=True):
    p_z = distributions.Normal(loc=np.zeros(FLAGS.latent_dim, dtype=np.float32),
                               scale=np.ones(FLAGS.latent_dim, dtype=np.float32))
    p_z_sample = p_z.sample(FLAGS.n_samples)
    p_x_given_z_logits = generative_network(z=p_z_sample,
                                            hidden_size=FLAGS.hidden_size)
    prior_predictive = distributions.Bernoulli(logits=p_x_given_z_logits)
    prior_predictive_samples = prior_predictive.sample()
    tf.summary.image('prior_predictive',
                     tf.cast(prior_predictive_samples, tf.float32))

  # Take samples from the prior with a placeholder
  with tf.variable_scope('model', reuse=True):
    z_input = tf.placeholder(tf.float32, [None, FLAGS.latent_dim])
    p_x_given_z_logits = generative_network(z=z_input,
                                            hidden_size=FLAGS.hidden_size)
    prior_predictive_inp = distributions.Bernoulli(logits=p_x_given_z_logits)
    prior_predictive_inp_sample = prior_predictive_inp.sample()

  # Build the evidence lower bound (ELBO) or the negative loss
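  # KL(q(z|x) || p(z)) per example: sum over the latent dimensions (axis 1)
  kl = tf.reduce_sum(distributions.kl_divergence(q_z, p_z), 1)
  # log p(x|z) per example: sum the per-pixel Bernoulli log-probabilities over
  # height, width and channel (axes 1, 2, 3). Since z = q_z.sample(), this is a
  # single-sample Monte Carlo estimate of E_q[log p(x|z)].
  expected_log_likelihood = tf.reduce_sum(p_x_given_z.log_prob(x),
                                          [1, 2, 3])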

  elbo = tf.reduce_sum(expected_log_likelihood - kl, 0)

  optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)

  train_op = optimizer.minimize(-elbo)

  # Merge all the summaries
  summary_op = tf.summary.merge_all()

  init_op = tf.global_variables_initializer()

  # Run training
  sess = tf.InteractiveSession()
  sess.run(init_op)

  mnist = read_data_sets(FLAGS.data_dir, one_hot=True)

  print('Saving TensorBoard summaries and images to: %s' % FLAGS.logdir)
  train_writer = tf.summary.FileWriter(FLAGS.logdir, sess.graph)

  # Get fixed MNIST digits for plotting posterior means during training
  np_x_fixed, np_y = mnist.test.next_batch(5000)
  np_x_fixed = np_x_fixed.reshape(5000, 28, 28, 1)
  np_x_fixed = (np_x_fixed > 0.5).astype(np.float32)

  t0 = time.time()
  for i in range(FLAGS.n_iterations):
    # Re-binarize the data at every batch; this improves results
    np_x, _ = mnist.train.next_batch(FLAGS.batch_size)
    np_x = np_x.reshape(FLAGS.batch_size, 28, 28, 1)
    np_x = (np_x > 0.5).astype(np.float32)
    sess.run(train_op, {x: np_x})

    # Print progress and save samples every so often
    if i % FLAGS.print_every == 0:
      np_elbo, summary_str = sess.run([elbo, summary_op], {x: np_x})
      train_writer.add_summary(summary_str, i)
      print('Iteration: {0:d} ELBO: {1:.3f} s/iter: {2:.3e}'.format(
          i,
          np_elbo / FLAGS.batch_size,
          (time.time() - t0) / FLAGS.print_every))
      t0 = time.time()

      # Save samples
      np_posterior_samples, np_prior_samples = sess.run(
          [posterior_predictive_samples, prior_predictive_samples], {x: np_x})
      for k in range(FLAGS.n_samples):
        f_name = os.path.join(
            FLAGS.logdir, 'iter_%d_posterior_predictive_%d_data.jpg' % (i, k))
        imsave(f_name, np_x[k, :, :, 0])
        f_name = os.path.join(
            FLAGS.logdir, 'iter_%d_posterior_predictive_%d_sample.jpg' % (i, k))
        imsave(f_name, np_posterior_samples[k, :, :, 0])
        f_name = os.path.join(
            FLAGS.logdir, 'iter_%d_prior_predictive_%d.jpg' % (i, k))
        imsave(f_name, np_prior_samples[k, :, :, 0])

      # Plot the posterior predictive space
      if FLAGS.latent_dim == 2:
        np_q_mu = sess.run(q_mu, {x: np_x_fixed})
        cmap = mpl.colors.ListedColormap(sns.color_palette("husl"))
        f, ax = plt.subplots(1, figsize=(6 * 1.1618, 6))
        im = ax.scatter(np_q_mu[:, 0], np_q_mu[:, 1], c=np.argmax(np_y, 1), cmap=cmap,
                        alpha=0.7)
        ax.set_xlabel(r'First dimension of latent mean $\mu_1$')
        ax.set_ylabel(r'Second dimension of latent mean $\mu_2$')
        ax.set_xlim([-10., 10.])
        ax.set_ylim([-10., 10.])
        f.colorbar(im, ax=ax, label='Digit class')
        plt.tight_layout()
        plt.savefig(os.path.join(FLAGS.logdir,
                                 'posterior_predictive_map_frame_%d.png' % i))
        plt.close()

        nx = ny = 20
        x_values = np.linspace(-3, 3, nx)
        y_values = np.linspace(-3, 3, ny)
        canvas = np.empty((28 * ny, 28 * nx))
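        # Rows follow y (top row = largest y), columns follow x
        for ii, yi in enumerate(y_values):
          for j, xi in enumerate(x_values):
            np_z = np.array([[xi, yi]])
            # One Bernoulli sample of the decoded image at latent point (xi, yi)
            x_sample = sess.run(prior_predictive_inp_sample, {z_input: np_z})
            canvas[(ny - ii - 1) * 28:(ny - ii) * 28, j *
                   28:(j + 1) * 28] = x_sample[0].reshape(28, 28)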
        imsave(os.path.join(FLAGS.logdir,
                            'prior_predictive_map_frame_%d.png' % i), canvas)
        # plt.figure(figsize=(8, 10))
        # Xi, Yi = np.meshgrid(x_values, y_values)
        # plt.imshow(canvas, origin="upper")
        # plt.tight_layout()
        # plt.savefig()

  # Make the gifs (requires ImageMagick's `convert` command)
  if FLAGS.latent_dim == 2:
    os.system(
        'convert -delay 15 -loop 0 {0}/posterior_predictive_map_frame*png {0}/posterior_predictive.gif'
        .format(FLAGS.logdir))
    os.system(
        'convert -delay 15 -loop 0 {0}/prior_predictive_map_frame*png {0}/prior_predictive.gif'
        .format(FLAGS.logdir))


def main(_):
  if tf.gfile.Exists(FLAGS.logdir):
    tf.gfile.DeleteRecursively(FLAGS.logdir)
  tf.gfile.MakeDirs(FLAGS.logdir)
  train()


if __name__ == '__main__':
  tf.app.run()

———————————————————————————

This is my first blog post. As a research novice with limited knowledge, I welcome any corrections!
