
Auto-encoding Variational Bayes: Reading Notes

Notation
  • $p_\theta(z|x)$: intractable posterior
  • $p_\theta(x|z)$: probabilistic decoder
  • $q_\phi(z|x)$: recognition model, variational approximation to $p_\theta(z|x)$, also regarded as a probabilistic encoder
  • $p_\theta(z)p_\theta(x|z)$: generative model
  • ϕ: variational parameters
  • θ: generative parameters
Abbreviation
  • SGVB: Stochastic Gradient Variational Bayes
  • AEVB: auto-encoding VB
  • ML: maximum likelihood
  • MAP: maximum a posteriori

Motivation

  • Problem

    • How to perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions $p_\theta(z|x)$ and large datasets?
  • Existing Solution and Difficulty

    • VB: involves the optimization of an approximation to the intractable posterior
    • mean-field: requires analytical solutions of expectations w.r.t. the approximate posterior, which are also intractable in the general case

Contribution of this paper

  • (1) SGVB estimator: an estimator of the variational lower bound
    • yielded by a reparameterization of the variational lower bound (see the sketch after this list)
    • simple & differentiable & unbiased
    • straightforward to optimize using standard SG ascent techniques
  • (2) AEVB algorithm
    • uses the SGVB estimator to optimize a recognition model, which allows us to perform very efficient approximate posterior inference with simple ancestral sampling; this in turn lets us learn the model parameters efficiently, without expensive iterative inference schemes (such as MCMC) per datapoint
    • condition: an i.i.d. dataset $X=\{x^{(i)}\}_{i=1}^{N}$ with a continuous latent variable z per datapoint
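
As a concrete illustration of the reparameterization idea behind the SGVB estimator, here is a minimal PyTorch sketch (my own illustration, not code from the paper): it assumes a diagonal-Gaussian $q_\phi(z|x)$ parameterized by `mu` and `logvar`, and the helper name `reparameterize` is hypothetical.

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(exp(logvar))) as a differentiable
    function of (mu, logvar): z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = torch.randn_like(mu)                # auxiliary noise, independent of phi
    return mu + torch.exp(0.5 * logvar) * eps

# The sampled z stays differentiable w.r.t. mu and logvar, so a Monte Carlo
# estimate of the lower bound built from z can be optimized with standard SG ascent.
mu = torch.zeros(4, requires_grad=True)
logvar = torch.zeros(4, requires_grad=True)
z = reparameterize(mu, logvar)
loss = (z ** 2).sum()                         # stand-in for the -ELBO terms that use z
loss.backward()
print(mu.grad, logvar.grad)                   # gradients flow through the sample
```

The point of the trick is that the randomness is moved into the auxiliary noise eps, so the sample itself is a deterministic, differentiable function of the variational parameters.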

Methodology

assumption
  • directed graphical models with continuous latent variables
  • i.i.d. dataset with latent variables per datapoint
    • where we would like to perform
      • ML or MAP inference on the (global) parameters θ
      • variational inference on the latent variable z
  • $p_\theta(z)$ and $p_\theta(x|z)$: both PDFs are differentiable almost everywhere w.r.t. both θ and z
target case
  • intractability
    • the marginal likelihood $p_\theta(x)=\int p_\theta(z)\,p_\theta(x|z)\,dz$ is intractable, so we cannot evaluate or differentiate it.
    • the true posterior $p_\theta(z|x)=p_\theta(x|z)\,p_\theta(z)/p_\theta(x)$ is intractable, so the EM algorithm cannot be used.
    • the required integrals for any reasonable mean-field VB algorithm are also intractable, so the VB algorithm cannot be used.
    • these intractabilities already arise for moderately complicated likelihood functions, e.g. a neural network with a nonlinear hidden layer
  • a large dataset
    • batch optimization is too costly => minibatch or single-datapoint updates
    • sampling-based solutions, e.g. Monte Carlo EM, are too slow, since they involve a typically expensive sampling loop per datapoint.
solution and application
  • efficient approximate ML or MAP estimation for θ (Full VB: Appendix F)
    • allows us to mimic the hidden random process and generate artificial data that resemble the real data
  • efficient approximate posterior inference $p_\theta(z|x)$ for a choice of θ
    • useful for coding or data representation tasks
  • efficient approximate marginal inference of x: Appendix D
    • allows us to perform all kinds of inference tasks where $p(x)$ is required, such as image denoising, inpainting, and super-resolution.

1. derivation of the variational bound

$\log p_\theta(x^{(1)},\ldots,x^{(N)})=\sum_{i=1}^{N}\log p_\theta(x^{(i)})$

Here we use $x^{(i)}$ to denote the i-th datapoint, so the log marginal likelihood of the i.i.d. dataset decomposes into a sum over individual datapoints.
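
Each per-datapoint term can then be rewritten using the recognition model; the following LaTeX block writes out this standard decomposition (as used in the paper) for completeness:

$$
\log p_\theta(x^{(i)})
  = D_{KL}\big(q_\phi(z|x^{(i)})\,\|\,p_\theta(z|x^{(i)})\big)
  + \mathcal{L}(\theta,\phi;x^{(i)}),
\qquad
\mathcal{L}(\theta,\phi;x^{(i)})
  = \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)},z)-\log q_\phi(z|x^{(i)})\big]
$$

Since the KL divergence is non-negative, $\mathcal{L}(\theta,\phi;x^{(i)})$ is a lower bound on $\log p_\theta(x^{(i)})$, which is the quantity the SGVB estimator approximates and optimizes.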