
Variational Autoencoders Explained

Latent space

You know every image of a digit should contain, well, a single digit. An input in $\mathbb{R}^{28 \times 28}$ doesn’t explicitly contain that information. But it must reside somewhere... That somewhere is the latent space.


You can think of the latent space as $\mathbb{R}^{k}$ where every vector contains $k$ pieces of essential information needed to draw an image. Let’s say the first dimension contains the number represented by the digit. The second dimension can be the width. The third - the angle. And so on.
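To make this concrete, here is a toy latent vector. The dimension names and the choice of $k$ are made up purely for illustration; a trained VAE learns its own coordinates, which are rarely this interpretable.

```python
import numpy as np

# A hypothetical latent vector in R^k (here k = 4). Each entry is imagined
# to encode one attribute of the digit; real learned dimensions are usually
# entangled rather than neatly labeled like this.
z = np.array([
    7.0,   # which digit is drawn
    1.2,   # stroke width
    15.0,  # rotation angle, in degrees
    0.8,   # overall size
])
```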

We can think of the process that generated the images as a two-step process. First, the person decides - consciously or not - on all the attributes of the digit he’s going to draw. Next, these decisions are transformed into brushstrokes.

VAE tries to model this process: given an image $x$, we want to find at least one latent vector which is able to describe it; one vector that contains the instructions to generate $x$. Formulating it using the law of total probability, we get $P(x) = \int P(x|z)P(z)dz$.

Let’s pour some intuition into the equation:

  • The integral means we should search over the entire latent space for candidates.
  • For every candidate $z$, we ask ourselves: can $x$ be generated using the instructions of $z$? Is $P(x|z)$ big enough? If, for instance, $z$ encodes the information that the digit is 7, then an image of 8 is impossible. An image of 1, however, might be possible, since 1 and 7 look similar.
  • We found a good $z$? Good! But wait a second… Is this $z$ even likely? Is $P(z)$ big enough? Let’s consider a given image of an upside down 7. A latent vector describing a similar looking 7 where the angle dimension is set to 180 degrees will be a perfect match. However, that $z$ is not likely, since digits are usually not drawn at a 180-degree angle.
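To see what the integral above asks for computationally, here is a naive Monte Carlo sketch of it. The function name `p_x_given_z` and the standard normal prior are assumptions made for illustration; in practice this estimator is hopeless, since almost every $z$ drawn from the prior yields a negligible $P(x|z)$.

```python
import numpy as np

def estimate_p_x(x, p_x_given_z, k=10, n_samples=100_000, seed=0):
    """Naive Monte Carlo estimate of P(x) = integral of P(x|z) P(z) dz.

    Drawing z_i from the prior P(z) = N(0, I) turns the integral into an
    average: P(x) ~= (1/N) * sum_i P(x | z_i).
    `p_x_given_z(x, z)` is a placeholder for the likelihood model.
    """
    rng = np.random.default_rng(seed)
    z_samples = rng.standard_normal((n_samples, k))   # z_i ~ P(z)
    likelihoods = np.array([p_x_given_z(x, z) for z in z_samples])
    return likelihoods.mean()
```

Most of the sampled $z$ vectors contribute essentially nothing to the average, so the estimate needs an astronomical number of samples; this inefficiency is part of what motivates the VAE's encoder.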

The VAE training objective is to maximize $P(x)$. We’ll model $P(x|z)$ using a multivariate Gaussian $\mathcal{N}(f(z), \sigma^2 \cdot I)$.

$f(z)$ will be modeled using a neural network. $\sigma$ is a hyperparameter that multiplies the identity matrix $I$.
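A minimal sketch of what this looks like in code, assuming a small fully connected decoder in PyTorch; the architecture and the value of $\sigma$ are arbitrary choices made for illustration, not prescribed by the text.

```python
import torch
import torch.nn as nn

LATENT_DIM = 10    # k, picked arbitrarily for the sketch
SIGMA = 0.1        # the hyperparameter multiplying the identity matrix I

# f(z): a neural network mapping a latent vector to a flattened 28x28 image.
f = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Sigmoid(),                    # pixel intensities in [0, 1]
)

def log_p_x_given_z(x, z):
    """log P(x|z) under N(f(z), sigma^2 * I), up to an additive constant.

    x: batch of flattened images, shape (batch, 784)
    z: batch of latent vectors, shape (batch, LATENT_DIM)
    """
    mean = f(z)
    return -((x - mean) ** 2).sum(dim=-1) / (2 * SIGMA ** 2)
```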

You should keep in mind that $f$ is what we'll use to generate new images once the model is trained. Imposing a Gaussian distribution serves training purposes only. If we used a Dirac delta function instead (i.e. $x = f(z)$ deterministically), we wouldn't be able to train the model using gradient descent!
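To see why, write out the Gaussian log-density, where $d$ is the dimension of $x$ (here $28 \cdot 28$) and the last term does not depend on $f$:

$$\log P(x \mid z) = \log \mathcal{N}\big(x;\, f(z),\, \sigma^2 I\big) = -\frac{\lVert x - f(z) \rVert^2}{2\sigma^2} - \frac{d}{2}\log\!\left(2\pi\sigma^2\right)$$

Maximizing this with respect to the network's weights is just minimizing a squared reconstruction error, which changes smoothly as $f(z)$ moves toward $x$. A Dirac delta, by contrast, assigns zero likelihood to every $x$ that $f(z)$ does not reproduce exactly, so it provides no gradient signal.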