Inner Rui

Rui Shu

About Blog GitHub Publications Smileyball Twitter

19 Mar 2018
Using a Bernoulli VAE on Real-Valued Observations

The Bernoulli observation VAE is supposed to be used when one’s observed samples $x \in \sset{0, 1}^n$ are vectors of binary elements. However, I have, on occasion, seen people (and even papers) apply Bernoulli observation VAEs to real-valued samples $x \in [0, 1]^n$. This will be a quick and dirty post going over whether this unholy marriage of Bernoulli VAE with real-valued samples is appropriate.

Background and Notation for Bernoulli VAE

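Given an empirical distribution $\hat{p}(x)$ whose samples are binary $x \in \sset{0, 1}^n$, the VAE objective is the lower bound

$$
\mathbb{E}_{\hat{p}(x)} \ln p_\theta(x) \ge \mathbb{E}_{\hat{p}(x)} \mathbb{E}_{q_\phi(z \giv x)} \ln \frac{p(z)\, p_\theta(x \giv z)}{q_\phi(z \giv x)},
$$

where $p(z)$ is the prior over the latent variable $z$ and $q_\phi(z \giv x)$ is the inference model.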

If $p_\theta(x \giv z)$ is furthermore a fully-factorized Bernoulli observation model, then the distribution can be expressed as

$$
p_\theta(x \giv z) = \prod_{i=1}^n \pi_i(z)^{x_i} (1 - \pi_i(z))^{1 - x_i},
$$

where $\pi: \Z \to [0, 1]^n$ is a neural network parameterized by $\theta$. As preparation for the next section, we shall, with a slight abuse of notation, also define

$$
p(x \giv \pi) = \prod_{i=1}^n \pi_i^{x_i} (1 - \pi_i)^{1 - x_i},
$$

where $\pi \in [0, 1]^n$.
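As a concrete illustration of the definition of $p(x \giv \pi)$, here is a minimal sketch in plain Python. The function name `bernoulli_log_prob` and the clipping constant `eps` are my own choices for illustration, not part of any library.

```python
import math

def bernoulli_log_prob(x, pi, eps=1e-12):
    """ln p(x | pi) for a binary vector x under a fully-factorized Bernoulli.

    `eps` is an illustrative safeguard that keeps the logs finite when a
    parameter is exactly 0 or 1.
    """
    total = 0.0
    for x_i, pi_i in zip(x, pi):
        p = min(max(pi_i, eps), 1.0 - eps)
        total += x_i * math.log(p) + (1 - x_i) * math.log(1.0 - p)
    return total

x = [1, 0, 1]            # a binary observation
pi = [0.9, 0.2, 0.6]     # Bernoulli parameters, one per element
log_p = bernoulli_log_prob(x, pi)   # ln(0.9) + ln(0.8) + ln(0.6)
```

In a VAE, `pi` would be the decoder output $\pi(z)$; here it is just a fixed vector.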

Applying Bernoulli VAE to Real-Valued Samples

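Suppose we have a distribution $r(\pi)$ over $\pi \in [0, 1]^n$, and $\hat{p}(x)$ is in fact the marginal of $r(\pi) p(x \giv \pi)$ over $\pi$, i.e. $\hat{p}(x) = \mathbb{E}_{r(\pi)} p(x \giv \pi)$. This is the case for MNIST, where the real-valued samples are interpreted as observations of $\pi$. This allows us to construct the objective as

$$
\mathbb{E}_{r(\pi)} \mathbb{E}_{p(x \giv \pi)} \ln p_\theta(x) \ge \mathbb{E}_{r(\pi)} \mathbb{E}_{p(x \giv \pi)} \mathbb{E}_{q_\phi(z \giv x)} \ln \frac{p(z)\, p_\theta(x \giv z)}{q_\phi(z \giv x)}.
$$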
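To make the sampling view concrete, the sketch below draws binary samples $x \sim p(x \giv \pi)$ from a fixed real-valued vector $\pi$. The helper `binarize` and the toy intensities are assumptions made for illustration; in practice $\pi$ would be a real-valued image drawn from $r(\pi)$.

```python
import random

def binarize(pi, rng):
    """Draw x ~ p(x | pi): element i is 1 with probability pi_i, independently."""
    return [1 if rng.random() < pi_i else 0 for pi_i in pi]

rng = random.Random(0)              # fixed seed so the sketch is reproducible
pi = [0.0, 0.25, 0.5, 1.0]          # a toy "image" of real-valued intensities
samples = [binarize(pi, rng) for _ in range(20000)]

# Averaging many binary samples recovers pi approximately, element by element.
pixel_means = [sum(s[i] for s in samples) / len(samples) for i in range(len(pi))]
```

Drawing a fresh `binarize(pi, rng)` each time a training example is visited corresponds to sampling from $r(\pi) p(x \giv \pi)$ rather than from a fixed binary dataset.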

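It turns out there is another equally valid lower bound. Since the variational bound holds for any proposal distribution over $z$, we can condition the inference model on $\pi$ instead of $x$:

$$
\mathbb{E}_{r(\pi)} \mathbb{E}_{p(x \giv \pi)} \ln p_\theta(x) \ge \mathbb{E}_{r(\pi)} \mathbb{E}_{p(x \giv \pi)} \mathbb{E}_{q_\phi(z \giv \pi)} \ln \frac{p(z)\, p_\theta(x \giv z)}{q_\phi(z \giv \pi)}.
$$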

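However, since $q_\phi(z \giv \pi)$ does not have access to $x$, it is unlikely to give a better approximation of $p_\theta(z \giv x)$ than an inference model $q_\phi(z \giv x)$ that conditions on $x$ directly. Consequently, it is likely to be a looser bound (which can be verified empirically). A bit of tedious algebra shows that the objective is equivalent to

$$
\mathbb{E}_{r(\pi)} \mathbb{E}_{q_\phi(z \giv \pi)} \left[ \ln \frac{p(z)}{q_\phi(z \giv \pi)} + \mathbb{E}_{p(x \giv \pi)} \ln p_\theta(x \giv z) \right],
$$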

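where the inner-most term $\mathbb{E}_{p(x \giv \pi)} \ln p_\theta(x \giv z)$ is exactly the sum of element-wise (negative) cross-entropy terms, where each term is

$$
\pi_i \ln \pi_i(z) + (1 - \pi_i) \ln (1 - \pi_i(z)).
$$

Here $\pi_i$ is the $i$-th element of the real-valued sample and $\pi_i(z)$ is the $i$-th output of the decoder network.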

Note that this is exactly the application of Bernoulli observation VAEs to real-valued samples. So long as the real-valued samples can be interpreted as Bernoulli distribution parameters, this lower bound is valid. However, as noted above, this lower bound tends to be looser.
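The claim that the expected Bernoulli log-likelihood reduces to a cross-entropy with real-valued targets can be checked numerically. The sketch below enumerates every binary $x$ for a small $n$ and compares the exact expectation against the closed form. All function names and the toy values are hypothetical, and the parameters are assumed to lie strictly inside $(0, 1)$ so that no clipping is needed.

```python
import itertools
import math

def bernoulli_log_prob(x, probs):
    """ln of a fully-factorized Bernoulli; assumes every prob is strictly in (0, 1)."""
    return sum(x_i * math.log(p) + (1 - x_i) * math.log(1 - p)
               for x_i, p in zip(x, probs))

def expected_log_lik(pi, decoder_probs):
    """E_{p(x | pi)} ln p(x | z), computed exactly by enumerating all binary x."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=len(pi)):
        weight = math.exp(bernoulli_log_prob(x, pi))     # p(x | pi)
        total += weight * bernoulli_log_prob(x, decoder_probs)
    return total

def soft_target_log_lik(pi, decoder_probs):
    """Sum over i of pi_i ln pi_i(z) + (1 - pi_i) ln(1 - pi_i(z))."""
    return sum(t * math.log(p) + (1 - t) * math.log(1 - p)
               for t, p in zip(pi, decoder_probs))

pi = [0.1, 0.5, 0.8]              # real-valued sample, read as Bernoulli parameters
decoder_probs = [0.3, 0.6, 0.7]   # stand-in for the decoder output pi(z)
lhs = expected_log_lik(pi, decoder_probs)
rhs = soft_target_log_lik(pi, decoder_probs)
```

The two quantities agree up to floating-point error, which is why feeding real-valued targets into a Bernoulli cross-entropy loss matches the reconstruction term of the bound that uses $q_\phi(z \giv \pi)$.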
