Disentangled Variational AutoEncoders with PyTorch

This repository is an attempt at replicating some of the results presented in Irina Higgins et al.'s β-VAE papers.
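For reference, the objective optimized in those papers is the β-VAE loss, i.e. the usual VAE reconstruction term plus a KL term weighted by the "temperature" hyperparameter Beta. Below is a minimal PyTorch sketch of that objective; the function and argument names are illustrative and do not necessarily match this repository's code.

import torch
import torch.nn.functional as F

def beta_vae_loss(recon_x, x, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction term + beta * KL(q(z|x) || N(0, I)).

    recon_x    : decoder output, assumed to be pixel probabilities in [0, 1]
    x          : input batch with values in [0, 1]
    mu, logvar : parameters of the diagonal Gaussian posterior q(z|x)
    beta       : weight of the KL term (beta = 1 recovers the vanilla VAE)
    """
    # Reconstruction term: Bernoulli negative log-likelihood, averaged over the batch.
    recon_loss = F.binary_cross_entropy(recon_x, x, reduction='sum') / x.size(0)
    # KL divergence between the diagonal Gaussian posterior and the standard normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon_loss + beta * kl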

Requirements

In order to use DeepMind's "dSprites - Disentanglement testing Sprites dataset", you need to clone their repository and place it at the root of this one.

git clone https://github.com/deepmind/dsprites-dataset.git
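The dataset then lives in a single .npz archive inside the cloned folder. A minimal loading sketch is shown below; the file name is the one distributed in deepmind/dsprites-dataset at the time of writing, and the variable names are illustrative rather than taken from this repository.

import numpy as np
import torch

# Assumes the dsprites-dataset repository was cloned at the root of this one.
DSPRITES_PATH = 'dsprites-dataset/dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz'

data = np.load(DSPRITES_PATH, allow_pickle=True, encoding='latin1')
imgs = data['imgs']                      # (737280, 64, 64) binary images
latents_values = data['latents_values']  # ground-truth generative factors

# Turn a random batch of images into a float tensor with a channel dimension.
idx = np.random.randint(0, imgs.shape[0], size=64)
batch = torch.from_numpy(imgs[idx]).float().unsqueeze(1)  # (64, 1, 64, 64)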

XYS-latent dataset :

In order to use the XYS-latent dataset, you need to :

  1. download it here
  2. extract it at the root of this repository's folder.

Experiments

"dSprites-Disentanglement testing Sprites dataset" :

Using this dataset and the following hyperparameters (an illustrative decoder sketch follows the list) :

  • Number of latent variables : 10
  • learning rate : 1e-5
  • "Temperature" hyperparameter Beta : 4e0
  • Number of layers of the decoder : 3
  • Base depth of the convolution/deconvolution layers : 32
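
The repository's exact architecture is not reproduced here, but an illustrative sketch of a decoder matching these hyperparameters (10 latents, 3 deconvolution layers, base depth 32) could look as follows; layer sizes and names are assumptions, not this repository's actual code.

import torch
import torch.nn as nn

class DecoderSketch(nn.Module):
    """Illustrative 3-layer transposed-convolution decoder for 64x64 single-channel images."""

    def __init__(self, latent_dim=10, base_depth=32):
        super().__init__()
        # Project the latent code onto a small spatial feature map.
        self.fc = nn.Linear(latent_dim, base_depth * 4 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(base_depth * 4, base_depth * 2, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_depth * 2, base_depth, 4, stride=2, padding=1),      # 16x16 -> 32x32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_depth, 1, 4, stride=2, padding=1),                   # 32x32 -> 64x64
            nn.Sigmoid(),  # pixel probabilities for the Bernoulli reconstruction loss
        )

    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 8, 8)
        return self.deconv(h)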

Real images : Dreal1

With regard to the reconstruction images, every pair of rows shows the real images on the first row and the reconstructed images on the second. With regard to the latent-space samplings, each latent variable is sampled uniformly in the range [-3, 3].
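Concretely, such a traversal grid can be produced by sweeping one latent dimension over [-3, 3] while keeping the others fixed at the prior mean; a minimal sketch follows, where the decoder and helper names are assumptions rather than this repository's exact code.

import torch

@torch.no_grad()
def latent_traversal(decoder, latent_dim=10, steps=8, bound=3.0):
    """Decode a grid in which each latent dimension is swept over [-bound, bound]."""
    values = torch.linspace(-bound, bound, steps)
    rows = []
    for d in range(latent_dim):
        z = torch.zeros(steps, latent_dim)
        z[:, d] = values           # sweep dimension d, keep the others at the prior mean
        rows.append(decoder(z))    # (steps, 1, 64, 64) decoded images for this dimension
    return torch.cat(rows, dim=0)  # one row of images per latent dimension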

Epoch Reconstruction Latent Space
1 Dreconst1-1 Dgen1-1
10 Dreconst1-10 Dgen1-10
30 Dreconst1-30 Dgen1-30
50 Dreconst1-50 Dgen1-50
80 Dreconst1-80 Dgen1-80

Observations :

While the X, Y coordinate and S scale latent variables are clearly disentangled and reconstructed, the Sh shape latent variable is far from being reconstructed, let alone disentangled.

XYS-latent dataset :

Using this dataset and the following hyperparameters (summarized in a sketch after the list) :

  • Number of latent variables : 10
  • learning rate : 1e-5
  • "Temperature" hyperparameter Beta : 5e3
  • Number of layers of the decoder : 5
  • Base depth of the convolution/deconvolution layers : 32
  • Stacked architecture : enabled
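
For convenience, this run can be summarized as a hypothetical configuration dictionary; the key names are illustrative and do not necessarily match this repository's argument names.

# Hypothetical hyperparameter dictionary for the XYS-latent run (key names are illustrative only).
xys_config = {
    'latent_dim': 10,       # number of latent variables
    'learning_rate': 1e-5,
    'beta': 5e3,            # "temperature" weighting of the KL term
    'decoder_layers': 5,
    'base_depth': 32,       # base depth of the convolution/deconvolution layers
    'stacked': True,        # stacked architecture enabled
}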

Real images : real1

Considering one column, every group of three rows contains :

  1. Full image.
  2. Right-eye patch extracted from the full image.
  3. Left-eye patch extracted from the full image.
Epoch Reconstruction Latent Space
1 reconst1-1 gen1-1
10 reconst1-10 gen1-10
30 reconst1-30 gen1-30
70 reconst1-70 gen1-70
100 reconst1-100 gen1-100

Observations :

The S scale latent variable seems to have been clearly disentangled, while the other two latent variables, the X and Y coordinates of the gaze on the camera plane, seem to require a finer level of detail from the decoder to show good reconstructions. Further analysis shows that those latent variables are also quite nicely disentangled, even though it is difficult to see here.

Disclaimers

I do not own any rights to some of the datasets that have been used and experimented with, notably DeepMind's dSprites dataset.
