The Role of ImageNet Classes in Fréchet Inception Distance
Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, Jaakko Lehtinen
Abstract: Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially — without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves a FID comparable to StyleGAN2, while being worse in terms of human evaluation.
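For context, FID is the Fréchet (Wasserstein-2) distance between Gaussians fitted to Inception-V3 features of real and generated images. Below is a minimal NumPy sketch of that standard formula — an illustration only, not this repository's implementation:

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    # feats_*: [num_images, 2048] arrays of Inception-V3 pool features.
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts caused by numerical noise
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * covmean))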
We recommend using Anaconda. To create a virtual environment and install the required packages, run:
conda env create -f environment.yml
conda activate imagenet-classes-in-fid
This repository provides code for reproducing FID sensitivity heatmaps for individual images (Sec. 2) and probing the perceptual null space of FID by resampling features (Sec. 3).
To run the code examples below, you first need to prepare or download the 256x256 resolution FFHQ dataset in ZIP format. Help for preparing the dataset can be found here. If automatic downloading of network pickles from Google Drive fails, they can be downloaded manually from here.
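For reference, a hypothetical dataset-preparation command, assuming the StyleGAN2-ADA dataset_tool.py is used (paths are placeholders and flags may differ between tool versions):

python dataset_tool.py --source=/path/to/ffhq/images1024x1024 \
    --dest=/path/to/datasets/ffhq-256x256.zip --width=256 --height=256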
FID sensitivity heatmaps for StyleGAN2-generated images in FFHQ (Fig. 3) can be generated with:
python generate_heatmaps.py --zip_path=data_path \
--network_pkl=https://drive.google.com/uc?id=119HvnQ5nwHl0_vUTEFWQNk4bwYjoXTrC \
--seeds=[107,540,386,780,544,879]
Running the command takes approximately 8 minutes on an NVIDIA Titan V GPU. Reference sensitivity heatmaps can be found here. See python generate_heatmaps.py --help for more options.
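For intuition, the sketch below shows a crude saliency-style approximation of per-image FID sensitivity. It is not the Grad-CAM-based procedure used by generate_heatmaps.py: it simply backpropagates one image's contribution to the FID mean term through Inception-V3 and visualizes the pixel-gradient magnitude. The statistic mu_real is assumed to be a precomputed feature mean over the real dataset.

import torch
import torchvision

# Standard ImageNet-pretrained Inception-V3 with the classifier head removed,
# so the forward pass returns the 2048-dimensional pool features.
inception = torchvision.models.inception_v3(weights='IMAGENET1K_V1').eval()
inception.fc = torch.nn.Identity()

def sensitivity_map(image, mu_real):
    # image: [3, 299, 299] tensor, preprocessed as Inception-V3 expects.
    # mu_real: [2048] precomputed mean of Inception features over the real dataset.
    image = image.clone().requires_grad_(True)
    feats = inception(image[None])[0]
    # This image's pull on the mean term of FID (a simplified sensitivity surrogate).
    loss = (feats - mu_real).pow(2).sum()
    loss.backward()
    return image.grad.abs().sum(dim=0)  # [299, 299] spatial sensitivity map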
Note: running the resampling below requires a GPU with at least 26 GB of memory. All fringe features of StyleGAN2-generated images in FFHQ can be matched with:
python run_resampling.py --zip_path=data_path \
--network_pkl=https://drive.google.com/uc?id=119HvnQ5nwHl0_vUTEFWQNk4bwYjoXTrC \
--feature_mode=pre_logits
This command replicates the FFHQ pre-logits resampling results from Tab. 2. Reference output (numbers may vary slightly):
It. 1/100000, loss = 4.95917
FID = 5.25
CLIP-FID = 2.75
Elapsed time: 154.13s
It. 500/100000, loss = 3.61191
FID = 3.96
CLIP-FID = 2.69
Elapsed time: 1789.39s
Additionally, resampling can be run in four different modes: pre_logits, logits, top_n, and middle_n. These modes match all fringe features, Inception-V3 logits, Top-N classes, and Middle-N classes, respectively, between real and generated images. See python run_resampling.py --help for more options.
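To illustrate the idea behind the top_n mode, here is a simplified sketch — Top-1 instead of Top-N, and plain importance resampling rather than the optimization performed by run_resampling.py: generated images are reweighted so that their predicted ImageNet class histogram matches the real data, and the generated set is then resampled with replacement.

import numpy as np

def resample_to_match_classes(gen_classes, real_classes, num_samples, seed=0):
    # gen_classes, real_classes: integer arrays of predicted ImageNet classes (0..999).
    rng = np.random.default_rng(seed)
    real_hist = np.bincount(real_classes, minlength=1000) / len(real_classes)
    gen_hist = np.bincount(gen_classes, minlength=1000) / len(gen_classes)
    # Per-image weight = desired class frequency / current class frequency.
    weights = real_hist[gen_classes] / np.maximum(gen_hist[gen_classes], 1e-12)
    weights = weights / weights.sum()
    # Indices of generated images to keep; FID is then computed on this resampled set.
    return rng.choice(len(gen_classes), size=num_samples, replace=True, p=weights)

As the paper shows, matching these class histograms can lower FID substantially even though no individual image is improved.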
The code of this repository is released under the CC BY-NC-SA 4.0 license. This repository adapts code from StyleGAN2, StyleGAN2-ADA, and PyTorch Grad-CAM, which are released under the NVIDIA Source Code License (StyleGAN2, StyleGAN2-ADA) and the MIT License (PyTorch Grad-CAM). The clip-vit_b32.pkl is derived from the pre-trained CLIP ViT-B/32 model by OpenAI, which is originally shared under the MIT License in its GitHub repository.
@inproceedings{Kynkaanniemi2022,
author = {Tuomas Kynkäänniemi and
Tero Karras and
Miika Aittala and
Timo Aila and
Jaakko Lehtinen},
title = {The Role of ImageNet Classes in Fréchet Inception Distance},
booktitle = {Proc. ICLR},
year = {2023},
}
We thank Samuli Laine for helpful comments. This work was partially supported by the European Research Council (ERC Consolidator Grant 866435), and made use of computational resources provided by the Aalto Science-IT project and the Finnish IT Center for Science (CSC).