Replies: 1 comment
I'm sure a pure ViT is perfectly usable, but I found the hybrid version to be more stable during training. Visualising the resnet embeddings seems reasonable, but I don't have code for that I could share. For the ViT, it's the attention mask you want to see. The model will most likely only pay attention to the current character in the sequence. I don't think it's very interesting to do either, but I won't stop you :)
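As a rough illustration of what inspecting the attention mask could look like (not code from this repo), the sketch below hooks the qkv projection of the last attention block in a timm-style ViT and recomputes the softmax attention map. The model name and module paths (`vit_base_patch16_224`, `blocks[-1].attn.qkv`, `num_heads`) are assumptions made so the example is self-contained; the hybrid encoder used here would need its own paths.

```python
# Hedged sketch: recover per-head attention maps from a timm-style ViT by
# hooking the qkv projection and recomputing softmax(QK^T / sqrt(d)).
# The model and module names are assumptions, not this project's encoder.
import torch
import timm
import matplotlib.pyplot as plt

model = timm.create_model("vit_base_patch16_224", pretrained=False).eval()  # set pretrained=True for meaningful maps
captured = {}

def grab_qkv(module, inputs, output):
    captured["qkv"] = output.detach()          # (B, N, 3*dim)

attn = model.blocks[-1].attn
attn.qkv.register_forward_hook(grab_qkv)

img = torch.randn(1, 3, 224, 224)              # replace with a real preprocessed image
with torch.no_grad():
    model(img)

B, N, _ = captured["qkv"].shape
H = attn.num_heads
qkv = captured["qkv"].reshape(B, N, 3, H, -1).permute(2, 0, 3, 1, 4)
q, k = qkv[0], qkv[1]                          # each (B, H, N, head_dim)
attn_map = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
attn_map = attn_map.softmax(dim=-1)            # (B, H, N, N)

# Attention from the CLS token to every patch, averaged over heads.
cls_attn = attn_map[0, :, 0, 1:].mean(0)
side = int(cls_attn.numel() ** 0.5)
plt.imshow(cls_attn.reshape(side, side).cpu(), cmap="viridis")
plt.title("CLS-token attention, last block (head average)")
plt.colorbar()
plt.show()
```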
Hi Lukas,
I appreciate your work on this project, but I have a question: why is ViT+ResNetV2 better than a pure ViT, and how do the input images maintain spatial information after being processed by ResNetV2?
My idea is to access the encoder and visualize the feature maps of the last ResNetV2 layer, and also the feature maps of the last ViT layer, to get better insight.
Could you share some code to plot the feature maps for a given input image?
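For the CNN part, a minimal sketch along these lines might be enough, assuming a timm ResNetV2 backbone and its `features_only` API; the actual hybrid encoder in this repo would need its own module path instead:

```python
# Hedged sketch: plot the spatial feature maps produced by a ResNetV2 backbone.
# "resnetv2_50" and features_only=True are assumptions for a runnable example;
# swap in the hybrid encoder's CNN stem from this project.
import torch
import timm
import matplotlib.pyplot as plt

backbone = timm.create_model("resnetv2_50", pretrained=False, features_only=True).eval()

img = torch.randn(1, 3, 224, 224)   # replace with a real preprocessed image tensor
with torch.no_grad():
    feats = backbone(img)           # list of feature maps, one per stage

last = feats[-1][0]                 # (C, H, W) from the deepest stage
print("last-stage feature map shape:", tuple(last.shape))

# Show the first 16 channels as heatmaps; each channel keeps a coarse spatial
# grid, which is how the hybrid preserves spatial layout before the ViT.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(last[i].cpu(), cmap="viridis")
    ax.axis("off")
fig.suptitle("ResNetV2 last-stage feature maps (first 16 channels)")
plt.show()
```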