Hi,
Thanks for your efforts in building this great repo for audio-visual learning.
We recently completed a study (HiCMAE) on self-supervised audio-visual emotion recognition, and we hope it can be included in this awesome repo. Thanks very much!
Paper title: HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition.
Code and models: https://github.com/sunlicai/HiCMAE.
TL;DR: HiCMAE is an early attempt to leverage large-scale self-supervised pre-training to address the dilemma faced by current supervised methods, and it achieves strong performance on 9 audio-visual emotion recognition datasets.
Best,
Licai