
CANAVER

Code for Multimodal Cross Attention Network for Audio Visual Emotion Recognition, submitted to ICASSP 2023.

Please download the video files of the CREMA-D (https://github.com/CheyneyComputerScience/CREMA-D) and MSP-IMPROV (https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Improv.html) datasets. The files need to be converted to the appropriate format; the helper code for this conversion will be made available upon request.

The code available here is organized into four distinct steps:

  1. The audio classifier is trained by running the file train_ft_wav2vec.py in the audio classifier folder. The windowed audio features described in the paper are then extracted with get_wav2vec_features_hop.py; the feature extractor can only be run once the model has been trained (a windowing sketch is given after this list).
  2. The video feature extractor, TimeSformer, is trained using the repository (https://github.com/facebookresearch/TimeSformer) by running the file train_video.py. This is followed by the windowed feature extractor get_timesformer_features_hop.py.
  3. Once the TimeSformer windowed features are available, the GRU with self-attention is trained by running video_GRU.py (see the self-attention sketch below). The context-enhanced GRU features for video are then extracted with vid_get_GRU_features.py.
  4. Finally, the multimodal block is trained using GRU_multimodal_classifier.py (a cross-attention fusion sketch is given at the end of this list).
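
Steps 1 and 2 both extract windowed features with a hop from the frame-level outputs of a trained encoder. The following is a minimal sketch of that windowing logic, assuming the features arrive as a (time, dim) tensor; the window length, hop length, and mean-pooling per window are illustrative assumptions, not the exact settings used in get_wav2vec_features_hop.py or get_timesformer_features_hop.py.

```python
import torch

def window_features(feats, win_len, hop_len):
    """Slice a (time, dim) feature sequence into overlapping windows with a hop.

    Each window is mean-pooled, so the output has shape (num_windows, dim).
    """
    starts = range(0, max(feats.size(0) - win_len, 0) + 1, hop_len)
    windows = [feats[s:s + win_len].mean(dim=0) for s in starts]
    return torch.stack(windows)

# Example: 300 frames of 768-dim encoder output -> 11 pooled windows of 50 frames, hop 25
feats = torch.randn(300, 768)
windowed = window_features(feats, win_len=50, hop_len=25)   # shape (11, 768)
```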
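Step 3 trains a GRU with self-attention over the windowed video features. Below is a minimal PyTorch sketch of such a model; the layer sizes, the bidirectional GRU, the use of nn.MultiheadAttention, and the default number of classes are illustrative assumptions, not the exact architecture in video_GRU.py.

```python
import torch
import torch.nn as nn

class GRUSelfAttention(nn.Module):
    def __init__(self, feat_dim=768, hidden_dim=256, num_heads=4, num_classes=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                    # x: (batch, num_windows, feat_dim)
        h, _ = self.gru(x)                   # (batch, num_windows, 2 * hidden_dim)
        ctx, _ = self.attn(h, h, h)          # self-attention over the GRU outputs
        pooled = ctx.mean(dim=1)             # average over windows
        return self.classifier(pooled), ctx  # emotion logits and context-enhanced features
```

The second return value plays the role of the "context-enhanced GRU features" that vid_get_GRU_features.py exports for the multimodal stage.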
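Step 4 fuses the audio and video streams with cross attention. The sketch below shows one way to implement cross-modal attention between the two feature sequences, again in PyTorch; the feature dimension, the symmetric query/key arrangement, and the mean-pooled concatenation are illustrative assumptions about GRU_multimodal_classifier.py, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=512, num_heads=4, num_classes=4):
        super().__init__()
        self.aud_to_vid = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.vid_to_aud = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, video):         # (batch, T_a, dim), (batch, T_v, dim)
        # Audio queries attend to video keys/values, and vice versa.
        a_ctx, _ = self.aud_to_vid(audio, video, video)
        v_ctx, _ = self.vid_to_aud(video, audio, audio)
        fused = torch.cat([a_ctx.mean(dim=1), v_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Example: 8 utterances with 20 audio windows and 16 video windows of 512-dim features
logits = CrossModalFusion()(torch.randn(8, 20, 512), torch.randn(8, 16, 512))
```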
