ICASSP 2024
See extract_feats.py for feature extraction examples. We currently support the following models:
- AV-HuBERT (ICLR 2022)
- RepLAI (NeurIPS 2022)
- Lee et al. (ICLR 2021), referred to as AVBERT in this repo
- MAViL (NeurIPS 2023)
We also include handcrafted features to serve as baselines. Pull requests are welcome for adding more models.
Installation:
conda create -n av python=3.9 -y
conda activate av
pip install -r requirements.txt
Downstream Task Evaluation:
python run_downstream.py -m train \
-u <upstream model name> \
-d <downstream task name> \
-s <feature type> \
--pooled_features_path <path to save features>
Researchers can also submit model code and weights to our submission platform to easily evaluate on the AV-SUPERB benchmark.
We expect two Python files to be submitted: expert.py, which implements the model forward pass and the preprocessing functions for each of the two modalities, and hubconf.py, which downloads the model weights.
Please refer to this example model and the submission platform for more details.
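As a rough orientation, the two files follow the S3PRL upstream convention. The sketch below stubs out the PyTorch machinery so the shape of the interface is visible in plain Python; the class and method names are assumptions drawn from that convention (real submissions subclass torch.nn.Module and return tensors), so confirm the exact API against the example model.

```python
# Hedged sketch of the two expected submission files. Everything here is an
# assumption modeled on the S3PRL upstream convention -- real code uses
# torch.nn.Module and tensors; check the example model for the exact API.

# --- expert.py --------------------------------------------------------
class UpstreamExpert:
    """Wraps a pretrained audio-visual model for AV-SUPERB evaluation."""

    def __init__(self, ckpt_path=None):
        # A real submission loads model weights from ckpt_path here.
        self.ckpt_path = ckpt_path

    def preprocess_audio(self, wav, sample_rate):
        # Per-modality preprocessing, e.g. resampling or filterbanks.
        return wav

    def preprocess_video(self, frames, fps):
        # Per-modality preprocessing, e.g. resize, normalize, stack frames.
        return frames

    def forward(self, audio, video):
        # Returns per-layer hidden states; real code returns torch tensors.
        fused = [(a, v) for a, v in zip(audio, video)]
        return {"hidden_states": [fused]}

# --- hubconf.py -------------------------------------------------------
def my_av_model(ckpt_url="https://example.com/weights.pt", **kwargs):
    # hubconf.py entry point: a real implementation downloads the
    # checkpoint (e.g. via torch.hub utilities) before constructing the
    # expert. The URL above is a placeholder, not a real checkpoint.
    return UpstreamExpert(ckpt_path=ckpt_url)
```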
@article{tseng2023avsuperb,
title={AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models},
author={Yuan Tseng and Layne Berry and Yi-Ting Chen and I-Hsiang Chiu and Hsuan-Hao Lin and Max Liu and Puyuan Peng and Yi-Jen Shih and Hung-Yu Wang and Haibin Wu and Po-Yao Huang and Chun-Mao Lai and Shang-Wen Li and David Harwath and Yu Tsao and Shinji Watanabe and Abdelrahman Mohamed and Chi-Luen Feng and Hung-yi Lee},
journal={arXiv preprint arXiv:2309.10787},
year={2023}
}
AV-SUPERB is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
Using files and pretrained AV-HuBERT models under the upstream_models/vhubert folder requires accepting the terms in the AV-HuBERT license agreement listed in this file.
See LICENSE-APACHE, LICENSE-MIT, COPYRIGHT for details.
Source code is based on S3PRL.