Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
In BMVC 2022. arXiv
Presentation video
```bash
conda create -n videoai python=3.9
conda activate videoai
conda install pytorch==1.12.1 torchvision cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs,
#conda install pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch

git clone --recurse-submodules https://github.com/kiyoon/verb_ambiguity
cd verb_ambiguity
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .
```
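To verify the environment (optional), a quick check that PyTorch and the GPU are visible:

```bash
# Optional sanity check: should print the PyTorch version (1.12.1)
# and True if CUDA can see the GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```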
Optional: install Pillow-SIMD and libjpeg-turbo to improve data loading performance. Run this at the end of the installation:
```bash
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
```
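To confirm that Pillow-SIMD replaced stock Pillow (optional), check the version string; Pillow-SIMD builds carry a `.postN` suffix:

```bash
# Pillow-SIMD reports versions like "9.0.0.post1";
# stock Pillow has no ".postN" suffix.
python -c "import PIL; print(PIL.__version__)"
```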
## Running feature experiments using pre-extracted features

- Download the pre-extracted features:
  - EPIC-Kitchens-100 TSM features
  - EPIC-Kitchens-100 TSM feature neighbours (optional): using this neighbour cache reduces training preparation time by skipping the neighbour search.
  - Confusing-HMDB-102 TSM features
- Extract them in `data/EPIC_KITCHENS_100` or `data/hmdb51` (see the extraction sketch below this list).
- Run the training code. Change the `dataset` and `exp_name` variables to select different experiments.
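As referenced above, extraction might look like the following. The archive names here are hypothetical; substitute the files you actually downloaded.

```bash
# Hypothetical archive names -- use the files from the download links above.
mkdir -p data/EPIC_KITCHENS_100 data/hmdb51
tar -xf epic100_tsm_features.tar.gz -C data/EPIC_KITCHENS_100
tar -xf confusing_hmdb_102_tsm_features.tar.gz -C data/hmdb51
```

The script below is the training code mentioned in the last step.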
```bash
#!/bin/bash
exp_root="$HOME/experiments"  # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1

export VAI_USE_NEIGHBOUR_CACHE=True  # Only for EPIC-Kitchens-100-SPMV. It bypasses the neighbour search if the cache is available; otherwise it runs the search and caches the results.
export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1

subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR"  # Name the subfolder as you like.

dataset=epic100_verb_features
#dataset=confusing_hmdb_102_features

exp_name="concat_RGB_flow_assume_negative"
#exp_name="concat_RGB_flow_weak_assume_negative"
#exp_name="concat_RGB_flow_binary_labelsmooth"
#exp_name="concat_RGB_flow_binary_negative_labelsmooth"
#exp_name="concat_RGB_flow_binary_focal"
#exp_name="concat_RGB_flow_entropy_maximise"
#exp_name="concat_RGB_flow_mask_binary_ce"
#exp_name="concat_RGB_flow_pseudo_single_binary_ce"

# Training script
# -S creates a subdirectory with the name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" #--wandb_project kiyoon_kim_verbambig

# Evaluation script
# -l -2 loads the best model (with the highest held-out validation accuracy).
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p #--wandb
```
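To try every loss variant listed in the script, one option is a simple loop over `exp_name`. This is a sketch reusing the script's variables, not part of the original tooling:

```bash
# Sketch: train each loss variant in turn, reusing the variables defined above.
for exp_name in concat_RGB_flow_assume_negative concat_RGB_flow_weak_assume_negative \
                concat_RGB_flow_binary_labelsmooth concat_RGB_flow_binary_negative_labelsmooth \
                concat_RGB_flow_binary_focal concat_RGB_flow_entropy_maximise \
                concat_RGB_flow_mask_binary_ce concat_RGB_flow_pseudo_single_binary_ce
do
    tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -c:d verbambig \
        -M ch_beta.featuremodel -E "$exp_name" -c:e verbambig -S "$subfolder"
done
```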
To prepare the datasets from scratch, follow the steps below. For EPIC-Kitchens-100:

- Download `rgb_frames` and `flow_frames` (script).
- Extract the tar files (RGB script, flow script).
- Clone the EPIC-Kitchens-100 annotations at `data/EPIC_KITCHENS_100/epic-kitchens-100-annotations`.
- Gulp the dataset. First, generate flow annotations using this, and use this to gulp.
- Generate dataset split files (RGB script, flow script).
- Get the TSM pre-trained models from EPIC-Kitchens Action Models, and save them into `data/pretrained/epic100`.
- Download the multi-verb annotations at `data/EPIC_KITCHENS_100/ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv`.

The `data/EPIC_KITCHENS_100` directory should then have five directories and one file: `epic-kitchens-100-annotations`, `splits_gulp_flow`, `splits_gulp_rgb`, `gulp_flow`, `gulp_rgb`, and `ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv`.
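For reference, listing the directory should show exactly those entries:

```bash
$ ls data/EPIC_KITCHENS_100
ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv
epic-kitchens-100-annotations  gulp_flow  gulp_rgb  splits_gulp_flow  splits_gulp_rgb
```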
For Confusing-HMDB-102:

- Download the HMDB-51 videos (script).
- Extract them into frames of images (script).
- Generate optical flow (script).
- Gulp the dataset (script). Use the `rgb` and `flow_onefolder` modalities, and `--class_folder`.
- Generate dataset split files (script). Use `--confusion 2`. Or just download the splits.

The `data/hmdb51` directory must have at least four directories: `confusing102_splits_gulp_flow`, `confusing102_splits_gulp_rgb`, `gulp_flow`, and `gulp_rgb`.
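For reference, the directory listing should contain at least these entries:

```bash
$ ls data/hmdb51
confusing102_splits_gulp_flow  confusing102_splits_gulp_rgb  gulp_flow  gulp_rgb
```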
Putting it all together:
```bash
# Install unrar and nvidia-docker first.
# Execute from the root directory of this repo.
# Don't run all of it at once; some steps may not run on your setup.
GPU_arch=pascal  # pascal / turing / ampere
conda activate videoai

submodules/video_datasets_api/tools/hmdb/download_hmdb.sh data/hmdb51
submodules/video_datasets_api/tools/hmdb/hmdb_extract_frames.sh data/hmdb51/videos data/hmdb51/frames
submodules/video_datasets_api/tools/hmdb/extract_flow_multigpu.sh data/hmdb51/frames data/hmdb51/flow $GPU_arch 0
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/frames data/hmdb51/gulp_rgb rgb --class_folder
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/flow data/hmdb51/gulp_flow flow_onefolder --class_folder
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_rgb data/hmdb51/confusing102_splits_gulp_rgb data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_flow data/hmdb51/confusing102_splits_gulp_flow data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
```
The following script trains the base models, evaluates them, and (with `exp_name="ce"`) extracts features:

```bash
#!/bin/bash
exp_root="$HOME/experiments"  # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1

export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1

## Choose dataset
#dataset=epic100_verb
dataset=confusing_hmdb_102
export VAI_SPLITNUM=1  # Only for the confusing_hmdb_102 dataset.

## Choose model (RGB or flow)
model="tsm_resnet50_nopartialbn"
#model="ch_epic100.tsm_resnet50_flow"

## Choose loss
## For feature extraction, use "ce".
exp_name="ce"
#exp_name="assume_negative"
#exp_name="weak_assume_negative"
#exp_name="binary_labelsmooth"
#exp_name="binary_negative_labelsmooth"
#exp_name="binary_focal"
#exp_name="entropy_maximise"
#exp_name="mask_binary_ce"
#exp_name="pseudo_single_binary_ce"

# Name the subfolder as you like.
if [[ $dataset == "epic100_verb" ]]
then
    subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR"
    extra_args=()
else
    subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR,split=$VAI_SPLITNUM"
    extra_args=(-c:d verbambig)
fi

# Training script
# -S creates a subdirectory with the name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" "${extra_args[@]}" #--wandb_project kiyoon_kim_verbambig

if [[ $dataset == "epic100_verb" ]]
then
    # Evaluation script
    # -l -2 loads the best model (with the highest held-out validation accuracy).
    # -p saves the predictions. (optional)
    tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p "${extra_args[@]}" #--wandb
else
    echo "For Confusing-HMDB-102, there is no evaluation script. See the summary.csv file and take the best number per metric."
fi

if [[ $exp_name == "ce" ]]
then
    # Extract features
    # -l -2 loads the best model (with the highest held-out validation accuracy).
    tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s traindata_testmode "${extra_args[@]}" #--wandb
    tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s val "${extra_args[@]}" #--wandb
fi
```
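For Confusing-HMDB-102, results are typically reported per split from `summary.csv`. To train on all splits in one go, a sketch that loops over `VAI_SPLITNUM` (assuming the standard three HMDB splits, and reusing the variables defined above):

```bash
# Sketch: train on each Confusing-HMDB-102 split in turn.
# Three splits are assumed, following the standard HMDB protocol.
for split in 1 2 3
do
    export VAI_SPLITNUM=$split
    subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR,split=$VAI_SPLITNUM"
    tools/run_singlenode.sh train $num_gpus -R $exp_root -D confusing_hmdb_102 -M $model \
        -E $exp_name -c:e verbambig -S "$subfolder" -c:d verbambig
done
```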
Once the features are extracted, copy them to the `data/` directory and edit `dataset_configs/ch_verbambig/epic100_verb_features.py` or `dataset_configs/ch_verbambig/confusing_hmdb_102_features.py` to update the corresponding feature path. Then refer to the "Running feature experiments using pre-extracted features" section above for running experiments with the features.
If you find our work or code useful, please cite:
```bibtex
@inproceedings{kim2022ambiguity,
  author    = {Kiyoon Kim and Davide Moltisanti and Oisin Mac Aodha and Laura Sevilla-Lara},
  title     = {An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition},
  booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
  publisher = {{BMVA} Press},
  year      = {2022},
  url       = {https://bmvc2022.mpi-inf.mpg.de/0356.pdf}
}
```
This repository is a fork of the PyVideoAI framework. Learn how to use it with the PyVideoAI-examples notebooks.