Lena Gorelick, Moshe Blank, Eli Shechtman, Michal Irani and Ronen Basri
Appeared first in the Tenth IEEE International Conference on Computer Vision (ICCV), 2005
Abstract
Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated
motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a
recent approach by Gorelick et al. for analyzing 2D shapes and generalize it to deal with volumetric space-time action
shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local
space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action
recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in
(but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method
to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the
performance of an action and low quality video.
NEW! The PAMI paper (full version, updated results) in PDF (2MB) format (BibTeX).
An updated database, including the original silhouette sequences, their aligned versions and the robustness sequences, can be found below.
The ICCV paper (shorter version) in PDF (2MB) format (BibTeX).
Poisson features
We use the solution of the Poisson equation to extract several space time features. In the table below we demonstrate these
features for three sequences of different actions. The first two columns show the original video sequence and the extracted
foreground mask. The third column shows the solutions of the Poisson equation, color-coded from blue (low values) to red
(high values). The last three columns show the space-time "saliency", "plateness" and "stickness" features that we
use. See the paper for details. Click the images below to play the full video sequences.
Input Sequence
Foreground mask
Solution of Poisson eq.
Space-Time "Saliency"
Measure of "Plateness"
Measure of "Stickness"
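As an illustration of the representation above, the Poisson equation ∇²U = −1 with U = 0 outside the shape can be solved directly on a binary silhouette mask. The sketch below uses plain Jacobi iteration on a 2D mask for brevity; the paper works on full 3D space-time volumes and a much faster solver would be used in practice:

```python
import numpy as np

def poisson_solution(mask, n_iter=2000):
    """Solve Laplacian(U) = -1 inside a binary mask, with U = 0 outside,
    by plain Jacobi iteration (a slow but simple illustrative solver)."""
    mask = mask.astype(bool)
    U = np.zeros(mask.shape, dtype=float)
    for _ in range(n_iter):
        # Average of the four neighbours, plus the source term (h = 1).
        avg = 0.25 * (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
                      np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(mask, avg + 0.25, 0.0)
    return U

# Toy example: a square silhouette; U grows toward the shape interior,
# which is exactly the behaviour the color-coded column above shows.
m = np.zeros((32, 32))
m[8:24, 8:24] = 1
U = poisson_solution(m)
print(U.max(), U[16, 16])
```

The solution assigns every internal point the mean time a random walk started there needs to hit the boundary, which is why values peak deep inside the torso and fall off toward the limbs.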
Experimental Results
In the paper we report results for four experiments: action clustering, action recognition, robustness experiments
and action detection. Here we show the results of the last three.
Action Recognition:
We collected a database of 90 low-resolution (180 × 144, deinterlaced, 50 fps) video sequences showing nine different people, each performing ten natural actions: run, walk, skip,
jumping-jack (jack), jump-forward-on-two-legs (jump), jump-in-place-on-two-legs (pjump), gallop-sideways (side), wave-two-hands (wave2), wave-one-hand (wave1) and bend.
To treat periodic and non-periodic actions in the same framework, and to compensate for differing period lengths, we used a sliding window in time to extract space-time cubes, each eight frames long with an overlap of four frames between consecutive cubes.
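The sliding-window extraction can be sketched as follows; `extract_cubes` is an illustrative helper, not code from the paper, but the eight-frame length and four-frame step match the setting described above:

```python
import numpy as np

def extract_cubes(volume, length=8, stride=4):
    """Split a space-time volume (a T x H x W stack of silhouette masks)
    into overlapping cubes of `length` frames, advancing `stride` frames
    between consecutive cubes (stride 4 = overlap of 4 for length 8)."""
    T = volume.shape[0]
    return [volume[t:t + length] for t in range(0, T - length + 1, stride)]

# A 50-frame sequence yields floor((50 - 8) / 4) + 1 = 11 cubes.
seq = np.zeros((50, 144, 180), dtype=np.uint8)
cubes = extract_cubes(seq)
print(len(cubes), cubes[0].shape)
```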
Below we summarize our recognition rates in "leave-one-sequence-out" classification experiments, for both complete sequences and sub-sequences.
Complete Sequences:

Method                      | % Correct | # Actions used
Blank et al., ICCV 2005     | 100%      | 9 (not including skip)
Gorelick et al., TPAMI 2007 | 100%      | 10

Sub-Sequences:

Method                      | % Correct | # Actions used         | Sliding window
Blank et al., ICCV 2005     | 99.64%    | 9 (not including skip) | 10 frames in jumps of 5 frames, 549 ST-cubes in total
Gorelick et al., TPAMI 2007 | 97.83%    | 10                     | 8 frames in jumps of 4 frames, 923 ST-cubes in total
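The "leave-one-sequence-out" protocol can be sketched with a simple nearest-neighbour classifier. This is illustrative only: the tiny feature vectors and plain Euclidean distance below stand in for the paper's space-time shape features and distance measure.

```python
import numpy as np

def leave_one_sequence_out(features, labels, seq_ids):
    """Classify each sequence by nearest neighbour over the cubes of all
    *other* sequences, then report per-sequence accuracy."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    seq_ids = np.asarray(seq_ids)
    correct = 0
    for s in np.unique(seq_ids):
        test = seq_ids == s
        train = ~test
        # Distances from every held-out cube to every remaining cube.
        d = np.linalg.norm(features[test][:, None] - features[train][None], axis=2)
        pred = labels[train][d.argmin(axis=1)]
        # The sequence is correct if the majority vote over its cubes is.
        vals, counts = np.unique(pred, return_counts=True)
        if vals[counts.argmax()] == labels[test][0]:
            correct += 1
    return correct / len(np.unique(seq_ids))

# Tiny synthetic check: two well-separated actions, two sequences each.
feats = [[0, 0], [0.1, 0], [5, 5], [5, 5.1], [0, 0.1], [0.2, 0], [5.1, 5], [4.9, 5]]
labs  = ['walk', 'walk', 'jack', 'jack', 'walk', 'walk', 'jack', 'jack']
seqs  = [0, 0, 1, 1, 2, 2, 3, 3]
print(leave_one_sequence_out(feats, labs, seqs))
```

Holding out whole sequences (rather than individual cubes) prevents cubes of the same recording from appearing on both sides of the split, which would inflate the reported rates.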
Robustness Experiments:
In this experiment we demonstrate the robustness of our method to high irregularities in the performance of an action.
We collected ten test video sequences of people walking in various difficult scenarios in front of different non-uniform
backgrounds (see the sequences and their foreground masks below). We show that our approach has relatively low sensitivity
to partial occlusions, non-rigid deformations and other defects in the extracted space-time shape.
Click the images below to play the full video sequences.
Walk with a dog
Swinging a bag
Walk in a skirt
Occluded feet
Occluded by a "pole"
Moonwalk
Limp Walk
Walk with knees up
Walk with a briefcase
Normal walk
Experiment results: The table below shows, for each test sequence, the first and second best matches with
their distances, as well as the median distance to all the actions in our database. The test sequences are sorted
by the distance to their best-matching action. All the sequences were classified as "walk".
Test Sequence        | 1st best    | 2nd best    | Median
Normal walk          | walk (5.6)  | run (8.2)   | 11.2
Walking in a skirt   | walk (5.6)  | side (8.1)  | 9.9
Carrying briefcase   | walk (6.6)  | side (8.5)  | 10.4
Limping man          | walk (7.0)  | skip (8.8)  | 10.3
Occluded legs        | walk (8.2)  | skip (11.0) | 11.3
Knees up             | walk (8.3)  | side (9.6)  | 10.1
Walking with a dog   | walk (8.4)  | run (9.9)   | 11.4
Sleepwalking         | walk (8.4)  | run (9.8)   | 12.1
Swinging a bag       | walk (9.6)  | side (11.1) | 12.9
Occluded by a "pole" | walk (10.6) | jack (11.6) | 12.5
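Given the distances from one test sequence to every database action, the ranking reported above reduces to a sort plus a median. A minimal sketch (the distance values here are hypothetical, chosen only to mirror the first table row):

```python
import statistics

# Hypothetical distances from one test sequence to the ten database actions.
dists = {'walk': 5.6, 'run': 8.2, 'side': 8.7, 'skip': 9.0, 'jack': 10.1,
         'jump': 10.8, 'pjump': 11.2, 'wave1': 11.5, 'wave2': 11.9, 'bend': 12.4}

ranked = sorted(dists.items(), key=lambda kv: kv[1])
first, second = ranked[0], ranked[1]
median = statistics.median(dists.values())
print(first, second, median)
```

A large gap between the first and second best distances, relative to the median, indicates a confident classification, which is the criterion the viewpoint experiment below appeals to.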
Moreover, we demonstrate the robustness of our method to substantial changes in viewpoint. For this purpose we
collected ten additional sequences, each showing the "walk" action captured from a different viewpoint
(varying between 0° and 81° relative to the image plane, in steps of 9°). Note that sequences with
angles approaching 90° contain significant changes in scale within the sequence.
All sequences with viewpoints between 0° and 54° were classified correctly with a large relative gap between the first (true) and
the second closest actions (see table below). For larger viewpoints a gradual deterioration occurs. This demonstrates the robustness
of our method to relatively large variations in viewpoint.
Test Sequence  | 1st best    | 2nd best     | Median
Walking at 0°  | walk (8.3)  | run (10.8)   | 12.6
Walking at 9°  | walk (7.9)  | side (9.9)   | 12.2
Walking at 18° | walk (8.2)  | side (10.2)  | 12.1
Walking at 27° | walk (8.2)  | side (9.7)   | 11.5
Walking at 36° | walk (8.3)  | side (10.3)  | 11.7
Walking at 45° | walk (9.0)  | side (10.7)  | 11.6
Walking at 54° | walk (9.1)  | side (10.6)  | 11.3
Walking at 63° | walk (11.1) | side (11.6)  | 12.9
Walking at 72° | walk (11.3) | pjump (12.1) | 12.9
Walking at 81° | walk (12.6) | pjump (12.7) | 13.3
Action Detection in a Ballet Movie
This experiment shows action detection on a movie sequence of a ballet dance, performed by the "Birmingham Royal Ballet" from the
"London Dance" website.
The original full video can also be found here (WMV format, 400KB). The task was to detect all instances of
the "cabriole" pas (the query) in the input video.
Click the images below to play the full video sequences.
The following files contain the original sequences used in the paper. Each ZIP file
contains uncompressed AVI videos of one of the actions performed by the subjects.
Total size of the database is ~340MB.
The extracted masks obtained by background subtraction: the file (Matlab 7 format, ~1700KB) contains both the original masks as well as the aligned ones (that were the actual inputs to our algorithm).
Also attached are the background sequences (~11MB) used for background subtraction (see documentation inside the zip file).
The extracted masks for the robustness sequences, obtained by background subtraction: the file (Matlab 7 format, ~500KB) contains both the original masks as well as the aligned ones (that were the actual inputs to our algorithm).
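Foreground masks like those above can be reproduced with simple background subtraction. The sketch below is a minimal illustration, not the exact procedure used to build the database; the grey-level threshold is a free parameter:

```python
import numpy as np

def subtract_background(frames, background, thresh=25):
    """Per-pixel foreground masks: mark pixels whose absolute grey-level
    difference from a static background model exceeds `thresh`.
    Casting to int16 avoids uint8 wrap-around in the subtraction."""
    frames = frames.astype(np.int16)
    background = background.astype(np.int16)
    return (np.abs(frames - background) > thresh).astype(np.uint8)

# Toy example: a bright 60 x 30 "person" block on a flat background.
bg = np.full((144, 180), 100, dtype=np.uint8)
frame = bg.copy()
frame[40:100, 60:90] = 200
mask = subtract_background(frame[None], bg)[0]
print(mask.sum())  # 60 * 30 = 1800 foreground pixels
```

In practice the raw masks would still need cleanup (morphological filtering, shadow handling) and the alignment step mentioned above before they could serve as inputs to the algorithm.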
@article{ActionsAsSpaceTimeShapes_pami07,
author = {Lena Gorelick and Moshe Blank and Eli Shechtman and Michal Irani and Ronen Basri},
title = {Actions as Space-Time Shapes},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {29},
number = {12},
pages = {2247--2253},
month = {December},
year = {2007},
ee = {www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html}
}
The ICCV paper:
@inproceedings{ActionsAsSpaceTimeShapes_iccv05,
author = {Moshe Blank and Lena Gorelick and Eli Shechtman and Michal Irani and Ronen Basri},
title = {Actions as Space-Time Shapes},
booktitle = {The Tenth IEEE International Conference on Computer Vision (ICCV'05)},
pages = {1395-1402},
location = {Beijing},
year = {2005},
ee = {www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html},
}