Lena Gorelick, Moshe Blank, Eli Shechtman, Michal Irani and Ronen Basri
Appeared first in the Tenth IEEE International Conference on Computer Vision (ICCV), 2005
Abstract
Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated
motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a
recent approach by Gorelick et al. for analyzing 2D shapes and generalize it to deal with volumetric space-time action
shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local
space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action
recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in
(but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method
to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the
performance of an action and low quality video.
NEW! The PAMI paper (full version, updated results) in PDF (2MB) format (BibTeX).
An updated database, including the original silhouette sequences, their aligned versions and the robustness sequences, can be found below.
The ICCV paper (shorter version) in PDF (2MB) format (BibTeX).
Poisson features
We use the solution of the Poisson equation to extract several space time features. In the table below we demonstrate these
features for three sequences of different actions. The first two columns show the original video sequence and the extracted
foreground mask. The third column shows the solutions of the Poisson equation, color-coded from blue (low values) to red
(high values). The last three columns show the space-time "saliency", "plateness" and "stickness" features that we
use. See the paper for details. Click the images below to play the full video sequences.
Input Sequence
Foreground mask
Solution of Poisson eq.
Space-Time "Saliency"
Measure of "Plateness"
Measure of "Stickness"
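As an illustration of the representation above, the Poisson equation ∇²U = −1 with U = 0 outside the shape can be solved directly on a binary silhouette mask. The sketch below uses plain Jacobi iteration on a 2D mask for brevity; the paper works on full 3D space-time volumes and a much faster solver would be used in practice:

```python
import numpy as np

def poisson_solution(mask, n_iter=2000):
    """Solve Laplacian(U) = -1 inside a binary mask, with U = 0 outside,
    by plain Jacobi iteration (a slow but simple illustrative solver)."""
    mask = mask.astype(bool)
    U = np.zeros(mask.shape, dtype=float)
    for _ in range(n_iter):
        # Average of the four neighbours, plus the source term (h = 1).
        avg = 0.25 * (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
                      np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(mask, avg + 0.25, 0.0)
    return U

# Toy example: a square silhouette; U grows toward the shape interior,
# which is exactly the behaviour the color-coded column above shows.
m = np.zeros((32, 32))
m[8:24, 8:24] = 1
U = poisson_solution(m)
print(U.max(), U[16, 16])
```

The solution assigns every internal point the mean time a random walk started there needs to hit the boundary, which is why values peak deep inside the torso and fall off toward the limbs.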
Experimental Results
In the paper we report results for four experiments: action clustering, action recognition, robustness experiments
and action detection. Here we show the results of the last three.
Action Recognition:
We collected a database of 90 low-resolution (180 × 144, deinterlaced, 50 fps) video sequences showing nine different people, each performing ten natural actions: run, walk, skip,
jumping-jack (jack), jump-forward-on-two-legs (jump), jump-in-place-on-two-legs (pjump), gallop-sideways (side), wave-two-hands (wave2), wave-one-hand (wave1) and bend.
To treat periodic and non-periodic actions in the same framework, and to compensate for differing period lengths, we used a sliding window in time to extract space-time cubes, each eight frames long with an overlap of four frames between consecutive cubes.
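The sliding-window extraction can be sketched as follows; `extract_cubes` is an illustrative helper, not code from the paper, but the eight-frame length and four-frame step match the setting described above:

```python
import numpy as np

def extract_cubes(volume, length=8, stride=4):
    """Split a space-time volume (a T x H x W stack of silhouette masks)
    into overlapping cubes of `length` frames, advancing `stride` frames
    between consecutive cubes (stride 4 = overlap of 4 for length 8)."""
    T = volume.shape[0]
    return [volume[t:t + length] for t in range(0, T - length + 1, stride)]

# A 50-frame sequence yields floor((50 - 8) / 4) + 1 = 11 cubes.
seq = np.zeros((50, 144, 180), dtype=np.uint8)
cubes = extract_cubes(seq)
print(len(cubes), cubes[0].shape)
```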
Below we summarize our recognition rates in "leave-one-sequence-out" classification experiments, for both complete sequences and sub-sequences.
Complete Sequences:

Method                      | % Correct | # Actions used
Blank et al., ICCV 2005     | 100%      | 9 (not including skip)
Gorelick et al., TPAMI 2007 | 100%      | 10

Sub-Sequences:

Method                      | % Correct | # Actions used         | Sliding window
Blank et al., ICCV 2005     | 99.64%    | 9 (not including skip) | 10 frames in jumps of 5 frames, 549 ST-cubes in total
Gorelick et al., TPAMI 2007 | 97.83%    | 10                     | 8 frames in jumps of 4 frames, 923 ST-cubes in total
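The "leave-one-sequence-out" protocol can be sketched with a simple nearest-neighbour classifier. This is illustrative only: the tiny feature vectors and plain Euclidean distance below stand in for the paper's space-time shape features and distance measure.

```python
import numpy as np

def leave_one_sequence_out(features, labels, seq_ids):
    """Classify each sequence by nearest neighbour over the cubes of all
    *other* sequences, then report per-sequence accuracy."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    seq_ids = np.asarray(seq_ids)
    correct = 0
    for s in np.unique(seq_ids):
        test = seq_ids == s
        train = ~test
        # Distances from every held-out cube to every remaining cube.
        d = np.linalg.norm(features[test][:, None] - features[train][None], axis=2)
        pred = labels[train][d.argmin(axis=1)]
        # The sequence is correct if the majority vote over its cubes is.
        vals, counts = np.unique(pred, return_counts=True)
        if vals[counts.argmax()] == labels[test][0]:
            correct += 1
    return correct / len(np.unique(seq_ids))

# Tiny synthetic check: two well-separated actions, two sequences each.
feats = [[0, 0], [0.1, 0], [5, 5], [5, 5.1], [0, 0.1], [0.2, 0], [5.1, 5], [4.9, 5]]
labs  = ['walk', 'walk', 'jack', 'jack', 'walk', 'walk', 'jack', 'jack']
seqs  = [0, 0, 1, 1, 2, 2, 3, 3]
print(leave_one_sequence_out(feats, labs, seqs))
```

Holding out whole sequences (rather than individual cubes) prevents cubes of the same recording from appearing on both sides of the split, which would inflate the reported rates.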
Robustness Experiments:
In this experiment we demonstrate the robustness of our method to high irregularities in the performance of an action.
We collected ten test video sequences of people walking in various difficult scenarios in front of different non-uniform
backgrounds (see the sequences and their foreground masks below). We show that our approach has relatively low sensitivity
to partial occlusions, non-rigid deformations and other defects in the extracted space-time shape.
Click the images below to play the full video sequences.
Walk with a dog
Swinging a bag
Walk in a skirt
Occluded feet
Occluded by a "pole"
Moonwalk
Limp Walk
Walk with knees up
Walk with a briefcase
Normal walk
Experiment results: The table below shows, for each test sequence, the first and second best matches with
their distances, as well as the median distance to all the actions in our database. The test sequences are sorted
by the distance to their best-matching action. All the sequences were classified as "walk".
Test Sequence        | 1st best    | 2nd best    | Median
Normal walk          | walk (5.6)  | run (8.2)   | 11.2
Walking in a skirt   | walk (5.6)  | side (8.1)  | 9.9
Carrying briefcase   | walk (6.6)  | side (8.5)  | 10.4
Limping man          | walk (7.0)  | skip (8.8)  | 10.3
Occluded legs        | walk (8.2)  | skip (11.0) | 11.3
Knees up             | walk (8.3)  | side (9.6)  | 10.1
Walking with a dog   | walk (8.4)  | run (9.9)   | 11.4
Sleepwalking         | walk (8.4)  | run (9.8)   | 12.1
Swinging a bag       | walk (9.6)  | side (11.1) | 12.9
Occluded by a "pole" | walk (10.6) | jack (11.6) | 12.5
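Given the distances from one test sequence to every database action, the ranking reported above reduces to a sort plus a median. A minimal sketch (the distance values here are hypothetical, chosen only to mirror the first table row):

```python
import statistics

# Hypothetical distances from one test sequence to the ten database actions.
dists = {'walk': 5.6, 'run': 8.2, 'side': 8.7, 'skip': 9.0, 'jack': 10.1,
         'jump': 10.8, 'pjump': 11.2, 'wave1': 11.5, 'wave2': 11.9, 'bend': 12.4}

ranked = sorted(dists.items(), key=lambda kv: kv[1])
first, second = ranked[0], ranked[1]
median = statistics.median(dists.values())
print(first, second, median)
```

A large gap between the first and second best distances, relative to the median, indicates a confident classification, which is the criterion the viewpoint experiment below appeals to.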
Moreover, we demonstrate the robustness of our method to substantial changes in viewpoint. For this purpose we
collected ten additional sequences, each showing the "walk" action captured from a different viewpoint
(varying between 0° and 81° relative to the image plane, in steps of 9°). Note that sequences with
angles approaching 90° contain significant changes in scale within the sequence.
All sequences with viewpoints between 0° and 54° were classified correctly with a large relative gap between the first (true) and
the second closest actions (see table below). For larger viewpoints a gradual deterioration occurs. This demonstrates the robustness
of our method to relatively large variations in viewpoint.
Test Sequence  | 1st best    | 2nd best     | Median
Walking at 0°  | walk (8.3)  | run (10.8)   | 12.6
Walking at 9°  | walk (7.9)  | side (9.9)   | 12.2
Walking at 18° | walk (8.2)  | side (10.2)  | 12.1
Walking at 27° | walk (8.2)  | side (9.7)   | 11.5
Walking at 36° | walk (8.3)  | side (10.3)  | 11.7
Walking at 45° | walk (9.0)  | side (10.7)  | 11.6
Walking at 54° | walk (9.1)  | side (10.6)  | 11.3
Walking at 63° | walk (11.1) | side (11.6)  | 12.9
Walking at 72° | walk (11.3) | pjump (12.1) | 12.9
Walking at 81° | walk (12.6) | pjump (12.7) | 13.3
Action Detection in a Ballet Movie
This experiment shows action detection on a movie sequence of a ballet dance, performed by the "Birmingham Royal Ballet" from the
"London Dance" website.
The original full video can also be found here (WMV format, 400KB). The task was to detect all instances of
the "cabriole" pas (the query) in the input video.
Click the images below to play the full video sequences.
The following files contain the original sequences used in the paper. Each ZIP file
contains uncompressed AVI videos of one of the actions performed by the subjects.
Total size of the database is ~340MB.
The extracted masks obtained by background subtraction: the file (Matlab 7 format, ~1700KB) contains both the original masks as well as the aligned ones (that were the actual inputs to our algorithm).
Also attached are the background sequences (~11MB) used for background subtraction (see documentation inside the zip file).
The extracted masks for the robustness sequences, obtained by background subtraction: the file (Matlab 7 format, ~500KB) contains both the original masks as well as the aligned ones (that were the actual inputs to our algorithm).
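Foreground masks like those above can be reproduced with simple background subtraction. The sketch below is a minimal illustration, not the exact procedure used to build the database; the grey-level threshold is a free parameter:

```python
import numpy as np

def subtract_background(frames, background, thresh=25):
    """Per-pixel foreground masks: mark pixels whose absolute grey-level
    difference from a static background model exceeds `thresh`.
    Casting to int16 avoids uint8 wrap-around in the subtraction."""
    frames = frames.astype(np.int16)
    background = background.astype(np.int16)
    return (np.abs(frames - background) > thresh).astype(np.uint8)

# Toy example: a bright 60 x 30 "person" block on a flat background.
bg = np.full((144, 180), 100, dtype=np.uint8)
frame = bg.copy()
frame[40:100, 60:90] = 200
mask = subtract_background(frame[None], bg)[0]
print(mask.sum())  # 60 * 30 = 1800 foreground pixels
```

In practice the raw masks would still need cleanup (morphological filtering, shadow handling) and the alignment step mentioned above before they could serve as inputs to the algorithm.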
@article{ActionsAsSpaceTimeShapes_pami07,
author = {Lena Gorelick and Moshe Blank and Eli Shechtman and Michal Irani and Ronen Basri},
title = {Actions as Space-Time Shapes},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {29},
number = {12},
pages = {2247--2253},
month = {December},
year = {2007},
ee = {www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html}
}
The ICCV paper:
@inproceedings{ActionsAsSpaceTimeShapes_iccv05,
author = {Moshe Blank and Lena Gorelick and Eli Shechtman and Michal Irani and Ronen Basri},
title = {Actions as Space-Time Shapes},
booktitle = {The Tenth IEEE International Conference on Computer Vision (ICCV'05)},
pages = {1395-1402},
location = {Beijing},
year = {2005},
ee = {www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html},
}