This is the project repository for the research study Using BlazePose on Spatial Temporal Graph Convolutional Networks for Action Recognition presented by Motasem S. Alsawadi, Ghulam Muhammad and Miguel Rio.
(Currently under review for publication)
BlazePose skeleton data from the Kinetics dataset, class 'playing violin'.
- Download Kinetics BlazePose Skeleton Dataset
- Download NTU-RGB+D BlazePose Skeleton Dataset
- Storage info
- Citation
- Acknowledgements
The DeepMind Kinetics human action dataset is the largest dataset of unconstrained action recognition samples. Its 306,245 videos are obtained from YouTube and classified into 400 different action classes. Because clip durations vary, we fix each clip to 300 frames: if a video clip has fewer than 300 frames, we repeat its initial frames until the required length is reached; if it exceeds 300 frames, we randomly delete the excess frames. An independent JSON file has been exported for each video sample. The Kinetics BlazePose skeleton dataset can be downloaded here.
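The 300-frame normalization described above can be sketched as follows. This is a minimal illustration, not the released preprocessing code; the per-frame data structure and the function name are assumptions.

```python
import random

TARGET_FRAMES = 300  # fixed clip length used for both datasets


def normalize_clip_length(frames, target=TARGET_FRAMES, seed=None):
    """Return a clip of exactly `target` frames.

    Short clips are padded by repeating frames from the start of the
    clip; long clips have excess frames deleted at random positions.
    `frames` is any list of per-frame data (e.g. one keypoint list per
    frame -- the exact structure is hypothetical here).
    """
    original = list(frames)
    if not original:
        raise ValueError("clip has no frames")

    if len(original) < target:
        # Repeat the initial frames (cycling through the clip) until
        # the required length is reached.
        out = list(original)
        i = 0
        while len(out) < target:
            out.append(original[i % len(original)])
            i += 1
        return out

    if len(original) > target:
        # Randomly delete the excess frames, preserving the order of
        # the frames that remain.
        rng = random.Random(seed)
        drop = set(rng.sample(range(len(original)), len(original) - target))
        return [f for i, f in enumerate(original) if i not in drop]

    return original
```

For example, a 100-frame clip is extended to 300 frames by appending its own frames twice more, while a 400-frame clip has 100 randomly chosen frames removed.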
The NTU-RGB+D dataset provides a total of 56,880 action clips covering 60 different actions classified into three major groups: daily actions, health-related actions, and mutual actions. Forty participants performed the action samples. Each sample was captured simultaneously by 3 cameras placed at the same height but at different horizontal angles. We set the duration of the video clips to 300 frames using the method explained above. The NTU-RGB+D BlazePose Skeleton dataset can be downloaded here.
The Kinetics and NTU-RGB+D BlazePose skeleton datasets are provided in .zip format.
The compressed Kinetics BlazePose Skeleton dataset is 6.7 GB; uncompressed, it requires 21.6 GB of storage.
The compressed NTU-RGB+D BlazePose Skeleton dataset is 1.62 GB; uncompressed, it requires 5.47 GB of storage.
Once our paper is published, we will include the citation information in this section.
We utilized the Pose API provided by MediaPipe to extract the skeleton information from both video datasets.