- This is a Chainer [12] implementation of 3D hand pose estimation.
Our algorithm adopts the top-down pipeline that consists of `Detector` and `PoseEstimator`, namely:
- Detector: MobileNetV2 [9] based SSD (insize=224x224) [4].
  - Portions of the code are borrowed from the ChainerCV project [8].
- Pose Estimator: MobileNetV2 based Pose Proposal Networks (insize=224x224) [10].
First, `Detector` is applied to the image to generate a set of bounding boxes that surround human hands. The input image is cropped by these boxes, and the cropped patches serve as input to `PoseEstimator`, which estimates the 2D joint locations [10] and 3D joint vectors, as in [5] and [7], for each hand. The flow is sketched below.
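The following is a minimal sketch of this two-stage flow. The `detector` and `pose_estimator` objects and their `predict` methods are hypothetical placeholders, not this repository's actual API; see the demo code under src/ for the real implementation.

```python
import numpy as np

def estimate_hands(image, detector, pose_estimator):
    """Top-down pipeline sketch. `image` is an RGB array of shape (H, W, 3)."""
    # Stage 1: the detector proposes bounding boxes around hands.
    bboxes = detector.predict(image)  # hypothetical: list of (ymin, xmin, ymax, xmax)
    results = []
    for ymin, xmin, ymax, xmax in bboxes:
        # Stage 2: crop each box and feed the patch to the pose estimator.
        patch = image[int(ymin):int(ymax), int(xmin):int(xmax)]
        # hypothetical: returns 2D joint locations and 3D joint vectors per hand
        joints_2d, joints_3d = pose_estimator.predict(patch)
        # Map 2D joints from patch coordinates back to image coordinates.
        joints_2d = joints_2d + np.array([ymin, xmin], dtype=joints_2d.dtype)
        results.append((joints_2d, joints_3d))
    return results
```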
$ cd path/to/this/repository
$ tree -L 2 -d
.
├── docs
│ └── imgs
├── experiments
│ ├── docker
│ ├── notebooks
│ └── test_images
├── result
│ └── release
└── src
├── demo
├── detector
└── pose
$ cd /path/to/your/working/directory
$ git clone https://github.com/Idein/chainer-hand-pose.git
- For simplicity, we will use the Docker image idein/chainer, which includes Chainer, ChainerCV, and other utilities together with the CUDA driver. This will save you time setting up the development environment; an example invocation follows.
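For example, you might pull the image and start a GPU-enabled container with your working directory mounted. The mount point is a placeholder and this assumes nvidia-docker is installed; adjust to your setup.
$ docker pull idein/chainer
$ docker run --runtime=nvidia -it --rm -v $(pwd):/work idein/chainer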
- Prior to training, let's download the datasets.
- We provide various scripts that load the hand datasets [1], [2], [3], [6], [7], [11], [13], [14], [15], [16]. Before training, you need to download the datasets yourself. See docs/dataset_preparation.md to prepare them on your computer for our purpose. A sketch of the loader interface follows.
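For orientation, Chainer datasets typically implement `chainer.dataset.DatasetMixin`. The sketch below is a hypothetical minimal loader in that style; the class name, fields, and annotation format are illustrative, not this repository's actual code.

```python
import numpy as np
import chainer
from chainercv.utils import read_image

class HandPoseDataset(chainer.dataset.DatasetMixin):
    """Hypothetical loader pairing an RGB image with its joint annotations."""

    def __init__(self, annotations):
        # annotations: list of (image_path, joints) pairs prepared beforehand
        self.annotations = annotations

    def __len__(self):
        return len(self.annotations)

    def get_example(self, i):
        image_path, joints = self.annotations[i]
        # ChainerCV reads images as float32 arrays in CHW order
        image = read_image(image_path)
        return image, np.asarray(joints, dtype=np.float32)
```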
- To train Detector, see docs/detector.md.
- To train PoseEstimator, see docs/pose.md.
- After training Detector and PoseEstimator, the results will be stored in the `result` directory. We provide a demo script to run inference with them.
- You can also use our pre-trained models. See our release page.
- Just run:
$ cd src
$ python3 demo.py ../result/release ../result/release
- You can also use Docker (on an Ubuntu machine with a GPU).
Build the Docker image from the Dockerfile:
$ cd path/to/root/of/repository
$ docker build -t hand_demo experiments/docker/demo/gpu/
After the Docker image has been built, just run src/run_demo.sh:
$ cd src
$ bash run_demo.sh
- [1] Dollár, Piotr et al. “Cascaded pose regression.” CVPR (2010).
- [2] Garcia-Hernando, Guillermo et al. “First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations.” CVPR (2017).
- [3] Gomez-Donoso, Francisco et al. “Large-scale Multiview 3D Hand Pose Dataset.” Image Vision Comput. (2017).
- [4] Liu, Wei et al. “SSD: Single Shot MultiBox Detector.” ECCV (2016).
- [5] Luo, Chenxu et al. “OriNet: A Fully Convolutional Network for 3D Human Pose Estimation.” BMVC (2018).
- [6] Mueller, Franziska et al. “Real-Time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor.” ICCV (2017).
- [7] Mueller, Franziska et al. “GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB.” CVPR (2017).
- [8] Niitani, Yusuke et al. “ChainerCV: a Library for Deep Learning in Computer Vision.” ACM Multimedia (2017).
- [9] Sandler, Mark et al. “MobileNetV2: Inverted Residuals and Linear Bottlenecks.” CVPR (2018).
- [10] Sekii, Taiki. “Pose Proposal Networks.” ECCV (2018).
- [11] Simon, Tomas et al. “Hand Keypoint Detection in Single Images Using Multiview Bootstrapping.” CVPR (2017).
- [12] Tokui, Seiya et al. “Chainer: A Next-Generation Open Source Framework for Deep Learning.” NIPS (2015).
- [13] Tompson, Jonathan et al. “Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks.” ACM Trans. Graph. 33 (2014).
- [14] Zhang, Jiawei et al. “3D Hand Pose Tracking and Estimation Using Stereo Matching.” ArXiv (2016).
- [15] Zimmermann, Christian and Thomas Brox. “Learning to Estimate 3D Hand Pose from Single RGB Images.” ICCV (2017).
- [16] Zimmermann, Christian et al. “FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images.” ArXiv (2019).