Hopenet is an accurate and easy-to-use head pose estimation network. It uses the 300W-LP dataset for training the models and has been tested on real data with good qualitative performance.
The original Hopenet repository is on GitHub. For details about the method and quantitative results, please check their CVPR Workshop paper.
Here I revisit this method and try to improve its performance, specifically for testing on the AFLW2000 dataset.
I applied minor changes to the code so that it works with PyTorch 1.1 and Python 3.
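For readers porting similar code, below is a minimal, runnable sketch of the kind of updates typically involved (illustrative only, not the actual diffs applied here):

```python
import torch

# Typical PyTorch 0.x -> 1.x / Python 2 -> 3 fixes:
# 1) torch.autograd.Variable wrapping is no longer required (PyTorch >= 0.4);
#    tensors are used directly;
# 2) scalar losses are read with .item() instead of the removed .data[0];
# 3) Python 2 print statements become Python 3 print() calls.

loss = torch.nn.functional.mse_loss(torch.zeros(3), torch.ones(3))
total_loss = loss.item()          # was: loss.data[0]
print('loss: %.4f' % total_loss)  # was: print 'loss: %.4f' % total_loss
```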
The best reported results for the AFLW2000 dataset, provided in the CVPRW paper (Table 1), are:
Yaw: 6.470, Pitch: 6.559, Roll: 5.436, and MAE: 6.155
As reported in the paper, they used the following settings to achieve this result:
- Training Dataset: 300W-LP
- Alpha: 2
- Batch Size: 128
- Learning Rate: 1e-5
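For context, alpha is the weight on the regression term in Hopenet's combined loss: each angle (yaw, pitch, roll) is trained with a cross-entropy loss over angle bins plus an alpha-weighted MSE between the expected angle and the continuous ground truth. Below is a minimal sketch of that loss for one angle, written from the paper's description; `hopenet_angle_loss` is an illustrative name, not a function from the repository.

```python
import torch
import torch.nn.functional as F

def hopenet_angle_loss(logits, bin_labels, cont_labels, alpha):
    """Combined loss for one angle, per the paper: cross-entropy over the
    66 three-degree bins plus alpha-weighted MSE on the expected angle.

    logits:      (N, 66) raw scores over the angle bins
    bin_labels:  (N,)    index of the ground-truth bin (long tensor)
    cont_labels: (N,)    continuous ground-truth angle in degrees
    """
    cls_loss = F.cross_entropy(logits, bin_labels)
    # Expected angle: softmax expectation over bin indices, mapped back to
    # degrees (bins cover [-99, 99] in 3-degree steps).
    idx = torch.arange(66, dtype=torch.float32, device=logits.device)
    pred_angle = torch.sum(F.softmax(logits, dim=1) * idx, dim=1) * 3 - 99
    reg_loss = F.mse_loss(pred_angle, cont_labels)
    return cls_loss + alpha * reg_loss
```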
Using the provided code, I tried similar settings, except for the batch size, which I had to reduce to 64 due to the memory limitation of my GPU.
What I found was that after a few epochs, the test error starts rising.
To achieve a smoother error curve, I reduced the learning rate to 1e-6 and tried training with different alpha values.
The best model I have obtained so far was trained with alpha = 1 and performs as follows on AFLW2000:
Yaw: 5.4517, Pitch: 6.3541, Roll: 5.3127, MAE: 5.7062
A snapshot of this model can be downloaded from models/hopenet_snapshot_a1.pkl.
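The snippet below sketches how such a snapshot can be loaded for evaluation; it assumes the `hopenet` module from this repository, and the constructor arguments mirror the original ResNet50-based test script:

```python
import torch
import torchvision
import hopenet  # the model definition from this repository

# Build the ResNet50-based Hopenet with 66 bins per angle and restore
# the saved weights; map_location='cpu' allows loading without a GPU.
model = hopenet.Hopenet(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], 66)
saved_state_dict = torch.load('models/hopenet_snapshot_a1.pkl', map_location='cpu')
model.load_state_dict(saved_state_dict)
model.eval()  # switch off dropout/batch-norm updates before testing
```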
The original Hopenet method uses a ResNet50 convnet, which is considered a heavyweight and inefficient model, especially for use on embedded or mobile platforms.
To mitigate this issue, we can replace this module with a lighter network, e.g. ResNet18, SqueezeNet, or MobileNet.
An argument called "arch" has been added to the train_hopenet.py and test_hopenet.py scripts, which changes the base network architecture to one of the following (a sketch of such a selector follows the list):
- ResNet18
- ResNet34
- ResNet50
- ResNet101
- ResNet152
- Squeezenet_1_0
- Squeezenet_1_1
- MobileNetV2
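As a sketch of how the "arch" argument might map architecture names to backbones (the actual dispatch in the scripts may differ; `build_backbone` is an illustrative helper, not the repository's code), the names can be resolved to torchvision constructors:

```python
import torchvision

def build_backbone(arch):
    """Map an "arch" string to a pretrained torchvision backbone."""
    factories = {
        'ResNet18': torchvision.models.resnet18,
        'ResNet34': torchvision.models.resnet34,
        'ResNet50': torchvision.models.resnet50,
        'ResNet101': torchvision.models.resnet101,
        'ResNet152': torchvision.models.resnet152,
        'Squeezenet_1_0': torchvision.models.squeezenet1_0,
        'Squeezenet_1_1': torchvision.models.squeezenet1_1,
        'MobileNetV2': torchvision.models.mobilenet_v2,
    }
    if arch not in factories:
        raise ValueError('Unknown architecture: %s' % arch)
    return factories[arch](pretrained=True)
```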
The best-performing model with the ResNet18 architecture (snapshot) achieves:
Yaw: 6.0897, Pitch: 6.9588, Roll: 6.0907, MAE: 6.3797
With the MobileNetV2 architecture (snapshot), I reached:
Yaw: 7.3247, Pitch: 6.9425, Roll: 6.2106, MAE: 6.8259
And with the Squeezenet_1_0 architecture (snapshot), we get:
Yaw: 7.2015, Pitch: 7.9230, Roll: 6.8532, MAE: 7.3259
Lastly, the Squeezenet_1_1 architecture (snapshot) achieves:
Yaw: 8.8815, Pitch: 7.4020, Roll: 7.1891, MAE: 7.8242
It is also worth mentioning HopeNet-Lite, which adopts a MobileNet-like architecture for HopeNet.