Parallel acceleration on both the feature x and the centre W. Setting: ResNet-50, batch size 8*64, feature dimension 512, float32, 8 * P40 GPUs (24GB).
Parallel calculation by simple matrix partition. Setting: ResNet-50, batch size 8*64, feature dimension 512, float32, 1 million identities, 8 * 1080Ti GPUs (11GB). Communication cost: 1MB (feature x). Training speed: 800 samples/second. A conceptual sketch of this weight partition is given after the note below.
Note: Replace train.py with train_parall.py in the following examples if you want to use parallel acceleration.
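The partition idea is simple to picture: the centre matrix W is split column-wise over the GPUs, each GPU multiplies the gathered feature batch against its own slice of W, and only small per-sample statistics are exchanged to normalise the softmax. Below is a minimal NumPy sketch of that idea (sizes shrunk for illustration; this is not the train_parall.py implementation):

import numpy as np

num_gpus, batch, feat_dim, num_ids = 8, 8 * 64, 512, 8000    # num_ids reduced for illustration

x = np.random.randn(batch, feat_dim).astype(np.float32)      # gathered features, ~1MB for 8*64 x 512 float32
W_parts = [np.random.randn(feat_dim, num_ids // num_gpus).astype(np.float32)
           for _ in range(num_gpus)]                          # each "GPU" holds one column slice of W

logits_parts = [x @ W for W in W_parts]                       # partial logits, computed independently per slice

# softmax normalisation only needs two length-batch vectors shared across GPUs
row_max = np.max([p.max(axis=1) for p in logits_parts], axis=0)               # all-reduce(max)
row_sum = np.sum([np.exp(p - row_max[:, None]).sum(axis=1)
                  for p in logits_parts], axis=0)                             # all-reduce(sum)
probs_parts = [np.exp(p - row_max[:, None]) / row_sum[:, None]
               for p in logits_parts]                                         # each GPU's slice of the softmax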
- Install MXNet with GPU support (Python 2.7).
pip install mxnet-cu80 #or mxnet-cu90 or mxnet-cu100
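A quick, optional sanity check that the GPU build imports and can place data on the first GPU:

import mxnet as mx
print(mx.__version__)
print(mx.nd.zeros((2, 2), ctx=mx.gpu(0)))   # fails here if the CUDA build or driver is not set up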
- Clone the InsightFace repository. We refer to the insightface directory as INSIGHTFACE_ROOT.
git clone --recursive https://github.com/deepinsight/insightface.git
- Download the training set (MS1MV2-Arcface) and place it in $INSIGHTFACE_ROOT/datasets/. Each training dataset includes the following 6 files:
faces_emore/
    train.idx
    train.rec
    property
    lfw.bin
    cfp_fp.bin
    agedb_30.bin
The first three files make up the training set, while the last three are verification sets.
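If you want to sanity-check the download, the files can be opened with standard MXNet utilities. The sketch below assumes the usual InsightFace packing (property is a single comma-separated line with the identity count and image size; train.idx/train.rec form an indexed RecordIO pack), so treat the exact record layout as an assumption:

import os
import mxnet as mx

root = './datasets/faces_emore'

# 'property': identity count and image size (assumed single comma-separated line)
with open(os.path.join(root, 'property')) as f:
    num_ids, height, width = (int(v) for v in f.read().strip().split(','))
print(num_ids, height, width)

# train.idx / train.rec: indexed RecordIO pack of the aligned training faces
rec = mx.recordio.MXIndexedRecordIO(os.path.join(root, 'train.idx'),
                                    os.path.join(root, 'train.rec'), 'r')
header, _ = mx.recordio.unpack(rec.read_idx(1))
print(header.label)   # label(s) stored with this record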
- Train deep face recognition models. In this part, we assume you are in the directory $INSIGHTFACE_ROOT/recognition/.
Place and edit the config file:
cp sample_config.py config.py
vim config.py # edit dataset path etc..
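As an illustration only (the real keys are defined in sample_config.py, so check your copy), the dataset-related fields you edit typically point at the folder downloaded above:

# hypothetical edit inside config.py -- field names here are illustrative
from easydict import EasyDict as edict

dataset = edict()
dataset.emore = edict()
dataset.emore.dataset_path = '../datasets/faces_emore'   # where faces_emore was placed
dataset.emore.num_classes = 85742                        # identity count taken from the 'property' file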
We give some examples below. Our experiments were conducted on the Tesla P40 GPU.
(1). Train ArcFace with LResNet100E-IR.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network r100 --loss arcface --dataset emore
It will output verification results on LFW, CFP-FP and AgeDB-30 every 2000 batches. You can check all options in config.py. This model can achieve LFW 99.80+ and MegaFace 98.3%+.
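For reference, the additive angular margin behind the --loss arcface option can be sketched in a few lines (illustrative NumPy, not the repository code; s=64 and m=0.5 are the defaults reported in the ArcFace paper):

import numpy as np

def arcface_logits(x, W, labels, s=64.0, m=0.5):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)     # L2-normalised features
    W = W / np.linalg.norm(W, axis=0, keepdims=True)     # L2-normalised centres
    cos = x @ W                                          # cos(theta), shape (N, num_ids)
    idx = np.arange(len(labels))
    theta_y = np.arccos(np.clip(cos[idx, labels], -1.0, 1.0))
    cos[idx, labels] = np.cos(theta_y + m)               # additive angular margin on the target class
    return s * cos                                       # rescaled logits for softmax cross-entropy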
(2). Train CosineFace with LResNet50E-IR.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network r50 --loss cosface --dataset emore
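CosineFace/CosFace applies the margin to the cosine rather than to the angle; mirroring the sketch above, only the target-logit line changes (m around 0.35 is the commonly used value, an assumption here):

import numpy as np

def cosface_logits(x, W, labels, s=64.0, m=0.35):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = x @ W
    idx = np.arange(len(labels))
    cos[idx, labels] -= m                                # cos(theta_yi) - m instead of cos(theta_yi + m)
    return s * cos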
(3). Train Softmax with MobileFaceNet.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network y1 --loss softmax --dataset emore
(4). Fine-tune the above Softmax model with Triplet loss.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network mnas05 --loss triplet --lr 0.005 --pretrained ./models/y1-softmax-emore,1
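The triplet objective used in this fine-tuning step pulls an anchor towards a positive of the same identity and pushes it away from a negative, up to a margin. A minimal sketch (illustrative only; triplet mining and the exact margin value are handled by the training code):

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # anchor / positive / negative: embeddings of shape (N, dim)
    d_ap = np.sum((anchor - positive) ** 2, axis=1)      # squared distance to the positive
    d_an = np.sum((anchor - negative) ** 2, axis=1)      # squared distance to the negative
    return np.maximum(0.0, d_ap - d_an + margin).mean()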
If you find ArcFace useful in your research, please consider citing the following related papers:
@inproceedings{deng2018arcface,
title={ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
booktitle={CVPR},
year={2019}
}
This parallel acceleration for large-scale face recognition is also inspired by the following works:
@article{debingzhang,
title={A distributed training solution for face recognition},
author={Zhang, Debing},
journal={DeepGlint},
year={2018}
}
@inproceedings{zhang2018accelerated,
title={Accelerated training for massive classification via dynamic class selection},
author={Zhang, Xingcheng and Yang, Lei and Yan, Junjie and Lin, Dahua},
booktitle={AAAI},
year={2018}
}