Parallel acceleration on both the feature x and the centre W. Setting: ResNet-50, batch size 8*64, feature dimension 512, float32, 8 * P40 GPUs (24GB).
Parallel calculation by simple matrix partition. Setting: ResNet-50, batch size 8*64, feature dimension 512, float32, 1 million identities, 8 * 1080Ti GPUs (11GB). Communication cost: 1MB (feature x). Training speed: 800 samples/second. A conceptual sketch of this weight partition is given after the note below.
Note: Replace train.py with train_parall.py in the following examples if you want to use parallel acceleration.
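The partition idea is simple to picture: the centre matrix W is split column-wise over the GPUs, each GPU multiplies the gathered feature batch against its own slice of W, and only small per-sample statistics are exchanged to normalise the softmax. Below is a minimal NumPy sketch of that idea (sizes shrunk for illustration; this is not the train_parall.py implementation):

import numpy as np

num_gpus, batch, feat_dim, num_ids = 8, 8 * 64, 512, 8000    # num_ids reduced for illustration

x = np.random.randn(batch, feat_dim).astype(np.float32)      # gathered features, ~1MB for 8*64 x 512 float32
W_parts = [np.random.randn(feat_dim, num_ids // num_gpus).astype(np.float32)
           for _ in range(num_gpus)]                          # each "GPU" holds one column slice of W

logits_parts = [x @ W for W in W_parts]                       # partial logits, computed independently per slice

# softmax normalisation only needs two length-batch vectors shared across GPUs
row_max = np.max([p.max(axis=1) for p in logits_parts], axis=0)               # all-reduce(max)
row_sum = np.sum([np.exp(p - row_max[:, None]).sum(axis=1)
                  for p in logits_parts], axis=0)                             # all-reduce(sum)
probs_parts = [np.exp(p - row_max[:, None]) / row_sum[:, None]
               for p in logits_parts]                                         # each GPU's slice of the softmax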
- Install MXNet with GPU support (Python 2.7).
pip install mxnet-cu80 #or mxnet-cu90 or mxnet-cu100
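A quick, optional sanity check that the GPU build imports and can place data on the first GPU:

import mxnet as mx
print(mx.__version__)
print(mx.nd.zeros((2, 2), ctx=mx.gpu(0)))   # fails here if the CUDA build or driver is not set up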
- Clone the InsightFace repository. We refer to the insightface directory as INSIGHTFACE_ROOT.
git clone --recursive https://github.com/deepinsight/insightface.git
- Download the training set (MS1MV2-Arcface) and place it in $INSIGHTFACE_ROOT/datasets/. Each training dataset includes the following 6 files:
faces_emore/
    train.idx
    train.rec
    property
    lfw.bin
    cfp_fp.bin
    agedb_30.bin
The first three files make up the training set, while the last three are verification sets.
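If you want to sanity-check the download, the files can be opened with standard MXNet utilities. The sketch below assumes the usual InsightFace packing (property is a single comma-separated line with the identity count and image size; train.idx/train.rec form an indexed RecordIO pack), so treat the exact record layout as an assumption:

import os
import mxnet as mx

root = './datasets/faces_emore'

# 'property': identity count and image size (assumed single comma-separated line)
with open(os.path.join(root, 'property')) as f:
    num_ids, height, width = (int(v) for v in f.read().strip().split(','))
print(num_ids, height, width)

# train.idx / train.rec: indexed RecordIO pack of the aligned training faces
rec = mx.recordio.MXIndexedRecordIO(os.path.join(root, 'train.idx'),
                                    os.path.join(root, 'train.rec'), 'r')
header, _ = mx.recordio.unpack(rec.read_idx(1))
print(header.label)   # label(s) stored with this record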
- Train deep face recognition models. In this part, we assume you are in the directory $INSIGHTFACE_ROOT/recognition/.
Place and edit the config file:
cp sample_config.py config.py
vim config.py # edit dataset path etc..
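As an illustration only (the real keys are defined in sample_config.py, so check your copy), the dataset-related fields you edit typically point at the folder downloaded above:

# hypothetical edit inside config.py -- field names here are illustrative
from easydict import EasyDict as edict

dataset = edict()
dataset.emore = edict()
dataset.emore.dataset_path = '../datasets/faces_emore'   # where faces_emore was placed
dataset.emore.num_classes = 85742                        # identity count taken from the 'property' file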
We give some examples below. Our experiments were conducted on the Tesla P40 GPU.
(1). Train ArcFace with LResNet100E-IR.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network r100 --loss arcface --dataset emore
It will output verification results on LFW, CFP-FP and AgeDB-30 every 2000 batches. You can check all options in config.py. This model can achieve LFW 99.80+ and MegaFace 98.3%+.
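For reference, the additive angular margin behind the --loss arcface option can be sketched in a few lines (illustrative NumPy, not the repository code; s=64 and m=0.5 are the defaults reported in the ArcFace paper):

import numpy as np

def arcface_logits(x, W, labels, s=64.0, m=0.5):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)     # L2-normalised features
    W = W / np.linalg.norm(W, axis=0, keepdims=True)     # L2-normalised centres
    cos = x @ W                                          # cos(theta), shape (N, num_ids)
    idx = np.arange(len(labels))
    theta_y = np.arccos(np.clip(cos[idx, labels], -1.0, 1.0))
    cos[idx, labels] = np.cos(theta_y + m)               # additive angular margin on the target class
    return s * cos                                       # rescaled logits for softmax cross-entropy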
(2). Train CosineFace with LResNet50E-IR.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network r50 --loss cosface --dataset emore
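CosineFace/CosFace applies the margin to the cosine rather than to the angle; mirroring the sketch above, only the target-logit line changes (m around 0.35 is the commonly used value, an assumption here):

import numpy as np

def cosface_logits(x, W, labels, s=64.0, m=0.35):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = x @ W
    idx = np.arange(len(labels))
    cos[idx, labels] -= m                                # cos(theta_yi) - m instead of cos(theta_yi + m)
    return s * cos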
(3). Train Softmax with MobileFaceNet.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network y1 --loss softmax --dataset emore
(4). Fine-tune the above Softmax model with Triplet loss.
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train.py --network mnas05 --loss triplet --lr 0.005 --pretrained ./models/y1-softmax-emore,1
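The triplet objective used in this fine-tuning step pulls an anchor towards a positive of the same identity and pushes it away from a negative, up to a margin. A minimal sketch (illustrative only; triplet mining and the exact margin value are handled by the training code):

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # anchor / positive / negative: embeddings of shape (N, dim)
    d_ap = np.sum((anchor - positive) ** 2, axis=1)      # squared distance to the positive
    d_an = np.sum((anchor - negative) ** 2, axis=1)      # squared distance to the negative
    return np.maximum(0.0, d_ap - d_an + margin).mean()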
If you find ArcFace useful in your research, please consider citing the following related papers:
@inproceedings{deng2018arcface,
title={ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
booktitle={CVPR},
year={2019}
}
This parallel acceleration for large-scale face recognition is also inspired by the following works:
@article{debingzhang,
title={A distributed training solution for face recognition},
author={Zhang, Debing},
journal={DeepGlint},
year={2018}
}
@inproceedings{zhang2018accelerated,
title={Accelerated training for massive classification via dynamic class selection},
author={Zhang, Xingcheng and Yang, Lei and Yan, Junjie and Lin, Dahua},
booktitle={AAAI},
year={2018}
}