
Fails in finding tickets if a CNN is used #3

Open
paintception opened this issue Jan 14, 2020 · 11 comments
Labels
bug (Something isn't working) · help wanted (Extra attention is needed)

Comments

@paintception

Hi,
thanks for your nice repo!

Have you tested whether the CNNs are able to find winning tickets on CIFAR-10 and CIFAR-100?

I ran multiple experiments with most of the convolutional architectures you provide, but I'm only able to find "winning tickets" when using an MLP on the MNIST dataset. When a CNN is used (no matter which one), the experiments of the original paper, e.g. on CIFAR-10, cannot be reproduced.

Any idea why this is happening?

@ffeng1996

Same question:)

@rahulvigneswaran
Owner

@paintception @ffeng1996 Sorry for the delayed response. The following are the winning-ticket plots for LeNet-5 on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
[Plots: combined_lenet5_mnist, combined_lenet5_fashionmnist, combined_lenet5_cifar10, combined_lenet5_cifar100]

Thanks for pointing it out! For some reason, at specific weight percentages, the winning tickets are not being generated. Let me take a look and get back to you soon.

@rahulvigneswaran rahulvigneswaran added the bug label Jan 17, 2020
@rahulvigneswaran rahulvigneswaran self-assigned this Jan 17, 2020
@ffeng1996

Thanks for replying. However, when I run the code for AlexNet, DenseNet-121, ResNet-18, and VGG-16, the pruning method cannot find the winning tickets.

Thanks.

@ZhangXiao96

ZhangXiao96 commented Jan 19, 2020

Actually, for large models/datasets you may need some tricks, such as learning rate warmup or "late resetting". You can find the details in papers [1, 2] (a minimal sketch of late resetting follows the references below). Hope this helps!

[1] Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In International Conference on Learning Representations, 2019. URL http://arxiv.org/abs/1803.03635.

[2] Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin. The Lottery Ticket Hypothesis at Scale. March 2019. URL http://arxiv.org/abs/1903.01611.
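
For concreteness, here is a minimal sketch of what "late resetting" changes relative to the plain lottery-ticket procedure: after each pruning round, the surviving weights are rewound to their values from an early training epoch rather than to the original initialization. The helpers `train_one_epoch` and `prune_step` are placeholders for whatever training/pruning code you already have, not functions from this repo:

```python
import copy
import torch

def find_ticket_with_late_resetting(model, train_one_epoch, prune_step,
                                    rewind_epoch=1, epochs=30, rounds=5):
    """Iterative magnitude pruning with "late resetting": after each pruning
    round, rewind the surviving weights to their values from an early epoch
    (rewind_epoch > 0) instead of the original initialization."""
    mask = {name: torch.ones_like(p) for name, p in model.named_parameters()}
    rewind_state = None
    for r in range(rounds):
        for epoch in range(epochs):
            train_one_epoch(model, mask)            # caller-supplied masked training
            if r == 0 and epoch == rewind_epoch:    # snapshot the early weights once
                rewind_state = copy.deepcopy(model.state_dict())
        mask = prune_step(model, mask)              # caller-supplied magnitude pruning
        model.load_state_dict(rewind_state)         # rewind, do NOT re-initialize
    return mask, rewind_state
```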

By the way, nice repo~

@ffeng1996

Thanks!

@rahulvigneswaran
Owner

@ZhangXiao96 Thanks for the direction. Sorry that I am not able to reply promptly. I am busy with a few submissions. I will get back within a few days with a solution.

@JKDomoguen

JKDomoguen commented Jan 30, 2020

Hi, very much appreciate your work.

A few clarifications: I've noticed in your code, particularly in the "prune_by_percentile" function (lines 269-292) in main.py, that you don't seem to use global pruning* for deeper networks, e.g. VGG-19 or ResNet-18, but instead prune each layer at the same rate (i.e. 10%), a scheme designed for the fully connected layers applied to MNIST (see the sketch below the footnote). Thank you for your time.

*global pruning is specifically discussed in Chapter 4 of the original paper.
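
To make the distinction concrete, here is a small illustrative sketch of layer-wise percentile pruning, where every layer loses the same fraction of its own surviving weights; it is not the repo's actual prune_by_percentile implementation, just the scheme described above:

```python
import numpy as np
import torch

def prune_layerwise(model, mask, percent=10):
    """Remove `percent`% of each layer's surviving weights, with the magnitude
    threshold computed per layer (layer-wise pruning)."""
    new_mask = {}
    for name, param in model.named_parameters():
        if 'weight' not in name:
            new_mask[name] = mask[name]
            continue
        w = param.detach().cpu().numpy()
        m = mask[name].cpu().numpy()
        alive = np.abs(w[m.nonzero()])              # only the unpruned weights
        threshold = np.percentile(alive, percent)   # per-layer threshold
        new_mask[name] = torch.from_numpy(
            np.where(np.abs(w) < threshold, 0.0, m)
        ).float().to(param.device)
    return new_mask
```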

@ZhangXiao96

@JKDomoguen Hello, I think layer-wise pruning is not especially designed for fully connected layers (see [1]). Note that we can keep a DNN functionally equivalent by multiplying the weights of one layer by a number x and the weights of another layer by 1/x. However, this operation will change the results of global pruning, hence I believe layer-wise pruning may be more reasonable (a toy check of this rescaling argument follows the reference below). Just my opinion~

[1] Frankle et al., Stabilizing the Lottery Ticket Hypothesis
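
A toy numerical check of the rescaling argument (my own example, assuming a two-layer ReLU network with hypothetical shapes): scaling the first layer's weights by x > 0 and the second layer's by 1/x leaves the network's function unchanged, yet a single global magnitude threshold then prunes the two layers very differently:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W1, W2 = torch.randn(8, 4), torch.randn(2, 8)   # toy two-layer ReLU net
x = torch.randn(5, 4)

def net(W1, W2, x):
    return F.relu(x @ W1.t()) @ W2.t()

scale = 10.0
print(torch.allclose(net(W1, W2, x),
                     net(W1 * scale, W2 / scale, x), atol=1e-4))  # True: same function

def frac_pruned(W, other):
    """Fraction of W's weights that fall below a global 50% magnitude threshold."""
    magnitudes = torch.cat([W.abs().flatten(), other.abs().flatten()])
    thr = magnitudes.quantile(0.5)
    return (W.abs() < thr).float().mean().item()

print(frac_pruned(W2, W1))                  # roughly 0.5: about half of layer 2 pruned
print(frac_pruned(W2 / scale, W1 * scale))  # 1.0: layer 2 is wiped out after rescaling
```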

@jfrankle

jfrankle commented Feb 5, 2020

In my experience, global pruning works best on deeper convnets. Despite the fact that layers could theoretically rescale in inconvenient ways, that doesn't seem to happen in practice. Meanwhile, the layer sizes are so different in deep networks that layerwise pruning will delete small layers while leaving many extra parameters in big layers.
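
For contrast with the layer-wise sketch earlier in the thread, a minimal global-pruning variant (again illustrative, not code from this repo) computes one magnitude threshold over all surviving weights of all layers:

```python
import torch

def prune_global(model, mask, percent=10):
    """Remove `percent`% of the remaining weights across the whole network,
    using one magnitude threshold shared by every layer (global pruning)."""
    # Gather all surviving weight magnitudes into a single vector.
    surviving = torch.cat([
        param.detach().abs()[mask[name].bool()].flatten()
        for name, param in model.named_parameters() if 'weight' in name
    ])
    threshold = torch.quantile(surviving, percent / 100.0)   # single global threshold
    new_mask = {}
    for name, param in model.named_parameters():
        if 'weight' in name:
            new_mask[name] = mask[name] * (param.detach().abs() >= threshold).float()
        else:
            new_mask[name] = mask[name]
    return new_mask
```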

@ZhangXiao96

@jfrankle Thanks for your suggestions!

@rahulvigneswaran rahulvigneswaran removed their assignment Feb 10, 2020
@rahulvigneswaran rahulvigneswaran added the help wanted label Feb 10, 2020
@jamesoneill12

jamesoneill12 commented Feb 19, 2020

[Plot: combined_resnet18_cifar10]

This is what I got for CIFAR-10 using ResNet-18 with a batch size of 256, pruning 20 percent per iteration for 20 pruning iterations over 60 epochs. My conclusion, based on other work, is that in order to stabilize the model compression you have to factor in the optimizer and architecture you are using, how the pruning percentage (or, more generally, whatever compression method you are using) is distributed per retraining step, how many retraining steps you use, the capacity of the base model, the number of samples in the dataset, and the input-output dimensionality ratio... basically everything :)
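
As a quick sanity check on that schedule: removing 20% of the surviving weights at each of 20 pruning iterations leaves roughly 0.8^20 ≈ 1.15% of the original weights, a very sparse regime where tricks like late resetting are reported to matter:

```python
# Fraction of weights remaining after k rounds of removing 20% of the survivors.
for k in (5, 10, 15, 20):
    print(f"after {k:2d} rounds: {0.8 ** k:.4f} of the weights remain")
# after  5 rounds: 0.3277,  after 10: 0.1074,  after 15: 0.0352,  after 20: 0.0115
```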
