
Fails in finding tickets if a CNN is used #3

Open
paintception opened this issue Jan 14, 2020 · 11 comments
Labels
bug (Something isn't working) · help wanted (Extra attention is needed)

Comments

@paintception

Hi,
thanks for your nice repo!

Have you tested whether the CNNs are able to find winning tickets on CIFAR-10 and CIFAR-100?

I ran multiple experiments with most of the convolutional architectures you provide, but I'm only able to find "winning tickets" when using an MLP on the MNIST dataset. When a CNN is used (no matter which one), the experiments of the original paper, e.g. on CIFAR-10, cannot be reproduced.

Any idea why this is happening?

@ffeng1996

Same question:)

@rahulvigneswaran
Owner

@paintception @ffeng1996 Sorry for the delayed response. The following are the winning-ticket plots for LeNet-5 on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
[Plots: combined_lenet5_mnist, combined_lenet5_fashionmnist, combined_lenet5_cifar10, combined_lenet5_cifar100]

Thanks for pointing it out! For some reason, at specific weight percentages, the winning tickets are not being generated. Let me take a look and get back to you soon.

@rahulvigneswaran rahulvigneswaran added the bug label Jan 17, 2020
@rahulvigneswaran rahulvigneswaran self-assigned this Jan 17, 2020
@ffeng1996

Thanks for replying. However, when I run the code for AlexNet, DenseNet-121, ResNet-18, and VGG-16, the pruning method cannot find the winning tickets.

Thanks.

@ZhangXiao96

ZhangXiao96 commented Jan 19, 2020

Actually, for large models/datasets you may need some tricks, such as learning rate warmup or "late resetting". You can find the details in papers [1, 2] (a minimal sketch of late resetting follows the references below). Hope this helps!

[1] Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In International Conference on Learning Representations, 2019. URL http://arxiv.org/abs/1803.03635.

[2] Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin. The Lottery Ticket Hypothesis at Scale. March 2019. URL http://arxiv.org/abs/1903.01611.
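
For concreteness, here is a minimal sketch of what "late resetting" changes relative to the plain lottery-ticket procedure: after each pruning round, the surviving weights are rewound to their values from an early training epoch rather than to the original initialization. The helpers `train_one_epoch` and `prune_step` are placeholders for whatever training/pruning code you already have, not functions from this repo:

```python
import copy
import torch

def find_ticket_with_late_resetting(model, train_one_epoch, prune_step,
                                    rewind_epoch=1, epochs=30, rounds=5):
    """Iterative magnitude pruning with "late resetting": after each pruning
    round, rewind the surviving weights to their values from an early epoch
    (rewind_epoch > 0) instead of the original initialization."""
    mask = {name: torch.ones_like(p) for name, p in model.named_parameters()}
    rewind_state = None
    for r in range(rounds):
        for epoch in range(epochs):
            train_one_epoch(model, mask)            # caller-supplied masked training
            if r == 0 and epoch == rewind_epoch:    # snapshot the early weights once
                rewind_state = copy.deepcopy(model.state_dict())
        mask = prune_step(model, mask)              # caller-supplied magnitude pruning
        model.load_state_dict(rewind_state)         # rewind, do NOT re-initialize
    return mask, rewind_state
```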

By the way, nice repo~

@ffeng1996

Thanks!

@rahulvigneswaran
Owner

@ZhangXiao96 Thanks for the direction. Sorry that I am not able to reply promptly. I am busy with a few submissions. I will get back within a few days with a solution.

@JKDomoguen

JKDomoguen commented Jan 30, 2020

Hi, very much appreciate your work.

A few clarifications: I've noticed in your code, particularly in the "prune_by_percentile" function (lines 269-292) in main.py, that you don't seem to use global pruning* for deeper networks, e.g. VGG-19 or ResNet-18, but instead prune each layer at the same rate (i.e. 10%), a scheme designed for the fully connected layers applied to MNIST (see the sketch below the footnote). Thank you for your time.

*global pruning is specifically discussed in Chapter 4 of the original paper.
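
To make the distinction concrete, here is a small illustrative sketch of layer-wise percentile pruning, where every layer loses the same fraction of its own surviving weights; it is not the repo's actual prune_by_percentile implementation, just the scheme described above:

```python
import numpy as np
import torch

def prune_layerwise(model, mask, percent=10):
    """Remove `percent`% of each layer's surviving weights, with the magnitude
    threshold computed per layer (layer-wise pruning)."""
    new_mask = {}
    for name, param in model.named_parameters():
        if 'weight' not in name:
            new_mask[name] = mask[name]
            continue
        w = param.detach().cpu().numpy()
        m = mask[name].cpu().numpy()
        alive = np.abs(w[m.nonzero()])              # only the unpruned weights
        threshold = np.percentile(alive, percent)   # per-layer threshold
        new_mask[name] = torch.from_numpy(
            np.where(np.abs(w) < threshold, 0.0, m)
        ).float().to(param.device)
    return new_mask
```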

@ZhangXiao96

@JKDomoguen Hello, I think layer-wise pruning is not especially designed for fully connected layers (see [1]). Note that we can keep a DNN functionally equivalent by multiplying the weights of one layer by a number x and the weights of another layer by 1/x. However, this operation will change the results of global pruning, hence I believe layer-wise pruning may be more reasonable (a toy check of this rescaling argument follows the reference below). Just my opinion~

[1] Frankle et al., Stabilizing the Lottery Ticket Hypothesis
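
A toy numerical check of the rescaling argument (my own example, assuming a two-layer ReLU network with hypothetical shapes): scaling the first layer's weights by x > 0 and the second layer's by 1/x leaves the network's function unchanged, yet a single global magnitude threshold then prunes the two layers very differently:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W1, W2 = torch.randn(8, 4), torch.randn(2, 8)   # toy two-layer ReLU net
x = torch.randn(5, 4)

def net(W1, W2, x):
    return F.relu(x @ W1.t()) @ W2.t()

scale = 10.0
print(torch.allclose(net(W1, W2, x),
                     net(W1 * scale, W2 / scale, x), atol=1e-4))  # True: same function

def frac_pruned(W, other):
    """Fraction of W's weights that fall below a global 50% magnitude threshold."""
    magnitudes = torch.cat([W.abs().flatten(), other.abs().flatten()])
    thr = magnitudes.quantile(0.5)
    return (W.abs() < thr).float().mean().item()

print(frac_pruned(W2, W1))                  # roughly 0.5: about half of layer 2 pruned
print(frac_pruned(W2 / scale, W1 * scale))  # 1.0: layer 2 is wiped out after rescaling
```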

@jfrankle

jfrankle commented Feb 5, 2020

In my experience, global pruning works best on deeper convnets. Despite the fact that layers could theoretically rescale in inconvenient ways, that doesn't seem to happen in practice. Meanwhile, the layer sizes are so different in deep networks that layerwise pruning will delete small layers while leaving many extra parameters in big layers.
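
For contrast with the layer-wise sketch earlier in the thread, a minimal global-pruning variant (again illustrative, not code from this repo) computes one magnitude threshold over all surviving weights of all layers:

```python
import torch

def prune_global(model, mask, percent=10):
    """Remove `percent`% of the remaining weights across the whole network,
    using one magnitude threshold shared by every layer (global pruning)."""
    # Gather all surviving weight magnitudes into a single vector.
    surviving = torch.cat([
        param.detach().abs()[mask[name].bool()].flatten()
        for name, param in model.named_parameters() if 'weight' in name
    ])
    threshold = torch.quantile(surviving, percent / 100.0)   # single global threshold
    new_mask = {}
    for name, param in model.named_parameters():
        if 'weight' in name:
            new_mask[name] = mask[name] * (param.detach().abs() >= threshold).float()
        else:
            new_mask[name] = mask[name]
    return new_mask
```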

@ZhangXiao96

@jfrankle Thanks for your suggestions!

@rahulvigneswaran rahulvigneswaran removed their assignment Feb 10, 2020
@rahulvigneswaran rahulvigneswaran added the help wanted label Feb 10, 2020
@jamesoneill12

jamesoneill12 commented Feb 19, 2020

[Plot: combined_resnet18_cifar10]

This is what I got for CIFAR-10 using ResNet-18 with a batch size of 256, pruning 20 percent per iteration for 20 pruning iterations over 60 epochs. My conclusion, based on other work, is that in order to stabilize the model compression you have to factor in the optimizer and architecture you are using, how the pruning percentage (or, more generally, whatever compression method you are using) is distributed per retraining step, how many retraining steps you use, the capacity of the base model, the number of samples in the dataset, and the input-output dimensionality ratio... basically everything :)
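
As a quick sanity check on that schedule: removing 20% of the surviving weights at each of 20 pruning iterations leaves roughly 0.8^20 ≈ 1.15% of the original weights, a very sparse regime where tricks like late resetting are reported to matter:

```python
# Fraction of weights remaining after k rounds of removing 20% of the survivors.
for k in (5, 10, 15, 20):
    print(f"after {k:2d} rounds: {0.8 ** k:.4f} of the weights remain")
# after  5 rounds: 0.3277,  after 10: 0.1074,  after 15: 0.0352,  after 20: 0.0115
```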
