stereoplegic's Collections
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time (arXiv:2310.17157)
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers (arXiv:2305.15805)
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt (arXiv:2305.11186)
Composable Sparse Fine-Tuning for Cross-Lingual Transfer (arXiv:2110.07560)
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval (arXiv:2204.02292)
Pruning Adversarially Robust Neural Networks without Adversarial Examples (arXiv:2210.04311)
LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning (arXiv:2305.18403)
Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling (arXiv:2305.08285)
Multi-Head Adapter Routing for Cross-Task Generalization (arXiv:2211.03831)
Improving Visual Prompt Tuning for Self-supervised Vision Transformers (arXiv:2306.05067)
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation (arXiv:2308.01045)
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles (arXiv:2306.01705)
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency (arXiv:2303.11525)
How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites (arXiv:1601.00720)
Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science (arXiv:1707.04780)
Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders (arXiv:2012.00560)
Sparse Finetuning for Inference Acceleration of Large Language Models (arXiv:2310.06927)
How Well Do Sparse Imagenet Models Transfer? (arXiv:2111.13445)
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models (arXiv:2203.07259)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (arXiv:2301.00774)
PockEngine: Sparse and Efficient Fine-tuning in a Pocket (arXiv:2310.17752)
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery (arXiv:2310.18356)
Continual Learning via Neural Pruning (arXiv:1903.04476)
A Survey on Model Compression for Large Language Models (arXiv:2308.07633)
A Simple and Effective Pruning Approach for Large Language Models (arXiv:2306.11695)
Finding Neurons in a Haystack: Case Studies with Sparse Probing (arXiv:2305.01610)
XPrompt: Exploring the Extreme of Prompt Tuning (arXiv:2210.04457)
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models (arXiv:2303.10464)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models (arXiv:2111.00160)
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation (arXiv:2309.14174)
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers (arXiv:2211.11315)
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (arXiv:2310.06694)
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models (arXiv:2310.05015)
Can pruning make Large Language Models more efficient? (arXiv:2310.04573)
Compressing LLMs: The Truth is Rarely Pure and Never Simple (arXiv:2310.01382)
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models (arXiv:2305.17651)
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations (arXiv:2203.16965)
Task-Agnostic Structured Pruning of Speech Representation Models (arXiv:2306.01385)
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation (arXiv:2305.11685)
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter (arXiv:2306.03805)
Parameter-Efficient Sparsity for Large Language Models Fine-Tuning (arXiv:2205.11005)
Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models (arXiv:2311.04902)
Leveraging Structured Pruning of Convolutional Neural Networks (arXiv:2206.06247)
You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership (arXiv:2111.00162)
Sparse then Prune: Toward Efficient Vision Transformers (arXiv:2307.11988)
SHARP: Sparsity and Hidden Activation RePlay for Neuro-Inspired Continual Learning (arXiv:2305.18563)
Incremental Task Learning with Incremental Rank Updates (arXiv:2207.09074)
On the Soft-Subnetwork for Few-shot Class Incremental Learning (arXiv:2209.07529)
Forget-free Continual Learning with Soft-Winning SubNetworks (arXiv:2303.14962)
Exclusive Supermask Subnetwork Training for Continual Learning (arXiv:2210.10209)
Continual Task Allocation in Meta-Policy Network via Sparse Prompting (arXiv:2305.18444)
SparCL: Sparse Continual Learning on the Edge (arXiv:2209.09476)
Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates (arXiv:2308.14831)
Dynamic Sparse Training with Structured Sparsity (arXiv:2305.02299)
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization (arXiv:2308.02060)
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off (arXiv:2211.16667)
HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization (arXiv:2308.07163)
Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning (arXiv:2209.14624)
End-to-End Neural Network Compression via ℓ1/ℓ2 Regularized Latency Surrogates (arXiv:2306.05785)
Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction (arXiv:2110.08232)
LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch (arXiv:2309.14157)
Weight-dependent Gates for Network Pruning (arXiv:2007.02066)
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning (arXiv:2301.11063)
Soft Masking for Cost-Constrained Channel Pruning (arXiv:2211.02206)
Group channel pruning and spatial attention distilling for object detection (arXiv:2306.01526)
Structured Pruning Learns Compact and Accurate Models (arXiv:2204.00408)
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models (arXiv:2210.15523)
Latency Adjustable Transformer Encoder for Language Understanding (arXiv:2201.03327)
Learned Token Pruning for Transformers (arXiv:2107.00910)
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models (arXiv:2010.03688)
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers (arXiv:2305.17328)
Pruning Pre-trained Language Models Without Fine-Tuning (arXiv:2210.06210)
Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning (arXiv:2309.08708)
Are Sixteen Heads Really Better than One? (arXiv:1905.10650)
SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning (arXiv:2207.03677)
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints (arXiv:2308.15003)
Growing Efficient Deep Networks by Structured Continuous Sparsification (arXiv:2007.15353)
Task-Specific Expert Pruning for Sparse Mixture-of-Experts (arXiv:2206.00277)
SiRA: Sparse Mixture of Low Rank Adaptation (arXiv:2311.09179)
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization (arXiv:2311.13171)
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models (arXiv:2004.12406)
Less is More: Selective Layer Finetuning with SubTuning (arXiv:2302.06354)
Prune Once for All: Sparse Pre-Trained Language Models (arXiv:2111.05754)
To prune, or not to prune: exploring the efficacy of pruning for model compression (arXiv:1710.01878)
Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training (arXiv:2302.10798)
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks (arXiv:2309.00255)
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968)
LLM-Pruner: On the Structural Pruning of Large Language Models (arXiv:2305.11627)
Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation (arXiv:2309.13192)
Feature Flow Regularization: Improving Structured Sparsity in Deep Neural Networks (arXiv:2106.02914)
Automatic Neural Network Pruning that Efficiently Preserves the Model Accuracy (arXiv:2111.09635)
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition (arXiv:2303.07624)
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect (arXiv:2303.16212)
Distributed Pruning Towards Tiny Neural Networks in Federated Learning (arXiv:2212.01977)
Neural Network Pruning as Spectrum Preserving Process (arXiv:2307.08982)
Pruning a neural network using Bayesian inference (arXiv:2308.02451)
Class-dependent Compression of Deep Neural Networks (arXiv:1909.10364)
Structured Bayesian Compression for Deep Neural Networks Based on The Turbo-VBI Approach (arXiv:2302.10483)
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks (arXiv:1909.12778)
Emergence of Segmentation with Minimalistic White-Box Transformers (arXiv:2308.16271)
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? (arXiv:2311.13110)
Sparse Probabilistic Circuits via Pruning and Growing (arXiv:2211.12551)
Learning to Prune Deep Neural Networks via Reinforcement Learning (arXiv:2007.04756)
Pruning Very Deep Neural Network Channels for Efficient Inference (arXiv:2211.08339)
Fast Convex Pruning of Deep Neural Networks (arXiv:1806.06457)
Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning (arXiv:1912.08881)
Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks (arXiv:2308.10438)
Advancing Model Pruning via Bi-level Optimization (arXiv:2210.04092)
COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks (arXiv:2212.12770)
When Layers Play the Lottery, all Tickets Win at Initialization (arXiv:2301.10835)
Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability (arXiv:2306.00045)
Pruning at Initialization -- A Sketching Perspective (arXiv:2305.17559)
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training (arXiv:2202.02643)
Why Random Pruning Is All We Need to Start Sparse (arXiv:2210.02412)
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation (arXiv:2110.15343)
Adaptive Activation-based Structured Pruning (arXiv:2201.10520)
Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation (arXiv:2111.08577)
AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks (arXiv:2304.06941)
A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations (arXiv:2308.06767)
Pruning Deep Neural Networks from a Sparsity Perspective (arXiv:2302.05601)
White-Box Transformers via Sparse Rate Reduction (arXiv:2306.01129)
SeReNe: Sensitivity based Regularization of Neurons for Structured Sparsity in Neural Networks (arXiv:2102.03773)
Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima (arXiv:2004.14765)
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures (arXiv:2204.04977)
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization (arXiv:2309.06805)
Learning Activation Functions for Sparse Neural Networks (arXiv:2305.10964)
LOss-Based SensiTivity rEgulaRization: towards deep sparse neural networks (arXiv:2011.09905)
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition (arXiv:2209.15176)
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model (arXiv:2212.09811)
Sparse Low-rank Adaptation of Pre-trained Language Models (arXiv:2311.11696)
Learning Pruned Structure and Weights Simultaneously from Scratch: an Attention based Approach (arXiv:2111.02399)
Pruning On-the-Fly: A Recoverable Pruning Method without Fine-tuning (arXiv:2212.12651)
UPSCALE: Unconstrained Channel Pruning (arXiv:2307.08771)
PruMUX: Augmenting Data Multiplexing with Model Compression (arXiv:2305.14706)
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics (arXiv:2305.18513)
Network Pruning via Transformable Architecture Search (arXiv:1905.09717)
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (arXiv:2309.10285)
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency (arXiv:2304.02721)
arXiv:2312.17244
GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods (arXiv:2210.06384)
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction (arXiv:2312.13558)
Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models (arXiv:2304.13718)
Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations (arXiv:2205.13571)
Trained Rank Pruning for Efficient Deep Neural Networks (arXiv:1812.02402)
TRP: Trained Rank Pruning for Efficient Deep Neural Networks (arXiv:2004.14566)
Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks (arXiv:1904.10921)
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference (arXiv:2304.04947)
Training Neural Networks with Fixed Sparse Masks (arXiv:2111.09839)
A Neural Scaling Law from Lottery Ticket Ensembling (arXiv:2310.02258)
Methods for Pruning Deep Neural Networks (arXiv:2011.00241)
On the Existence of Universal Lottery Tickets (arXiv:2111.11146)
Quantifying lottery tickets under label noise: accuracy, calibration, and complexity (arXiv:2306.12190)
Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix Sketching (arXiv:2305.18789)
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration (arXiv:2106.10404)
Lottery Jackpots Exist in Pre-trained Models (arXiv:2104.08700)
Grokking Tickets: Lottery Tickets Accelerate Grokking (arXiv:2310.19470)
SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning (arXiv:2305.14852)
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging (arXiv:2306.16788)
"Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches (arXiv:2206.07918)
Randomly Initialized Subnetworks with Iterative Weight Recycling (arXiv:2303.15953)
DASS: Differentiable Architecture Search for Sparse neural networks (arXiv:2207.06968)
Ada-QPacknet -- adaptive pruning with bit width reduction as an efficient continual learning method without forgetting (arXiv:2308.07939)
Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning (arXiv:2304.11834)
AP: Selective Activation for De-sparsifying Pruned Neural Networks (arXiv:2212.06145)
HideNseek: Federated Lottery Ticket via Server-side Pruning and Sign Supermask (arXiv:2206.04385)
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey (arXiv:2205.08099)
Structured Pruning is All You Need for Pruning CNNs at Initialization (arXiv:2203.02549)
In deep reinforcement learning, a pruned network is a good network (arXiv:2402.12479)
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation (arXiv:2402.16880)
Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models (arXiv:2405.01943)
Pruning as a Domain-specific LLM Extractor (arXiv:2405.06275)
Structural Pruning of Pre-trained Language Models via Neural Architecture Search (arXiv:2405.02267)
FoldGPT: Simple and Effective Large Language Model Compression Scheme (arXiv:2407.00928)
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging (arXiv:2406.16330)
BlockPruner: Fine-grained Pruning for Large Language Models (arXiv:2406.10594)
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models (arXiv:2405.16057)
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training (arXiv:2407.20584)
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining (arXiv:2407.19126)