SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
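All of the integer formats these toolkits support reduce to a scale-and-round scheme. A minimal, library-agnostic sketch of symmetric per-tensor INT8 quantization (illustrative only, not any particular toolkit's API):

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8: one scale maps floats into [-127, 127]."""
    scale = x.abs().max() / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print((w - dequantize(q, s)).abs().max())  # roundtrip error bounded by ~scale / 2
```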
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
PyTorch native quantization and sparsity for training and inference
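For the PyTorch-native path, dynamic INT8 quantization of linear layers is a one-call entry point; a minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Swap Linear layers for dynamically quantized INT8 equivalents;
# weights are stored in INT8 and activations are quantized on the fly.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.randn(1, 128))
```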
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit for optimizing Keras and TensorFlow models for deployment, including quantization and pruning.
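With this toolkit, pruning typically means wrapping a Keras model in prune_low_magnitude with a sparsity schedule. A sketch, assuming the tensorflow_model_optimization package (the 80% target and step counts are illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(10),
])

# Ramp weight sparsity from 0% to 80% over the first 1000 training steps.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.8, begin_step=0, end_step=1000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)
pruned.compile(optimizer="adam",
               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# fit() needs this callback to advance the pruning schedule:
# pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```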
Neural Network Compression Framework for enhanced OpenVINO™ inference
Network Slimming (PyTorch) (ICCV 2017)
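Network Slimming trains with an L1 penalty on BatchNorm scale factors and then removes channels whose factors shrink toward zero. A sketch of that recipe (the penalty weight and threshold values are illustrative):

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4):
    """L1 sparsity penalty on BatchNorm scale factors (gamma)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules()
                     if isinstance(m, nn.BatchNorm2d))

# During training: loss = criterion(model(x), y) + bn_l1_penalty(model)
# Afterwards, channels whose learned scale collapsed carry little signal:
def prunable_channels(bn: nn.BatchNorm2d, threshold: float = 1e-2) -> torch.Tensor:
    return (bn.weight.abs() < threshold).nonzero().flatten()
```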
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A more readable and flexible YOLOv5 with additional backbones (GCN, ResNet, ShuffleNet, MobileNet, EfficientNet, HRNet, Swin Transformer, etc.), add-on modules (CBAM, DCN, and so on), and TensorRT support.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
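The heavy-hitter idea, keeping a small attention budget of high-mass tokens plus a recent window, in toy form (function name and the budget/recent split are illustrative, not the paper's code):

```python
import torch

def h2o_keep_mask(attn: torch.Tensor, budget: int = 8, recent: int = 4) -> torch.Tensor:
    """Toy KV-cache eviction policy: always keep the `recent` newest entries,
    then fill the remaining budget with "heavy hitters", i.e. the keys that
    accumulated the most attention mass. attn: (queries, keys) for one head."""
    scores = attn.sum(dim=0)                     # accumulated attention per key
    keep = torch.zeros(scores.numel(), dtype=torch.bool)
    keep[-recent:] = True                        # local window is always kept
    heavy = scores.masked_fill(keep, float("-inf")).topk(budget - recent).indices
    keep[heavy] = True
    return keep                                  # True = retain this KV entry

attn = torch.rand(16, 32).softmax(dim=-1)
print(h2o_keep_mask(attn).sum())                 # tensor(8): 8 of 32 entries kept
```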
Caffe for Sparse and Low-rank Deep Neural Networks
Reference ImageNet implementation of SelecSLS CNN architecture proposed in the SIGGRAPH 2020 paper "XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera". The repository also includes code for pruning the model based on implicit sparsity emerging from adaptive gradient descent methods, as detailed in the CVPR 2019 paper "On i…
Sparse Optimisation Research Code
Always sparse. Never dense. But never say never. A Sparse Training repository for the Adaptive Sparse Connectivity concept and its algorithmic instantiation, i.e. Sparse Evolutionary Training, to boost Deep Learning scalability in various respects (e.g. memory and computational-time efficiency, representation and generalization power).
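The prune-and-regrow cycle behind Sparse Evolutionary Training, in toy form (simplified: regrown weights restart at zero here rather than being randomly re-initialized as in the paper):

```python
import torch

def set_rewire(weight: torch.Tensor, mask: torch.Tensor, zeta: float = 0.3) -> None:
    """One rewiring step on a 2-D weight matrix: drop the zeta fraction of
    weakest active connections, regrow as many at random inactive positions."""
    active = mask.bool()
    n_drop = int(zeta * active.sum().item())
    if n_drop == 0:
        return
    cutoff = weight[active].abs().kthvalue(n_drop).values
    drop = active & (weight.abs() <= cutoff)     # weakest surviving weights
    mask[drop] = 0.0
    idle = (mask == 0).nonzero(as_tuple=False)   # candidate new positions
    new = idle[torch.randperm(len(idle))[:int(drop.sum())]]
    mask[new[:, 0], new[:, 1]] = 1.0
    weight.data.mul_(mask)                       # zero out the dropped weights

w = torch.randn(64, 64)
m = (torch.rand_like(w) < 0.1).float()           # start ~90% sparse
w.data.mul_(m)
set_rewire(w, m)
```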
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
Sparse and structured neural attention mechanisms
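Sparsemax is the canonical example of such a mechanism: a Euclidean projection onto the probability simplex that, unlike softmax, yields exact zeros. A single-vector sketch (not the library's API):

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016) for one score vector.
    Low scores map to exact zeros, making the attention weights sparse."""
    zs, _ = torch.sort(z, descending=True)
    css = zs.cumsum(dim=-1)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    k_z = (1 + k * zs > css).sum()      # size of the support
    tau = (css[k_z - 1] - 1) / k_z      # threshold separating the zeros
    return torch.clamp(z - tau, min=0.0)

print(sparsemax(torch.tensor([1.0, 0.9, 0.1])))  # -> [0.55, 0.45, 0.00]
```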
Learning both Weights and Connections for Efficient Neural Networks https://arxiv.org/abs/1506.02626
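The paper's magnitude criterion maps directly onto PyTorch's built-in pruning utilities; a minimal sketch (the 80% ratio is illustrative):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)

# Zero out the 80% of weights with the smallest magnitude, then fine-tune;
# train -> prune -> retrain is the pipeline the paper proposes.
prune.l1_unstructured(layer, name="weight", amount=0.8)

print(f"weight sparsity: {(layer.weight == 0).float().mean().item():.0%}")
```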
A research library for PyTorch-based neural network pruning, compression, and more.