SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Updated Dec 5, 2024 - Python
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
A model compression and acceleration toolbox based on PyTorch.
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
Notes on quantization in neural networks
[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Post-training static quantization using ResNet18 architecture
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
PyTorch implementation of our ECCV 2022 paper, "Fine-grained Data Distribution Alignment for Post-Training Quantization"
Improved the performance of 8-bit PTQ4DM, especially on FID.
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
Post-training quantization of an NVIDIA NeMo ASR model
An example to quantize MobileNetV2 trained on CIFAR-10 dataset with PyTorch FX graph mode quantization