SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg…
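The ternary/binary branch named in that description can be hard to picture from the keywords alone, so here is a minimal, generic PyTorch sketch of BNN/XNOR-Net-style weight binarization with a straight-through estimator. It is not micronet's code; the class name, the per-tensor scaling factor, and the |w| ≤ 1 gradient mask are illustrative choices.

```python
import torch

class BinarizeWeightSTE(torch.autograd.Function):
    """Sign binarization with an XNOR-Net-style scaling factor and a
    straight-through estimator so the full-precision weights stay trainable."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        alpha = w.abs().mean()          # per-tensor scaling factor, as in XNOR-Net
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients through, zeroed where
        # |w| > 1 (the clipping used in BNN).
        return grad_output * (w.abs() <= 1).float()

# Usage: binarize a weight tensor inside a custom layer's forward pass.
w = torch.randn(16, 16, requires_grad=True)
w_bin = BinarizeWeightSTE.apply(w)
w_bin.sum().backward()
```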
Neural Network Compression Framework for enhanced OpenVINO™ inference
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
YOLO model compression and multi-dataset training
A model compression and acceleration toolbox based on PyTorch.
Tutorial notebooks for hls4ml
0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
An automated toolset for analyzing and modifying the structure of PyTorch models, including a model compression algorithm library built on automatic model structure analysis
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
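For orientation, a post-training quantization pass with the TensorFlow Lite converter looks roughly like the sketch below. This is a generic example rather than code from those notebooks; the toy Keras model and the output filename are placeholders.

```python
import tensorflow as tf

# A small placeholder Keras model; in practice you would load a trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3),
])

# Post-training quantization: with Optimize.DEFAULT and no calibration data,
# the converter applies dynamic-range (8-bit weight) quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```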
FrostNet: Towards Quantization-Aware Network Architecture Search
OpenVINO Training Extensions Object Detection
Quantization Aware Training
Notes on quantization in neural networks
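Most entries on this page build on the same primitive, uniform affine quantization, so a small NumPy sketch of the scale/zero-point mapping may help. It is generic and not tied to any repository listed here; the function names are illustrative.

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Uniform affine quantization: q = clip(round(x / scale) + zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Map the integer codes back to approximate floating-point values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(8).astype(np.float32)
q, scale, zp = affine_quantize(x)
x_hat = affine_dequantize(q, scale, zp)   # equal to x up to quantization error
```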
Train neural networks with joint quantization and pruning on both weights and activations using any PyTorch modules
Quantization-aware training with spiking neural networks
3rd place solution for NeurIPS 2019 MicroNet challenge
FakeQuantize with Learned Step Size(LSQ+) as Observer in PyTorch
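A simplified picture of what an LSQ-style fake-quantize module does in PyTorch is sketched below. It omits LSQ's gradient scaling and per-channel handling and is not that repository's implementation; the class name and default values are illustrative.

```python
import torch
import torch.nn as nn

class LearnedStepFakeQuantize(nn.Module):
    """Minimal LSQ-style fake quantizer: quantize-dequantize in the forward
    pass with a learnable step size, with rounding hidden from autograd."""

    def __init__(self, num_bits: int = 8, init_scale: float = 0.1):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))
        self.qmax = 2 ** (num_bits - 1) - 1
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = torch.clamp(x / self.scale, self.qmin, self.qmax)
        # Straight-through estimator: round in the forward pass only,
        # so gradients still reach both x and the learned step size.
        q = q + (q.round() - q).detach()
        return q * self.scale

# Usage: insert on activations (or weights) during quantization-aware training.
fq = LearnedStepFakeQuantize(num_bits=4)
y = fq(torch.randn(2, 3, requires_grad=True))
y.sum().backward()
```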
Code for the paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
QT-DoG: Quantization-Aware Training for Domain Generalization