SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Updated Nov 27, 2024 · Python
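As a rough illustration of the core operation such a toolkit automates, here is a minimal sketch of per-tensor symmetric INT4 round-to-nearest quantization in plain NumPy. The function names are illustrative, not the library's API:

```python
import numpy as np

def quantize_int4_symmetric(w: np.ndarray):
    """Per-tensor symmetric INT4 quantization (illustrative, not a library API)."""
    # Signed INT4 covers [-8, 7]; scale against 7 so the max magnitude fits.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(128, 128).astype(np.float32)
q, s = quantize_int4_symmetric(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```

Production toolkits layer per-channel or per-group scales, calibration data, and mixed-precision policies on top of this primitive.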
Row-major matmul optimization.
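The description is terse, so here is a generic sketch of the classic row-major trick: reorder the loops from i-j-k to i-k-j so the innermost loop walks contiguous rows of `b` and `c` instead of striding down a column of `b`. Plain Python for illustration; the cache-locality payoff shows up in compiled languages:

```python
# i-j-k order: the inner loop reads b[k][j] down a column,
# a strided, cache-unfriendly walk in row-major storage.
def matmul_ijk(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

# i-k-j order: the inner loop walks rows of b and c,
# which are contiguous in row-major storage.
def matmul_ikj(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

n = 32
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
# Both orders add the same terms in the same k sequence, so results match exactly.
assert matmul_ijk(a, b, n) == matmul_ikj(a, b, n)
```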
Advanced quantization algorithm for LLMs/VLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
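A hedged sketch of the idea named in the paper's title: learn small per-weight rounding offsets and update them with the sign of the gradient, using a straight-through estimator so gradients pass through `round()`. This is illustrative PyTorch under my own assumptions, not the official implementation, and every name here is mine:

```python
import torch

def ste_round(t: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return (t.round() - t).detach() + t

def signround_sketch(w, x, bits=4, steps=200, lr=5e-3):
    """Learn rounding offsets v via signed gradient descent (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    v = torch.zeros_like(w, requires_grad=True)  # per-weight rounding offsets
    y_ref = x @ w.t()                            # full-precision reference output
    for _ in range(steps):
        q = torch.clamp(ste_round(w / scale + v), -qmax - 1, qmax)
        loss = ((x @ (q * scale).t()) - y_ref).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()  # signed step: direction only, not magnitude
            v.clamp_(-0.5, 0.5)      # offsets stay within half a rounding step
            v.grad.zero_()
    q = torch.clamp((w / scale + v).round(), -qmax - 1, qmax)
    return (q * scale).detach()

w, x = torch.randn(64, 64), torch.randn(256, 64)
w_q = signround_sketch(w, x)
print("output MSE:", ((x @ w_q.t()) - (x @ w.t())).pow(2).mean().item())
```

Stepping by the sign alone keeps updates uniform across weights and bounds how far each offset can drift per step, which is the appeal of signed gradient descent for this problem.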
Rust library to write integer types of any bit length into a buffer, from `i1` to `i64`.
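The library itself is Rust; for the 4-bit case this topic is about, the same packing idea looks like this in Python (hypothetical helper names, two signed nibbles per byte):

```python
def pack_int4(values):
    """Pack signed 4-bit ints (-8..7) two per byte, low nibble first (hypothetical helper)."""
    assert all(-8 <= v <= 7 for v in values)
    out = bytearray()
    for i in range(0, len(values), 2):
        lo = values[i] & 0x0F
        hi = (values[i + 1] & 0x0F) if i + 1 < len(values) else 0
        out.append(lo | (hi << 4))
    return bytes(out)

def unpack_int4(buf, count):
    vals = []
    for byte in buf:
        for nib in (byte & 0x0F, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)  # sign-extend 4 bits
    return vals[:count]

data = [-8, -1, 0, 3, 7]
assert unpack_int4(pack_int4(data), len(data)) == data
```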