Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
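A minimal PyTorch sketch of the signed-gradient rounding idea the paper describes: learn a per-weight shift V in [-0.5, 0.5] that decides whether each weight rounds up or down, updating V with only the sign of its gradient. Shapes, hyperparameters, and the straight-through estimator here are illustrative assumptions, not the repository's actual code.

```python
import torch

# Toy demo of weight rounding learned via signed gradient descent.
# All sizes, the learning rate, and the step count are assumptions.
torch.manual_seed(0)
w = torch.randn(64, 64)
x = torch.randn(128, 64)
ref = x @ w.t()                                  # full-precision reference output

scale = w.abs().max() / 7                        # 4-bit symmetric scale (levels -8..7)
v = torch.zeros_like(w, requires_grad=True)      # learnable rounding shift
lr = 5e-3

for _ in range(200):
    u = w / scale + v
    u_ste = u + (u.round() - u).detach()         # straight-through rounding
    w_q = scale * torch.clamp(u_ste, -8, 7)
    loss = ((x @ w_q.t()) - ref).pow(2).mean()
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()                  # signed gradient step
        v.clamp_(-0.5, 0.5)                      # keep shift within half a bin
        v.grad = None
```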
Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
PyTorch native quantization and sparsity for training and inference
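As a quick illustration of PyTorch-native quantization, here is core PyTorch's built-in dynamic quantization entry point (weights stored as INT8, activations quantized at runtime). The library above offers far more schemes; this snippet only shows the baseline workflow.

```python
import torch
import torch.nn as nn

# Dynamic quantization: Linear weights are converted to INT8, and
# activations are quantized on the fly at inference time.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(qmodel(torch.randn(1, 128)).shape)  # torch.Size([1, 10])
```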
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
A beginner-friendly tutorial on model compression.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Simple dithering library in Rust, based on image-rs
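The crate itself is Rust; as a language-neutral illustration, here is a short Python sketch of Floyd–Steinberg error diffusion, the classic algorithm such dithering libraries implement. The function name and the 1-bit target are assumptions for the demo.

```python
import numpy as np

def floyd_steinberg(gray):
    """Dither a grayscale image (uint8, HxW) to black/white by diffusing
    each pixel's rounding error to its unvisited neighbors."""
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)
```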
Neural Network Compression Framework for enhanced OpenVINO™ inference
Faster Whisper transcription with CTranslate2
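Typical usage, following the faster-whisper README: load a CTranslate2-backed model with INT8 compute and iterate over transcription segments. The model size and audio path are placeholders.

```python
from faster_whisper import WhisperModel

# INT8 compute keeps memory low; "cuda" with float16 is the usual GPU choice.
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.wav")
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```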
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
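To make the idea concrete, here is a toy NumPy sketch of quantized attention: Q and K are quantized to INT8 per tensor, the score matrix is accumulated in INT32, and the result is dequantized before the softmax. This is conceptual only; the real kernel relies on per-block scales, smoothing, and fused GPU kernels that this sketch omits.

```python
import numpy as np

def int8_quant(x):
    """Per-tensor symmetric INT8 quantization; returns codes and scale."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(64, 32)) for _ in range(3))

q_i8, sq = int8_quant(q)
k_i8, sk = int8_quant(k)
# INT32 accumulation of the score matrix, then dequantize with both scales.
scores = (q_i8.astype(np.int32) @ k_i8.astype(np.int32).T) * (sq * sk)
scores /= np.sqrt(k.shape[1])
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
out = probs @ v  # V kept in full precision in this sketch
```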
A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms.
A project built with Electron + React.js to explore the potential of cross-platform AI completion.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Official implementation of Half-Quadratic Quantization (HQQ)
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
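A conceptual NumPy sketch of the SVDQuant decomposition: absorb outliers with a truncated-SVD low-rank branch, then 4-bit quantize the residual, so W ≈ L + Q(W − L). The rank budget, bit width, and shapes here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w[:, :4] *= 20.0                       # inject outlier columns

# Low-rank branch via truncated SVD absorbs the outliers.
u, s, vt = np.linalg.svd(w, full_matrices=False)
r = 16                                 # assumed rank budget
low_rank = (u[:, :r] * s[:r]) @ vt[:r]

# 4-bit symmetric quantization of the now well-behaved residual.
residual = w - low_rank
scale = np.abs(residual).max() / 7.0
q = np.clip(np.round(residual / scale), -8, 7)
w_hat = low_rank + q * scale

print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```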