A high-throughput and memory-efficient inference and serving engine for LLMs (Python · 66.2k stars · 12.2k forks; see the usage sketch after this list)
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python · 2.5k stars · 336 forks; see the compression sketch after this list)
Common recipes for running vLLM (Jupyter Notebook · 305 stars · 112 forks)
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM (Python · 175 stars · 22 forks)
An intelligent router for mixture-of-models serving (Go · 2.6k stars · 368 forks)
Code for the vLLM CI and performance benchmark infrastructure
Community-maintained hardware plugin for vLLM on Ascend
Community-maintained hardware plugin for vLLM on Spyre
A framework for efficient inference with omni-modal models
A high-performance, lightweight router for large-scale vLLM deployments
Evaluate and enhance your LLM deployments for real-world inference needs
TPU inference for vLLM, with unified JAX and PyTorch support.
Daily summaries of merged vLLM PRs
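
For the core vLLM engine, here is a minimal offline-inference sketch using the documented LLM/SamplingParams API; the prompts and model name are arbitrary examples, not part of the project's own documentation:

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm`).
# The model name below is an example; other Hugging Face causal LMs work too.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
# Sampling settings: moderate temperature with nucleus sampling.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model once; vLLM manages KV-cache memory internally.
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts in one batched call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

The same engine also exposes an OpenAI-compatible HTTP server via `vllm serve <model>`, which is what the router and deployment projects above build on.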
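For the compression library, a hedged sketch of one-shot quantization: `oneshot` and `GPTQModifier` are part of llm-compressor, but import paths and argument names have shifted across versions, so treat this as an assumption-laden outline rather than the canonical recipe:

```python
# Hedged sketch of one-shot W4A16 quantization with llm-compressor.
# Import paths and arguments are assumptions that vary by version;
# consult the llm-compressor docs for the exact current API.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model
    dataset="open_platypus",                     # example calibration set
    # Quantize all Linear layers to 4-bit weights, keeping the LM head intact.
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The resulting checkpoint is saved in a compressed format that vLLM can load directly for optimized serving.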