# Welcome to vLLM

Easy, fast, and cheap LLM serving for everyone

vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

Where to get started with vLLM depends on the type of user. If you are looking to:

- Run

