vllm-project / vllm Public

Fork 4.7k
Star 30.7k

Pull requests: vllm-project/vllm

398 Open 4,431 Closed

Pull requests list

[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture

#10608 opened Nov 24, 2024 by IdoAsraff

[Bug]: Authorization ignored when root_path is set (label: frontend)

#10606 opened Nov 24, 2024 by chaunceyjiang

[fix] Correct num_accepted_tokens counting

#10604 opened Nov 24, 2024 by KexinFeng

[Misc] Further reduce BNB static variable (label: needs-rebase)

#10597 opened Nov 24, 2024 by jeejeelee • Draft

2 tasks

[Interleaved ATTN] Support for Mistral-8B

#10591 opened Nov 23, 2024 by patrickvonplaten

[Kernel] Remove hard-dependencies of Speculative decode to CUDA workers

#10587 opened Nov 23, 2024 by xuechendi

【Kernel】Tuning fused moe for qwen2-57b in GTX 4090 (tp4pp2)

#10586 opened Nov 23, 2024 by BBuf

fix json serialization issue (label: frontend)

#10580 opened Nov 22, 2024 by maxdebayser

[WIP] V1 LoRA support (label: needs-rebase)

#10579 opened Nov 22, 2024 by varun-sundar-rabindranath • Draft

[Core] Update to outlines > 0.1.4 (label: ci/build)

#10576 opened Nov 22, 2024 by russellb • Draft

[ Kernels ] [ AMD ] Add Fused MoE Configs

#10574 opened Nov 22, 2024 by robertgshaw2-neuralmagic • Draft

[V1] Refactor model executable interface for multimodal models

#10570 opened Nov 22, 2024 by ywang96 • Draft

14 tasks done

[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU)

#10565 opened Nov 22, 2024 by SanjuCSudhakaran

[Model] Added GLM-4 series hf format model support vllm==0.6.4

#10561 opened Nov 22, 2024 by sixsixcoder

[Benchmark] Benchmark structured output with datasets

#10557 opened Nov 22, 2024 by xuechendi

[Docs] Add dedicated tool calling page to docs (label: documentation)

#10554 opened Nov 21, 2024 by mgoin

[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (label: frontend)

#10546 opened Nov 21, 2024 by angkywilliam

[Distributed] Tensor Parallel RMSNorm

#10542 opened Nov 21, 2024 by tlrmchlsmth • Draft

Add Sageattention backend

#10532 opened Nov 21, 2024 by flozi00

[Model]: Add support for Aria model (label: documentation)

#10514 opened Nov 21, 2024 by xffxff

[core] overhaul memory profiling and fix backward compatibility (label: needs-rebase)

#10511 opened Nov 21, 2024 by youkaichao

Turn on V1 for H200 build (labels: ci/build, perf-benchmarks)

#10505 opened Nov 21, 2024 by simon-mo

[Model] Add OLMo November 2024 model (label: documentation)

#10503 opened Nov 20, 2024 by 2015aroras

[Core] Implement disagg prefill by StatelessProcessGroup (labels: ci/build, ready)

#10502 opened Nov 20, 2024 by KuntaiDu

Support softcap in ROCm Flash Attention

#10500 opened Nov 20, 2024 by hliuca
