Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows by JamePeng · Pull Request #1966 · abetlen/llama-cpp-python

JamePeng · 2025-03-09T01:57:40Z

Update llama.cpp from 794fe2 to f08f4b3
Use llama_sampler_init instead of llama_sampler() for safer usage
Sync llama : add Phi-4-mini support
Sync llama : expose llama_model_n_head_kv in the API
Sync tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars
class LlamaSampler: add add_xtc(), add_top_n_sigma() and add_dry()
Remove Tail-Free sampling
Add Top-N-Sigma/XTC/DRY sampler code to the sampler
Sync llama : add Gemma 3 support
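
For readers unfamiliar with the Top-N-Sigma sampler mentioned above, here is a minimal pure-Python sketch of the idea as it is commonly described: keep only tokens whose logit lies within n standard deviations of the maximum logit, then renormalize. The function names `top_n_sigma_filter` and `softmax` are hypothetical, chosen for illustration. They are not part of llama-cpp-python or llama.cpp, and the actual implementation may differ in detail.

```python
import math


def top_n_sigma_filter(logits, n):
    """Mask logits that fall more than n standard deviations below the max.

    Illustrative helper only (assumption): not the binding's API.
    """
    if n <= 0 or len(logits) < 2:
        return list(logits)  # nothing to filter
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    # Population standard deviation of the raw logits.
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max_logit - n * std
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits):
    """Numerically stable softmax; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [5.0, 4.5, 1.0, -2.0]
filtered = top_n_sigma_filter(logits, n=1.0)
probs = softmax(filtered)
print(filtered)
print(probs)
```

With these example logits and n = 1.0, the two lowest logits are masked out and the remaining probability mass is shared between the first two tokens.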

JamePeng · 2025-03-09T20:51:59Z

I adjusted the workflow to compile pip wheels with VS2022. For your convenience, it generates Windows builds for two CUDA versions (12.4.1 and 12.6.3) and for Python 3.10 to 3.12 (py310-312).
The wheels should be compiled by now: https://github.com/JamePeng/llama-cpp-python/releases

JamePeng · 2025-03-13T13:16:24Z

llama.cpp : refactor llama_context, llama_kv_cache, llm_build_context (ggml-org/llama.cpp#12181)
They changed the API names again :<

JamePeng · 2025-03-13T14:18:07Z

The adjusted code has been moved to https://github.com/JamePeng/llama-cpp-python/tree/1966-branch

JamePeng changed the title ~~Sync LLAMA_API names with ggml-org/llama.cpp 20250309, support LLAMA_VOCAB_PRE_TYPE_GPT4O~~ Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Mar 9, 2025

JamePeng mentioned this pull request Mar 9, 2025

GPU Support Missing in Version >=0.3.5 on Windows with CUDA 12.4 and RTX 3090 #1967

Open

JamePeng force-pushed the main branch from d2dd3b0 to 7074f42 Compare March 9, 2025 16:50

JamePeng changed the title ~~Sync LLAMA_API names with ggml-org/llama.cpp 20250309~~ Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows Mar 9, 2025

JamePeng closed this Mar 13, 2025

JamePeng force-pushed the main branch from 00bcbcc to 37eb5f0 Compare March 13, 2025 14:15

JamePeng wants to merge 0 commits into abetlen:main from JamePeng:main