✨ Gemma 4 is here! ✨

Read about the release in the blog post

3.19.0 (2026-06-30)

Features

Gemma 4 support (#591) (5fe6e27) (documentation: Gemma 4)
riscv64 prebuilt binaries (#615) (e8336a4)
automatically enable flash attention when optimal
improve inference performance when a grammar is active
more precise resource usage estimation
resource usage capping (documentation: Resource Capping)
automatically enable or disable mmap depending on the environment
support Q1_0 quant
improve stability on unified memory systems
disable residency sets on macOS by default for better OS responsiveness
default progressLogs to "stderr" to avoid polluting stdout with logs
optimized prebuilt binaries for arm architectures

Bug Fixes

MXFP4_MOE quant name
Vulkan backend successful load detection even when no devices are available
CLI: avoid redownloading existing models that consist of multiple parts from a URI
optimize checkpoints management when using grammar
improve stability when loading huge models
reranking result range for Qwen 3 reranker
adapt to breaking llama.cpp changes

Shipped with llama.cpp release b9842

3.18.1 (2026-03-17)

Features

customize postinstall behavior (#582) (57bea3d) (documentation: Customizing postinstall Behavior)
experimental support for context KV cache type configurations (#582) (57bea3d) (documentation: LlamaContextOptions["experimentalKvCacheKeyType"])
support NVFP4 quants (#582) (57bea3d)

Shipped with llama.cpp release b8390

3.18.0 (2026-03-15)

Features

automatic checkpoints for models that need it (#573) (c641959)
QwenChatWrapper: Qwen 3.5 support (#573) (c641959)
inspect gpu command: detect and report missing prebuilt binary modules and custom npm registry (#573) (c641959)

Bug Fixes

resolveModelFile: deduplicate concurrent downloads (#570) (cc105b9)
correct Vulkan URL casing in documentation links (#568) (5a44506)
Qwen 3.5 memory estimation (#573) (c641959)
grammar use with HarmonyChatWrapper (#573) (c641959)
add mistral think segment detection (#573) (c641959)
compress excessively long segments from the current response on context shift instead of throwing an error (#573) (c641959)
default thinking budget to 75% of the context size to prevent low-quality responses (#573) (c641959)

Shipped with llama.cpp release b8352
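
The `resolveModelFile` fix in this release deduplicates concurrent downloads. The changelog does not describe how this is done internally, so the following is only a minimal sketch of the general technique, assuming downloads are keyed by their URI. `dedupeDownload`, `inFlight`, and the `download` callback are hypothetical names used for illustration and are not part of the node-llama-cpp API.

```typescript
// Hypothetical registry of downloads that are currently in progress, keyed by URI.
const inFlight = new Map<string, Promise<string>>();

// Returns the pending promise for `uri` if one exists; otherwise starts a new
// download and remembers it until it settles.
function dedupeDownload(
    uri: string,
    download: (uri: string) => Promise<string>
): Promise<string> {
    const existing = inFlight.get(uri);
    if (existing != null)
        return existing;

    const promise = download(uri).then(
        (filePath) => {
            // Forget the entry on success so a later call can download again if needed.
            inFlight.delete(uri);
            return filePath;
        },
        (error) => {
            // Forget the entry on failure too, so a retry is not blocked by a stale promise.
            inFlight.delete(uri);
            throw error;
        }
    );

    inFlight.set(uri, promise);
    return promise;
}
```

With this shape, two callers that request the same URI at the same time share one promise, so the underlying download runs once and both callers receive the same file path.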

3.17.1 (2026-02-28)

Bug Fixes

Electron template (#566) (8931402)

Shipped with llama.cpp release b8179

3.17.0 (2026-02-27)

Features

getLlama: build: "autoAttempt" (#564) (dda5ade) (documentation: LlamaOptions["build"])
remove octokit dependency (#564) (dda5ade)

Bug Fixes

CLI: disable Direct I/O by default (#564) (dda5ade)
Bun segmentation fault on process exit with undisposed Llama instance (#564) (dda5ade)
detect glibc inside Nix (#564) (dda5ade)

Shipped with llama.cpp release b8169

3.16.2 (2026-02-21)

Bug Fixes

macOS 14 prebuilt binaries (#559) (6faa5ae)

Shipped with llama.cpp release b8121

3.16.1 (2026-02-20)

Bug Fixes

export missing types (#557) (498711c)

Shipped with llama.cpp release b8117

3.16.0 (2026-02-19)

Features

Exclude Top Choices (XTC) (#553) (57e8c22) (documentation: LLamaChatPromptOptions["xtc"])
DRY (Don't Repeat Yourself) repeat penalty (#553) (57e8c22) (documentation: LLamaChatPromptOptions["dryRepeatPenalty"])
Tiny Aya support (#553) (57e8c22)

Bug Fixes

adjust the default VRAM padding config to reserve enough memory for compute buffers (#553) (57e8c22)
support function call syntax with optional whitespace prefix (#553) (57e8c22)
change the default value of useDirectIo to false (#553) (57e8c22)
Vulkan device dedupe (#553) (57e8c22)

Shipped with llama.cpp release b8095
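
For readers unfamiliar with Exclude Top Choices (XTC), the sketch below illustrates the sampling idea as it is commonly described: with some probability, every candidate token at or above a probability threshold is dropped except the least likely of them. This is a conceptual illustration only, not the implementation used by node-llama-cpp or llama.cpp, and it does not show the shape of the `xtc` option. `applyXtc`, `TokenProb`, and the `roll` parameter are hypothetical names introduced here.

```typescript
interface TokenProb {
    token: string;
    prob: number;
}

// Conceptual XTC filter (assumed behavior, see the note above).
// `roll` is injectable so the behavior can be made deterministic.
function applyXtc(
    candidates: TokenProb[],
    threshold: number,
    probability: number,
    roll: number = Math.random()
): TokenProb[] {
    // The filter only triggers for a fraction of the sampling steps.
    if (roll >= probability)
        return [...candidates];

    // Sort from most likely to least likely.
    const sorted = [...candidates].sort((a, b) => b.prob - a.prob);
    const aboveCount = sorted.filter((c) => c.prob >= threshold).length;

    // With fewer than two "top choices" there is nothing to exclude.
    if (aboveCount < 2)
        return sorted;

    // Drop all top choices except the least likely one, then renormalize.
    const kept = sorted.slice(aboveCount - 1);
    const total = kept.reduce((sum, c) => sum + c.prob, 0);
    return kept.map((c) => ({token: c.token, prob: c.prob / total}));
}
```

Keeping the least likely of the top choices means the remaining candidates still include one token the model considered plausible, while the most predictable continuations are removed.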

3.15.1 (2026-01-26)

Bug Fixes

adapt to llama.cpp changes (#547) (4baa480)
duplicate backend library files (#541) (f5123bf)

Shipped with llama.cpp release b7836

3.15.0 (2026-01-10)

Features

LlamaCompletion: stopOnAbortSignal (#538) (734693d) (documentation: LlamaCompletionGenerationOptions["stopOnAbortSignal"])
LlamaModel: useDirectIo (#538) (734693d) (documentation: LlamaModelOptions["useDirectIo"])

Bug Fixes

support new CUDA 13.1 archs (#538) (734693d)
build the prebuilt binaries with CUDA 13.1 instead of 13.0 (#538) (734693d)

Shipped with llama.cpp release b7698

Releases: withcatai/node-llama-cpp
