Releases: withcatai/node-llama-cpp

v3.19.0

@github-actions github-actions released this 30 Jun 01:30
f53ea51

Gemma 4 is here!

Read about the release in the blog post


3.19.0 (2026-06-30)

Features

  • Gemma 4 support (#591) (5fe6e27) (documentation: Gemma 4)
  • riscv64 prebuilt binaries (#615) (e8336a4)
  • automatically enable flash attention when optimal
  • improve inference performance when a grammar is active
  • more precise resource usage estimation
  • resource usage capping (documentation: Resource Capping)
  • automatically enable or disable mmap depending on the environment
  • support Q1_0 quant
  • improve stability on unified memory systems
  • disable residency sets on macOS by default for better OS responsiveness
  • default progressLogs to "stderr" to avoid polluting stdout with logs
  • optimized prebuilt binaries for arm architectures
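The change to the `progressLogs` default matters most when a script's stdout is piped into another program. The sketch below is not the library's implementation and does not show where `progressLogs` is configured; `emitRun`, `OutputStreams`, and `ProgressTarget` are hypothetical names used only to illustrate why routing progress output to stderr keeps stdout machine-readable.

```typescript
// Hypothetical illustration: none of these names come from node-llama-cpp.
type Sink = (chunk: string) => void;

interface OutputStreams {
    stdout: Sink;
    stderr: Sink;
}

type ProgressTarget = "stdout" | "stderr";

// Writes progress lines to the chosen stream and the final result to stdout.
function emitRun(
    streams: OutputStreams,
    progressSteps: string[],
    result: string,
    progressTarget: ProgressTarget = "stderr"
): void {
    for (const step of progressSteps)
        streams[progressTarget](`[progress] ${step}\n`);

    streams.stdout(`${result}\n`);
}

// Capture both streams in memory so the effect is easy to inspect.
const capturedStdout: string[] = [];
const capturedStderr: string[] = [];

emitRun(
    {
        stdout: (chunk) => { capturedStdout.push(chunk); },
        stderr: (chunk) => { capturedStderr.push(chunk); }
    },
    ["loading model", "evaluating prompt"],
    '{"answer": 42}'
);
```

With the default target, `capturedStdout` holds only the result line, so a consumer such as a JSON parser reading the pipe never sees progress text. In a real script the sinks would wrap `process.stdout.write` and `process.stderr.write`.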

Bug Fixes

  • MXFP4_MOE quant name
  • Vulkan backend successful load detection even when no devices are available
  • CLI: avoid redownloading existing models that consist of multiple parts from a URI
  • optimize checkpoints management when using grammar
  • improve stability when loading huge models
  • reranking result range for Qwen 3 reranker
  • adapt to breaking llama.cpp changes
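The multi-part download fix above can be pictured as a check for which parts are already on disk. The following is a minimal sketch, not the CLI's actual logic: it assumes the `-00001-of-0000N.gguf` suffix convention used for split GGUF files, and `listModelParts` and `findMissingParts` are hypothetical helpers.

```typescript
// Assumption: split models are named "<prefix>-00001-of-0000N.gguf".
const partPattern = /^(.*)-(\d{5})-of-(\d{5})\.gguf$/;

// Expands the file name of any part into the full list of expected part names.
// A name that does not match the pattern is treated as a single-file model.
function listModelParts(partFileName: string): string[] {
    const match = partPattern.exec(partFileName);
    if (match == null)
        return [partFileName];

    const prefix = match[1];
    const totalText = match[3];
    const total = parseInt(totalText, 10);

    const parts: string[] = [];
    for (let i = 1; i <= total; i++) {
        const index = ("00000" + i).slice(-5); // zero-pad to five digits
        parts.push(`${prefix}-${index}-of-${totalText}.gguf`);
    }

    return parts;
}

// Returns only the parts that are not already present, so that
// existing parts do not need to be downloaded again.
function findMissingParts(partFileName: string, existingFiles: ReadonlySet<string>): string[] {
    return listModelParts(partFileName).filter((name) => !existingFiles.has(name));
}

const alreadyDownloaded = new Set([
    "model-00001-of-00003.gguf",
    "model-00003-of-00003.gguf"
]);
const missingParts = findMissingParts("model-00001-of-00003.gguf", alreadyDownloaded);
```

Here `missingParts` contains only `model-00002-of-00003.gguf`. A real check would presumably also verify that each existing file is complete, which this sketch does not attempt.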

Shipped with llama.cpp release b9842

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.18.1

@github-actions github-actions released this 17 Mar 08:38
57bea3d

3.18.1 (2026-03-17)

Features


Shipped with llama.cpp release b8390

v3.18.0

@github-actions github-actions released this 15 Mar 21:18
c641959

3.18.0 (2026-03-15)

Features

  • automatic checkpoints for models that need it (#573) (c641959)
  • QwenChatWrapper: Qwen 3.5 support (#573) (c641959)
  • inspect gpu command: detect and report missing prebuilt binary modules and custom npm registry (#573) (c641959)

Bug Fixes

  • resolveModelFile: deduplicate concurrent downloads (#570) (cc105b9)
  • correct Vulkan URL casing in documentation links (#568) (5a44506)
  • Qwen 3.5 memory estimation (#573) (c641959)
  • grammar use with HarmonyChatWrapper (#573) (c641959)
  • add mistral think segment detection (#573) (c641959)
  • compress excessively long segments from the current response on context shift instead of throwing an error (#573) (c641959)
  • default thinking budget to 75% of the context size to prevent low-quality responses (#573) (c641959)
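Deduplicating concurrent downloads, as in the `resolveModelFile` fix above, is commonly done by sharing one in-flight promise per resource. The sketch below shows that general pattern under stated assumptions; it is not the library's implementation, and `downloadOnce`, `inFlightDownloads`, and the `download` callback are hypothetical.

```typescript
// Hypothetical sketch of in-flight request deduplication.
const inFlightDownloads = new Map<string, Promise<string>>();

// Starts a download for `uri` unless one is already running,
// in which case the existing promise is returned to every caller.
function downloadOnce(
    uri: string,
    download: (uri: string) => Promise<string>
): Promise<string> {
    const existing = inFlightDownloads.get(uri);
    if (existing != null)
        return existing;

    const pending = download(uri).then(
        (filePath) => {
            inFlightDownloads.delete(uri); // allow a later call to start a fresh download
            return filePath;
        },
        (error) => {
            inFlightDownloads.delete(uri); // do not cache failures
            throw error;
        }
    );

    inFlightDownloads.set(uri, pending);
    return pending;
}
```

Two callers that request the same URI at the same time receive the same promise, so the download callback runs once. The entry is removed when the download settles, so a later request starts over instead of reusing a stale or failed result.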

Shipped with llama.cpp release b8352

v3.17.1

@github-actions github-actions released this 28 Feb 01:51
8931402

3.17.1 (2026-02-28)

Bug Fixes


Shipped with llama.cpp release b8179

v3.17.0

@github-actions github-actions released this 27 Feb 22:38
dda5ade

3.17.0 (2026-02-27)

Features

Bug Fixes

  • CLI: disable Direct I/O by default (#564) (dda5ade)
  • Bun segmentation fault on process exit with undisposed Llama instance (#564) (dda5ade)
  • detect glibc inside Nix (#564) (dda5ade)

Shipped with llama.cpp release b8169

v3.16.2

@github-actions github-actions released this 21 Feb 20:33
6faa5ae

3.16.2 (2026-02-21)

Bug Fixes


Shipped with llama.cpp release b8121

v3.16.1

@github-actions github-actions released this 20 Feb 21:39
498711c

3.16.1 (2026-02-20)

Bug Fixes


Shipped with llama.cpp release b8117

v3.16.0

@github-actions github-actions released this 19 Feb 04:08
57e8c22

3.16.0 (2026-02-19)

Features

Bug Fixes

  • adjust the default VRAM padding config to reserve enough memory for compute buffers (#553) (57e8c22)
  • support function call syntax with optional whitespace prefix (#553) (57e8c22)
  • change the default value of useDirectIo to false (#553) (57e8c22)
  • Vulkan device dedupe (#553) (57e8c22)

Shipped with llama.cpp release b8095

v3.15.1

@github-actions github-actions released this 26 Jan 03:06
4baa480

3.15.1 (2026-01-26)

Bug Fixes


Shipped with llama.cpp release b7836

v3.15.0

@github-actions github-actions released this 10 Jan 22:40
734693d

3.15.0 (2026-01-10)

Features

Bug Fixes

  • support new CUDA 13.1 archs (#538) (734693d)
  • build the prebuilt binaries with CUDA 13.1 instead of 13.0 (#538) (734693d)

Shipped with llama.cpp release b7698