
Releases: predibase/lorax

v0.12.1

25 Nov 21:15
c0e5798

🎉 Enhancements

🐛 Bugfixes

🔧 Maintenance

Full Changelog: v0.12.0...v0.12.1

v0.12.0: Multi-LoRA prefix caching, fp8 kv cache, Mllama, function calling

06 Nov 21:21
e03f989

🎉 Enhancements

🐛 Bugfixes

📝 Docs

  • added metrics docs, updated links in main docs by @noyoshi in #663

🔧 Maintenance

New Contributors

Full Changelog: v0.11.0...v0.12.0

v0.11.0: Prefix caching, VLMs, BERT (embed, NER), FP8

18 Sep 21:53
66c5b9c

🎉 Enhancements

🐛 Bugfixes

📝 Docs

🔧 Maintenance

New Contributors

Full Changelog: v0.10.0...v0.11.0

v0.10.0: Speculative decoding adapters and SGMV + BGMV

23 May 16:55
bd7db80

🎉 Enhancements

🐛 Bugfixes

📝 Docs

🔧 Maintenance

New Contributors

Full Changelog: v0.9.0...v0.10.0

v0.9.0

23 Mar 00:10
8ff0bf5

🎉 Enhancements

  • Allow assigning dedicated memory reservation for adapters on GPU by @tgaddair in #303
  • Enforce adapters cannot be loaded past --adapter-memory-fraction by @tgaddair in #306
  • Added Qwen2 by @tgaddair in #327
  • Make max_new_tokens optional, default to max_total_tokens - input_length by @tgaddair in #353
  • Expose ignore_eos_token option in generate requests by @jeffreyftang in #340
  • Generate to max_total_tokens during warmup by @tgaddair in #286
  • Add support for returning alternative tokens by @JTS22 in #297
  • feat: add repetition_penalty and top_k to openai by @huytuong010101 in #288
  • Add support for LoRA adapters trained with Rank-Stabilized scaling by @arnavgarg1 in #299
  • Provide more granular methods to configure the embedded S3 client. by @mitchklusty in #325
  • Allow specifying base model as model param in OpenAI API by @tgaddair in #331
  • Add ignore_eos_token param to completions and chat completions endpoints by @jeffreyftang in #344
  • Log whether SGMV kernel is enabled by @tgaddair in #342
  • Log generated tokens out to file when streaming by @magdyksaleh in #309
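Two of the enhancements above describe simple numeric rules: #353 makes `max_new_tokens` default to `max_total_tokens - input_length`, and #299 adds adapters trained with Rank-Stabilized LoRA (rsLoRA) scaling, which scales the LoRA update by `alpha / sqrt(r)` instead of the standard `alpha / r`. The sketch below illustrates both rules. The helper names `resolve_max_new_tokens` and `lora_scaling` are hypothetical and are not LoRAX's actual API. The check that rejects inputs with no room left for generation is also an assumption for illustration.

```python
import math
from typing import Optional


def resolve_max_new_tokens(
    max_new_tokens: Optional[int], max_total_tokens: int, input_length: int
) -> int:
    """Hypothetical helper: pick the generation budget for a request."""
    if max_new_tokens is not None:
        # An explicit value from the request is used as-is.
        return max_new_tokens
    remaining = max_total_tokens - input_length
    if remaining <= 0:
        # Assumption: reject prompts that leave no room for new tokens.
        raise ValueError("input_length must be smaller than max_total_tokens")
    # Default described in #353: fill the rest of the total token budget.
    return remaining


def lora_scaling(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Hypothetical helper: scale factor applied to the LoRA update B @ A."""
    if use_rslora:
        # Rank-Stabilized LoRA divides by sqrt(rank).
        return alpha / math.sqrt(r)
    # Standard LoRA divides by the rank.
    return alpha / r


if __name__ == "__main__":
    print(resolve_max_new_tokens(None, 4096, 1000))  # default budget
    print(lora_scaling(16, 16), lora_scaling(16, 16, use_rslora=True))
```

For a rank-16 adapter with `alpha = 16`, the standard scale is 1.0 and the rsLoRA scale is 4.0. Loading an rsLoRA adapter with the wrong rule would therefore change the effective strength of the adapter.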

🐛 Bugfixes

  • Fix tensor parallelism with SGMV to use true rank of the LoRA after splitting by @tgaddair in #324
  • Fix hanging caused by tqdm stderr not being printed by @tgaddair in #352
  • Fix dynamic RoPE by @tgaddair in #350
  • Only update cache during warmup by @tgaddair in #351
  • Prevent model loading errors from appearing as flash attention import errors by @tgaddair in #328
  • Make architecture compatibility check non-fatal if base model config cannot be loaded by @tgaddair in #317
  • Fix Qwen2 LoRA loading by @tgaddair in #345
  • Remove vec wrapping from OpenAI-compatible response by @jeffreyftang in #273
  • Disallow early stopping during warmup by @tgaddair in #290
  • Skip returning EOS token on finish_reason 'stop' by @jeffreyftang in #289
  • Fixed static adapter loading with same arch by @tgaddair in #300
  • Ensure model_id is a string when using a model from s3 by @fadebek in #291
  • Fix name for adapter id by @noyoshi in #284
  • Update AsyncClient with ignore_eos_token parameter by @jeffreyftang in #341

📝 Docs

🔧 Maintenance

  • Split out server and router unit tests by @tgaddair in #275
  • Add in response headers to streaming endpoint by @noyoshi in #282
  • Propagate bearer token from header if one exists for OpenAI-compatible endpoints by @jeffreyftang in #278
  • Update tokenizers to v0.15 to be consistent with server by @tgaddair in #285
  • Autogen python client docs by @tgaddair in #295
  • Reporting on total tokens by @noyoshi in #349

New Contributors

Full Changelog: v0.8.1...v0.9.0

v0.8.1: Gemma support

21 Feb 22:28
a3b865d

🎉 Enhancements

🔧 Maintenance

Full Changelog: v0.8.0...v0.8.1

v0.8: Structured Output via Outlines

20 Feb 23:47
dd68924

🎉 Enhancements

🐛 Bugfixes

📝 Docs

🔧 Maintenance

New Contributors

Full Changelog: v0.7.0...v0.8.0

v0.7: LoRA Merging (linear, TIES, DARE) per request

01 Feb 22:08
56dc6e2

🎉 Enhancements

🐛 Bugfixes

📝 Docs

🔧 Maintenance

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6: OpenAI compatible API

10 Jan 19:38
64739ad

🎉 Enhancements

🐛 Bugfixes

  • fix: Handle NaN values during weight conversion by @tgaddair in #168

📝 Docs

🔧 Maintenance

New Contributors

Full Changelog: v0.5.0...v0.6.0

v0.5: CUDA graph compilation

08 Jan 17:14
57d5470

🎉 Enhancements

🐛 Bugfixes

  • Fixed deadlock in sgmv_shrink kernel caused by imbalanced segments by @tgaddair in #156
  • Fixed loading adapter from absolute s3 path by @tgaddair in #161

📝 Docs

  • Update client docs with new endpoint source by @abidwael in #126
  • Update client docs with new endpoint source by @abidwael in #146

🔧 Maintenance

New Contributors

Full Changelog: v0.4.1...v0.5.0