Releases: algorithmicsuperintelligence/optillm

v0.3.20

@codelion codelion released this 05 Jul 04:02
205a037

What's Changed

Fix: runaway local generation for models whose ChatML end token differs from their tokenizer EOS (#317)

When such a model ran through optillm's built-in local inference, generation never stopped early: every call that omitted max_tokens produced the full max_new_tokens default (4096). For example, dhara-250m's chat turns end at <|im_end|> (token ID 49154), but its tokenizer eos_token is <|end_of_text|> (token ID 1), so each call kept generating for roughly 800 s.

  • Honor generation_config.eos_token_id: generation now resolves EOS from the model's own generation config (merging the tokenizer EOS as a fallback) instead of hardcoding tokenizer.eos_token_id. Fixes any chat model with this mismatch; well-behaved models (e.g. Qwen2.5, whose tokenizer EOS already is <|im_end|>) are unchanged.
  • OPTILLM_MAX_TOKENS: new env var to override the default max_new_tokens (default stays 4096), bounding any request that sends no max_tokens. An explicit request max_tokens still takes precedence. Useful for small models that don't reliably emit an EOS token.
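
The EOS resolution described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not optillm's actual helper: the function name `resolve_eos_token_ids` is hypothetical, the config and tokenizer are plain stand-in objects, and the sketch assumes the model's generation config lists <|im_end|> (49154) as its EOS.

```python
from types import SimpleNamespace


def resolve_eos_token_ids(generation_config, tokenizer):
    """Collect EOS ids from the generation config, then add the tokenizer EOS as a fallback."""
    ids = []
    configured = getattr(generation_config, "eos_token_id", None)
    if isinstance(configured, int):
        ids.append(configured)
    elif configured is not None:
        # A generation config may list several EOS ids.
        ids.extend(configured)
    fallback = getattr(tokenizer, "eos_token_id", None)
    if fallback is not None and fallback not in ids:
        ids.append(fallback)
    return ids


# Stand-ins mimicking the mismatch described above (assumed values).
gen_cfg = SimpleNamespace(eos_token_id=49154)  # <|im_end|>
tok = SimpleNamespace(eos_token_id=1)          # <|end_of_text|>
eos_ids = resolve_eos_token_ids(gen_cfg, tok)
```

Passing every collected ID as a stop token means generation halts on whichever end marker the model emits first, while a model whose two values already agree ends up with a single ID.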

Only the local inference engine (OPTILLM_API_KEY=optillm) is affected; proxied providers handle stopping themselves. The release includes unit tests for both new helpers (EOS resolution and the max-token default).
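
The precedence between an explicit request max_tokens, OPTILLM_MAX_TOKENS, and the 4096 default might look like the sketch below. The function name `resolve_max_new_tokens` and the handling of invalid or non-positive values (falling back to the default) are assumptions made for illustration; the release notes do not say how optillm treats a malformed value.

```python
import os

DEFAULT_MAX_NEW_TOKENS = 4096  # default stated in the release notes


def resolve_max_new_tokens(request_max_tokens=None, env=None):
    """Pick the generation cap: request value, then env override, then default."""
    env = os.environ if env is None else env
    if request_max_tokens is not None:
        # An explicit request value always wins.
        return request_max_tokens
    raw = env.get("OPTILLM_MAX_TOKENS")
    if raw:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # assumption: ignore values that are not integers
        if value > 0:
            return value
    return DEFAULT_MAX_NEW_TOKENS
```

With OPTILLM_MAX_TOKENS=512 in the environment, a request without max_tokens would be capped at 512, while a request sending max_tokens=128 would still use 128.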

Full Changelog: v0.3.19...v0.3.20

v0.3.19

@codelion codelion released this 05 Jul 02:19
6a0200c

What's Changed

Fix: import optillm crash on macOS with transformers 5.13 (#316)

A fresh install on Apple silicon crashed with AttributeError: 'str' object has no attribute '__module__' at import optillm.

  • Root cause: transformers 5.13.0 tightened AutoTokenizer.register() to require a class, but mlx-lm 0.31.3 registers a custom tokenizer by string name (tokenizer_utils.py: AutoTokenizer.register("NewlineTokenizer", ...)). transformers ≤5.12.1 tolerated it.
  • macOS-only: optillm installs mlx-lm only on Apple silicon (platform_machine=="arm64" and sys_platform=="darwin"); Linux CI never installs it, so CI was unaffected.
  • Fix: pin transformers>=5.0.0,<5.13.0 (resolves to 5.12.1). The transformers 5.13.0 HF path itself is fine (dhara loads and generates); the incompatibility is purely mlx-lm's. Lift the cap once mlx-lm ships a transformers-5.13-compatible release.
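
To make the two conditions above concrete, here is a small sketch that checks whether a machine matches the mlx-lm install marker and whether a transformers version falls inside the pinned range. Both function names are hypothetical, and the version parser assumes plain numeric versions such as "5.12.1" (no pre-release suffixes); real dependency resolution is done by the installer, not by code like this.

```python
import platform
import sys


def installs_mlx_lm(machine=None, sys_platform=None):
    """Mirror the marker platform_machine=="arm64" and sys_platform=="darwin"."""
    machine = platform.machine() if machine is None else machine
    sys_platform = sys.platform if sys_platform is None else sys_platform
    return machine == "arm64" and sys_platform == "darwin"


def within_pin(version):
    """True when version satisfies transformers>=5.0.0,<5.13.0."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions ("5.13" -> (5, 13, 0)) so tuple comparison is exact.
    parts = parts + (0,) * (3 - len(parts))
    return (5, 0, 0) <= parts < (5, 13, 0)
```

For example, `within_pin("5.12.1")` holds while `within_pin("5.13.0")` does not, which matches the pin resolving to 5.12.1.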

Full Changelog: v0.3.18...v0.3.19

v0.3.18

@codelion codelion released this 05 Jul 01:49
beca974

What's Changed

Opt-in file persistence for the memory plugin (#111, #315)

The memory plugin can now persist extracted memories across requests. Set OPTILLM_MEMORY_FILE to a path to opt in; when unset, behaviour is byte-for-byte unchanged (in-RAM, reset per request).

  • Loads any previously saved items on init and writes after every add().
  • Atomic writes (temp file + os.replace) so a crash mid-write can't corrupt the store.
  • A missing/corrupt/non-list file degrades gracefully to an empty store — persistence can never fail a request. Loaded items are filtered to strings and bounded by max_size.
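
The load and save behaviour listed above can be sketched as below. This is an illustration under stated assumptions rather than the plugin's code: the JSON file format, the function names `load_items` and `save_items`, and the choice to keep the most recent max_size items are all hypothetical.

```python
import json
import os
import tempfile


def load_items(path, max_size):
    """Load saved memories; any problem yields an empty store instead of an error."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt file.
        return []
    if not isinstance(data, list):
        return []
    items = [item for item in data if isinstance(item, str)]
    # Assumption: when over the bound, keep the most recent entries.
    return items[-max_size:] if max_size > 0 else []


def save_items(path, items):
    """Write to a temp file in the same directory, then atomically swap it in."""
    tmp_path = None
    try:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(items, f)
        os.replace(tmp_path, path)
        return True
    except OSError:
        # Persistence must never fail a request: clean up and report failure.
        if tmp_path is not None:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
        return False
```

Creating the temp file in the target's own directory keeps both paths on one filesystem, which is what lets os.replace swap the file in a single step so readers see either the old or the new contents.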

CI: Frame SAST scan of PR-changed files

Added a Security Scan (Frame SAST) workflow that runs the Frame neuro-symbolic SAST tool (pinned) on the Python files changed by each pull request. Scanning only changed files surfaces newly introduced issues without failing on pre-existing findings; the job fails only on high/critical severity.

Full Changelog: v0.3.17...v0.3.18

v0.3.17

@codelion codelion released this 04 Jul 16:43
6901e2d

What's Changed

Fix: MCTS parameters are now request-scoped (#304, #314)

proxy() previously wrote each request's mcts_depth / mcts_exploration / mcts_simulations into the module-level server_config, and execute_single_approach() read them back out of that same global. Under any threaded WSGI server (Flask's default threaded mode, gunicorn, uWSGI, Hypercorn), two concurrent requests raced on that global — request B could overwrite request A's params between A's write and A's read, silently running A's MCTS with B's settings. The write also leaked one request's params into every later request.

MCTS params now flow through the per-request request_config (with the server/CLI defaults as fallback), consistent with every other approach parameter. Behaviour is unchanged for all callers; the shared global is no longer mutated per request, so there is nothing left to race on.
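
A simplified sketch of the request-scoped pattern follows. The helper `build_request_config` and the default values are placeholders chosen for illustration and are not taken from optillm; only the key names come from the notes above.

```python
MCTS_KEYS = ("mcts_depth", "mcts_exploration", "mcts_simulations")

# Placeholder server/CLI defaults (illustrative values only).
SERVER_CONFIG = {"mcts_depth": 1, "mcts_exploration": 0.2, "mcts_simulations": 2}


def build_request_config(request_body, server_config):
    """Return a fresh dict per request; server_config is read, never written."""
    return {key: request_body.get(key, server_config[key]) for key in MCTS_KEYS}


# Two requests resolve their parameters independently of each other.
config_a = build_request_config({"mcts_depth": 5}, SERVER_CONFIG)
config_b = build_request_config({"mcts_simulations": 8}, SERVER_CONFIG)
```

Because each request gets its own dictionary and the shared defaults are only read, concurrent handlers have no common mutable state to overwrite.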

Includes regression tests covering both the per-request read path and the proxy no-mutation guarantee.

Full Changelog: v0.3.16...v0.3.17

v0.3.16

@codelion codelion released this 03 Jul 11:26
159d340

What's Changed

  • Guard against empty/None/truncated provider responses in rto, self_consistency, reread, leap by @SuperMarioYL in #311
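
A guard of this kind might look like the sketch below. It assumes an OpenAI-style response shape (choices[0].message.content and finish_reason) and treats a "length" finish reason as truncation; the helper name `extract_content` and the choice to return None for unusable responses are assumptions, not the behaviour of the PR itself.

```python
from types import SimpleNamespace


def extract_content(response):
    """Return the first choice's text, or None if it is missing, empty, or truncated."""
    choices = getattr(response, "choices", None)
    if not choices:
        return None
    choice = choices[0]
    message = getattr(choice, "message", None)
    content = getattr(message, "content", None)
    if not content or not content.strip():
        return None
    if getattr(choice, "finish_reason", None) == "length":
        # Assumption: a response cut off at the token limit counts as unusable.
        return None
    return content


def make_response(content, finish_reason="stop"):
    """Build a stand-in response object for the example."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message, finish_reason=finish_reason)])
```

A caller can then fall back to an earlier candidate or skip the sample whenever `extract_content` returns None, instead of failing on an attribute access.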

Full Changelog: v0.3.15...v0.3.16

v0.3.15

@codelion codelion released this 07 May 08:48
df018d6

Full Changelog: v0.3.14...v0.3.15

v0.3.14

@codelion codelion released this 19 Mar 00:18
63478f4

Full Changelog: v0.3.13...v0.3.14

v0.3.13

@codelion codelion released this 28 Jan 03:24
9cddd15

Full Changelog: v0.3.12...v0.3.13

v0.3.12

@codelion codelion released this 25 Dec 04:42
5211f50

Full Changelog: v0.3.11...v0.3.12

v0.3.11

@codelion codelion released this 03 Dec 03:49
93e1b52

Full Changelog: v0.3.10...v0.3.11