Releases: algorithmicsuperintelligence/optillm
v0.3.20
What's Changed
Fix: runaway local generation for models whose ChatML end token differs from their tokenizer EOS (#317)
Running such a model through optillm's built-in local inference never stopped early: every call that omitted max_tokens generated the full max_new_tokens default (4096). For example, dhara-250m's chat turns end at <|im_end|> (49154), but its tokenizer eos_token is <|end_of_text|> (1), so each call kept generating for roughly 800 s.
- Honor generation_config.eos_token_id: generation now resolves EOS from the model's own generation config (merging the tokenizer EOS as a fallback) instead of hardcoding tokenizer.eos_token_id. This fixes any chat model with this mismatch; well-behaved models (e.g. Qwen2.5, whose tokenizer EOS is already <|im_end|>) are unchanged.
- OPTILLM_MAX_TOKENS: a new env var that overrides the default max_new_tokens (the default stays 4096), bounding any request that sends no max_tokens. An explicit request max_tokens still takes precedence. Useful for small models that don't reliably emit an EOS token.
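The EOS merge described above can be sketched as follows, assuming the generation config's eos_token_id may be None, a single int, or a list of ints (all three forms occur in Hugging Face generation configs). The helper name resolve_eos_token_ids is hypothetical, not optillm's actual function; the token IDs are the dhara-250m values quoted above.

```python
from typing import Iterable, List, Optional, Union

TokenIds = Optional[Union[int, Iterable[int]]]


def _as_list(ids: TokenIds) -> List[int]:
    """Normalize None / a single int / an iterable of ints into a list of ints."""
    if ids is None:
        return []
    if isinstance(ids, int):
        return [ids]
    return [int(i) for i in ids]


def resolve_eos_token_ids(generation_eos: TokenIds, tokenizer_eos: Optional[int]) -> List[int]:
    """Hypothetical helper: the model's generation-config EOS ids come first,
    then the tokenizer EOS is appended as a fallback, without duplicates."""
    resolved: List[int] = []
    for token_id in _as_list(generation_eos) + _as_list(tokenizer_eos):
        if token_id not in resolved:
            resolved.append(token_id)
    return resolved


# dhara-250m-style mismatch: chat ends at <|im_end|> (49154), tokenizer EOS is <|end_of_text|> (1).
print(resolve_eos_token_ids(49154, 1))  # [49154, 1]
# No generation-config EOS: fall back to the tokenizer EOS alone.
print(resolve_eos_token_ids(None, 1))  # [1]
```

Because transformers' generate() accepts a list for eos_token_id, a merged list like this lets generation stop at whichever end token the model actually emits.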
Only the local inference engine (OPTILLM_API_KEY=optillm) is affected; proxied providers handle stopping themselves. Includes unit tests for both helpers.
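The max_new_tokens precedence from the OPTILLM_MAX_TOKENS bullet (explicit request max_tokens, then the env var, then 4096) can be sketched as below. The function name is hypothetical, and treating a non-integer or non-positive env value as unset is an assumption, not documented optillm behaviour.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_NEW_TOKENS = 4096  # default quoted in the release notes


def resolve_max_new_tokens(
    request_max_tokens: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Hypothetical helper: request max_tokens > OPTILLM_MAX_TOKENS > 4096."""
    # An explicit per-request value always wins.
    if request_max_tokens is not None:
        return int(request_max_tokens)
    raw = env.get("OPTILLM_MAX_TOKENS")
    if raw:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # assumption: an unparseable value is ignored
        if value > 0:
            return value
    return DEFAULT_MAX_NEW_TOKENS


print(resolve_max_new_tokens(None, {}))                               # 4096
print(resolve_max_new_tokens(None, {"OPTILLM_MAX_TOKENS": "512"}))    # 512
print(resolve_max_new_tokens(128, {"OPTILLM_MAX_TOKENS": "512"}))     # 128
```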
Full Changelog: v0.3.19...v0.3.20
v0.3.19
What's Changed
Fix: import optillm crash on macOS with transformers 5.13 (#316)
A fresh install on Apple silicon crashed with AttributeError: 'str' object has no attribute '__module__' at import optillm.
- Root cause: transformers 5.13.0 tightened AutoTokenizer.register() to require a class, but mlx-lm 0.31.3 registers a custom tokenizer by string name (tokenizer_utils.py: AutoTokenizer.register("NewlineTokenizer", ...)). transformers ≤5.12.1 tolerated it.
- macOS-only: optillm installs mlx-lm only on Apple silicon (platform_machine=="arm64" and sys_platform=="darwin"); Linux CI never installs it, so CI was unaffected.
- Fix: pin transformers>=5.0.0,<5.13.0 (resolves to 5.12.1). The transformers 5.13.0 HF path itself is fine (dhara loads and generates); the incompatibility is purely mlx-lm's. Lift the cap once mlx-lm ships a transformers-5.13-compatible release.
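The error message above is what Python raises when code treats a plain string as a class and reads its __module__. The toy registry below reproduces that failure mode; ToyAutoRegistry and its register() are hypothetical illustrations, not transformers' or mlx-lm's actual code.

```python
class NewlineTokenizer:
    """Stand-in for mlx-lm's custom tokenizer class (illustration only)."""


class ToyAutoRegistry:
    """Hypothetical registry whose register() expects a class, not a name.

    It records the class's dotted path, so it must read cls.__module__.
    A class has that attribute; a str instance does not.
    """

    def __init__(self) -> None:
        self._entries = {}

    def register(self, name, cls):
        self._entries[name] = f"{cls.__module__}.{cls.__qualname__}"


registry = ToyAutoRegistry()
registry.register("NewlineTokenizer", NewlineTokenizer)  # works: a class

try:
    # Registering by string name, as described in the root-cause bullet.
    registry.register("NewlineTokenizer", "NewlineTokenizer")
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute '__module__'
```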
Full Changelog: v0.3.18...v0.3.19
v0.3.18
What's Changed
Opt-in file persistence for the memory plugin (#111, #315)
The memory plugin can now persist extracted memories across requests. Set OPTILLM_MEMORY_FILE to a path to opt in; when unset, behaviour is byte-for-byte unchanged (in-RAM, reset per request).
- Loads any previously saved items on init and writes after every add().
- Atomic writes (temp file + os.replace) so a crash mid-write can't corrupt the store.
- A missing/corrupt/non-list file degrades gracefully to an empty store; persistence can never fail a request. Loaded items are filtered to strings and bounded by max_size.
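The persistence rules listed above (opt-in path, atomic temp-file-plus-os.replace writes, graceful fallback on bad files, string filtering, max_size bound) can be sketched as a small class. PersistentMemory is a hypothetical name, not optillm's memory plugin class; keeping the most recent max_size items and silently skipping a failed save are assumptions.

```python
import json
import os
import tempfile
from typing import List, Optional


class PersistentMemory:
    """Hypothetical sketch of opt-in file persistence for extracted memories."""

    def __init__(self, path: Optional[str] = None, max_size: int = 100) -> None:
        # Opt-in: with no path and OPTILLM_MEMORY_FILE unset, the store is RAM-only.
        self.path = path if path is not None else os.environ.get("OPTILLM_MEMORY_FILE")
        self.max_size = max_size
        self.items: List[str] = self._load()

    def _bounded(self, items: List[str]) -> List[str]:
        # Assumption: keep the most recent max_size items.
        return items[max(0, len(items) - self.max_size):]

    def _load(self) -> List[str]:
        if not self.path:
            return []
        try:
            with open(self.path, "r", encoding="utf-8") as fh:
                data = json.load(fh)
        except (OSError, ValueError):
            return []  # missing or corrupt file: start empty
        if not isinstance(data, list):
            return []  # non-list payload: start empty
        return self._bounded([item for item in data if isinstance(item, str)])

    def add(self, item: str) -> None:
        self.items = self._bounded(self.items + [item])
        self._save()

    def _save(self) -> None:
        if not self.path:
            return
        directory = os.path.dirname(os.path.abspath(self.path))
        tmp_path = None
        try:
            # Write a temp file in the same directory, then atomically swap it in.
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(self.items, fh)
            os.replace(tmp_path, self.path)
        except OSError:
            # Assumption: a failed save is dropped rather than failing the request.
            if tmp_path and os.path.exists(tmp_path):
                os.remove(tmp_path)
```

Creating the temp file in the target's own directory matters: os.replace is only atomic within a single filesystem.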
CI: Frame SAST scan of PR-changed files
Added a Security Scan (Frame SAST) workflow that runs the Frame neuro-symbolic SAST tool (pinned) on the Python files changed by each pull request. Scanning only changed files surfaces newly introduced issues without failing on pre-existing findings; the job fails only on high/critical severity.
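The two gating policies in that workflow (scan only the Python files a PR changes; fail only on high/critical findings) can be sketched as plain functions. The Frame tool's actual CLI and output format are not described here, so the findings shape (dicts with a "severity" key) and the function names are hypothetical.

```python
from typing import Iterable, List, Mapping

BLOCKING_SEVERITIES = {"high", "critical"}


def changed_python_files(changed_paths: Iterable[str]) -> List[str]:
    """Keep only the Python files touched by the pull request (deduplicated, sorted)."""
    return sorted({p for p in changed_paths if p.endswith(".py")})


def should_fail(findings: Iterable[Mapping[str, str]]) -> bool:
    """Fail the job only if at least one finding is high or critical severity."""
    return any(str(f.get("severity", "")).lower() in BLOCKING_SEVERITIES for f in findings)


# Hypothetical inputs for illustration.
print(changed_python_files(["src/example.py", "docs/README.md", "src/example.py"]))
print(should_fail([{"severity": "medium"}, {"severity": "low"}]))
```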
Full Changelog: v0.3.17...v0.3.18
v0.3.17
What's Changed
Fix: MCTS parameters are now request-scoped (#304, #314)
proxy() previously wrote each request's mcts_depth / mcts_exploration / mcts_simulations into the module-level server_config, and execute_single_approach() read them back out of that same global. Under any threaded WSGI server (Flask's default threaded mode, gunicorn, uWSGI, Hypercorn), two concurrent requests raced on that global — request B could overwrite request A's params between A's write and A's read, silently running A's MCTS with B's settings. The write also leaked one request's params into every later request.
MCTS params now flow through the per-request request_config (with the server/CLI defaults as fallback), consistent with every other approach parameter. Behaviour is unchanged for all callers; the shared global is no longer mutated per request, so there is nothing left to race on.
Includes regression tests covering both the per-request read path and the proxy no-mutation guarantee.
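The request-scoped pattern described in this release can be sketched as follows: each request builds its own config from its body, with the server defaults read (never written) as a fallback, so concurrent threads share nothing mutable. The default values and function names below are hypothetical, not optillm's actual defaults or code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Mapping, Tuple

# Hypothetical server/CLI defaults; optillm's real defaults may differ.
SERVER_CONFIG: Dict[str, Any] = {"mcts_depth": 1, "mcts_exploration": 0.2, "mcts_simulations": 2}
MCTS_KEYS = ("mcts_depth", "mcts_exploration", "mcts_simulations")


def build_request_config(body: Mapping[str, Any], server_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Request values override server defaults; server_config is only read."""
    return {key: body.get(key, server_config[key]) for key in MCTS_KEYS}


def run_mcts(request_config: Mapping[str, Any]) -> Tuple[Any, ...]:
    """Stand-in for the MCTS approach: reads parameters only from request_config."""
    return tuple(request_config[key] for key in MCTS_KEYS)


def handle_request(body: Mapping[str, Any]) -> Tuple[Any, ...]:
    # The config is a fresh local dict, so another thread cannot overwrite it.
    return run_mcts(build_request_config(body, SERVER_CONFIG))


bodies = [{"mcts_depth": d, "mcts_simulations": 10 * d} for d in range(1, 9)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_request, bodies))
print(results[0], SERVER_CONFIG)
```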
Full Changelog: v0.3.16...v0.3.17
v0.3.16
What's Changed
- Guard against empty/None/truncated provider responses in rto, self_consistency, reread, leap by @SuperMarioYL in #311
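A guard of the kind that fix describes can be sketched as one extraction helper that tolerates a None response, missing or empty choices, and None content, and reports truncation via finish_reason == "length" (the OpenAI-style signal that the token limit cut the output). The helper name is hypothetical, and plain dicts stand in for the client's response objects to keep the sketch self-contained.

```python
from typing import Any, Mapping, Optional, Tuple


def extract_content(response: Optional[Mapping[str, Any]]) -> Tuple[str, bool]:
    """Hypothetical helper: return (content, truncated) from an OpenAI-style
    chat completion dict, never raising on empty or partial responses."""
    if not response:
        return "", False
    choices = response.get("choices") or []
    if not choices:
        return "", False
    first = choices[0] or {}
    message = first.get("message") or {}
    content = message.get("content") or ""
    # finish_reason == "length" means the provider stopped at the token limit.
    truncated = first.get("finish_reason") == "length"
    return content, truncated


print(extract_content(None))
print(extract_content({"choices": [{"message": {"content": "partial"}, "finish_reason": "length"}]}))
```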
New Contributors
- @SuperMarioYL made their first contribution in #311
Full Changelog: v0.3.15...v0.3.16
v0.3.15
v0.3.14
v0.3.13
What's Changed
- Improve CePO capability by @ohpauleez in #290
- Fix macOS MPS compatibility and bump version to 0.3.13 by @codelion in #294
New Contributors
- @ohpauleez made their first contribution in #290
Full Changelog: v0.3.12...v0.3.13