Releases · algorithmicsuperintelligence/optillm

v0.3.20

What's Changed

Fix: runaway local generation for models whose ChatML end token differs from their tokenizer EOS (#317)

Running such a model through optillm's built-in local inference never stopped early: every call that omitted max_tokens generated the full default max_new_tokens (4096). For example, dhara-250m's chat turns end at <|im_end|> (49154), but its tokenizer eos_token is <|end_of_text|> (1), so it rambled for roughly 800 s per call.

Honor generation_config.eos_token_id: generation now resolves EOS from the model's own generation config (merging the tokenizer EOS as a fallback) instead of hardcoding tokenizer.eos_token_id. Fixes any chat model with this mismatch; well-behaved models (e.g. Qwen2.5, whose tokenizer EOS already is <|im_end|>) are unchanged.
OPTILLM_MAX_TOKENS: new env var to override the default max_new_tokens (default stays 4096), bounding any request that sends no max_tokens. An explicit request max_tokens still takes precedence. Useful for small models that don't reliably emit an EOS token.

Only the local inference engine (OPTILLM_API_KEY=optillm) is affected; proxied providers handle stopping themselves. Includes unit tests for both helpers.
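The two behaviours described above can be sketched roughly as follows. This is a minimal illustration, not optillm's actual code: the helper names resolve_eos_token_ids and resolve_max_new_tokens are hypothetical, and falling back to the default when OPTILLM_MAX_TOKENS is unparsable or non-positive is an assumption the release note does not confirm.

```python
import os

DEFAULT_MAX_NEW_TOKENS = 4096  # default stated in the release note


def resolve_eos_token_ids(generation_eos, tokenizer_eos):
    """Merge EOS ids from the generation config with the tokenizer EOS.

    Hypothetical helper. Either argument may be None, an int, or a list of
    ints. Generation-config ids come first; the tokenizer EOS is a fallback.
    """
    merged = []
    for source in (generation_eos, tokenizer_eos):
        if source is None:
            continue
        ids = source if isinstance(source, (list, tuple)) else [source]
        for token_id in ids:
            if token_id is not None and token_id not in merged:
                merged.append(token_id)
    return merged


def resolve_max_new_tokens(request_max_tokens=None, environ=None):
    """Pick the generation cap: request value, then env var, then default.

    Hypothetical helper. Ignoring invalid env values is an assumption.
    """
    environ = os.environ if environ is None else environ
    if request_max_tokens is not None:
        return request_max_tokens  # an explicit request value wins
    raw = environ.get("OPTILLM_MAX_TOKENS")
    if raw:
        try:
            value = int(raw)
            if value > 0:
                return value
        except ValueError:
            pass  # assumption: unparsable values fall through to the default
    return DEFAULT_MAX_NEW_TOKENS
```

With the dhara-250m ids quoted above, resolve_eos_token_ids(49154, 1) yields both ids, so generation can stop at <|im_end|> as well as at <|end_of_text|>.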

Full Changelog: v0.3.19...v0.3.20

v0.3.19

What's Changed

Fix: import optillm crash on macOS with transformers 5.13 (#316)

A fresh install on Apple silicon crashed with AttributeError: 'str' object has no attribute '__module__' at import optillm.

Root cause: transformers 5.13.0 tightened AutoTokenizer.register() to require a class, but mlx-lm 0.31.3 registers a custom tokenizer by string name (tokenizer_utils.py: AutoTokenizer.register("NewlineTokenizer", ...)). transformers ≤5.12.1 tolerated it.
macOS-only: optillm installs mlx-lm only on Apple silicon (platform_machine=="arm64" and sys_platform=="darwin"); Linux CI never installs it, so CI was unaffected.
Fix: pin transformers>=5.0.0,<5.13.0 (resolves to 5.12.1). The transformers 5.13.0 HF path itself is fine (dhara loads and generates); the incompatibility is purely mlx-lm's. Lift the cap once mlx-lm ships a transformers-5.13-compatible release.
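To make the pinned range concrete, here is a small sketch of the check that the specifier transformers>=5.0.0,<5.13.0 expresses. The helper names parse_version and within_pin are hypothetical, and the parsing is deliberately simplified (it ignores pre-release suffixes); a real project would normally rely on its package manager or the packaging library rather than hand-rolled comparison.

```python
def parse_version(text):
    """Parse 'X.Y.Z' into a tuple of ints, ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "5.12"
    return tuple(parts)


def within_pin(version, lower="5.0.0", upper="5.13.0"):
    """True when lower <= version < upper, mirroring the pin in the note."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

Under this check 5.12.1 is accepted and 5.13.0 is rejected, which matches the resolution described in the fix.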

Full Changelog: v0.3.18...v0.3.19

v0.3.18

What's Changed

Opt-in file persistence for the memory plugin (#111, #315)

The memory plugin can now persist extracted memories across requests. Set OPTILLM_MEMORY_FILE to a path to opt in; when unset, behaviour is byte-for-byte unchanged (in-RAM, reset per request).

Loads any previously saved items on init and writes after every add().
Atomic writes (temp file + os.replace) so a crash mid-write can't corrupt the store.
A missing/corrupt/non-list file degrades gracefully to an empty store — persistence can never fail a request. Loaded items are filtered to strings and bounded by max_size.
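The load and save behaviour listed above might look something like the sketch below. It is an illustration under stated assumptions, not the plugin's code: the function names load_items and save_items are hypothetical, JSON as the on-disk format is an assumption, and keeping the most recent max_size items when truncating is also an assumption.

```python
import json
import os
import tempfile


def load_items(path, max_size):
    """Return saved string items, or [] if the file is missing or unusable."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: start empty.
        return []
    if not isinstance(data, list):
        return []
    items = [item for item in data if isinstance(item, str)]
    # Assumption: when over the bound, keep the newest entries.
    return items[-max_size:] if max_size > 0 else []


def save_items(path, items):
    """Write items atomically: temp file in the same directory, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = None
    try:
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(items, handle)
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
        return True
    except OSError:
        # Persistence must never fail the caller; clean up and report False.
        if tmp_path is not None:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
        return False
```

Creating the temp file in the destination directory keeps os.replace on a single filesystem, which is what makes the swap atomic; a reader sees either the old file or the new one, never a partial write.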

CI: Frame SAST scan of PR-changed files

Added a Security Scan (Frame SAST) workflow that runs the Frame neuro-symbolic SAST tool (pinned) on the Python files changed by each pull request. Scanning only changed files surfaces newly introduced issues without failing on pre-existing findings; the job fails only on high/critical severity.

Full Changelog: v0.3.17...v0.3.18

v0.3.17

What's Changed

Fix: MCTS parameters are now request-scoped (#304, #314)

proxy() previously wrote each request's mcts_depth / mcts_exploration / mcts_simulations into the module-level server_config, and execute_single_approach() read them back out of that same global. Under any threaded WSGI server (Flask's default threaded mode, gunicorn, uWSGI, Hypercorn), two concurrent requests raced on that global — request B could overwrite request A's params between A's write and A's read, silently running A's MCTS with B's settings. The write also leaked one request's params into every later request.

MCTS params now flow through the per-request request_config (with the server/CLI defaults as fallback), consistent with every other approach parameter. Behaviour is unchanged for all callers; the shared global is no longer mutated per request, so there is nothing left to race on.

Includes regression tests covering both the per-request read path and the proxy no-mutation guarantee.
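The request-scoped pattern described above can be sketched as follows. This is a simplified illustration rather than optillm's implementation: build_request_config is a hypothetical name, and the default values in SERVER_CONFIG are placeholders chosen for the example, not the project's real defaults.

```python
# Server/CLI defaults. In this sketch they are only read after startup.
# The numbers are placeholders, not optillm's actual defaults.
SERVER_CONFIG = {"mcts_depth": 1, "mcts_exploration": 0.2, "mcts_simulations": 2}

MCTS_KEYS = ("mcts_depth", "mcts_exploration", "mcts_simulations")


def build_request_config(request_body, server_config=SERVER_CONFIG):
    """Return a fresh per-request dict; never writes to server_config."""
    request_config = {}
    for key in MCTS_KEYS:
        value = request_body.get(key)
        # Fall back to the server default when the request omits the key.
        request_config[key] = server_config[key] if value is None else value
    return request_config
```

Because each request gets its own dict and the shared defaults are never written, two concurrent requests cannot observe each other's parameters, whatever the thread interleaving.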

Full Changelog: v0.3.16...v0.3.17

v0.3.16

What's Changed

Guard against empty/None/truncated provider responses in rto, self_consistency, reread, leap by @SuperMarioYL in #311

New Contributors

@SuperMarioYL made their first contribution in #311

Full Changelog: v0.3.15...v0.3.16

v0.3.15

What's Changed

Fix z3 solver arm64 build by @codelion in #298
Add compact plugin for auto context compression by @GoDiao in #305

New Contributors

@GoDiao made their first contribution in #305

Full Changelog: v0.3.14...v0.3.15

v0.3.14

What's Changed

Fix spacy version constraint by @codelion in #297

Full Changelog: v0.3.13...v0.3.14

v0.3.13

What's Changed

Improve CePO capability by @ohpauleez in #290
Fix macOS MPS compatibility and bump version to 0.3.13 by @codelion in #294

New Contributors

@ohpauleez made their first contribution in #290

Full Changelog: v0.3.12...v0.3.13

v0.3.12

What's Changed

Fix default address binding by @codelion in #288

Full Changelog: v0.3.11...v0.3.12

v0.3.11

What's Changed

Feat fix web search by @codelion in #284

Full Changelog: v0.3.10...v0.3.11
