A fast, scriptable command-line tool for managing, running, benchmarking, and optimizing local LLMs. Built in Lua on top of llama.cpp.

Drop a folder full of `.gguf` model files, point luaLLM at it, and start using them immediately. No flag fiddling — sensible defaults work out of the box. Pick a model from the interactive TUI or type a partial name. When you need more speed, the built-in benchmark and recommendation engine finds your optimal settings automatically.
## Features

### Running Models
- Interactive TUI picker with pinned + recent model sections
- Foreground and background (daemon) mode with multi-server support
- Fuzzy model name matching — no need to type exact filenames
- Saved presets (`--preset throughput`, `--preset cold-start`) for optimized launches
### Model Management
- Pin models for quick access at the top of the picker
- Freeform markdown notes per model with timestamped entries
- Cached model metadata (context size, quantization, rope settings)
- Multi-part GGUF merging for split model files
### Benchmarking & Optimization
- Automated benchmark sweeps with `llama-bench` (configurable runs, warmup, threads)
- Side-by-side model comparison with delta/percentage stats and comparability warnings
- `recommend throughput` — sweeps thread counts, KV cache types, and flash attention to find the fastest flags
- `recommend cold-start` — generates a preset optimized for minimum time-to-first-token
### Utilities
- `--json` output on `list`, `status`, `notes`, and `help` for scripting and tool integration
- Configuration diagnostics (`doctor`) to validate your setup
- Rebuild llama.cpp from source with one command
## Quick Start

```sh
# 1. Install
brew install lua luarocks   # macOS
make install

# 2. Configure (auto-created on first run)
luallm config   # shows config file path — edit it to set your models_dir and llama_cpp_path

# 3. Run
luallm                # interactive picker
luallm mistral        # run by name (fuzzy match)
luallm start mistral  # launch as background daemon
```

## Requirements

- Lua 5.3+ and luarocks
- llama.cpp binaries — at minimum `llama-server`; also `llama-bench` (for benchmarking), `llama-gguf-split` (for `join`), `llama-cli` (optional)
- One or more `.gguf` model files
## Installation

Install Lua and luarocks:

```sh
# macOS
brew install lua luarocks

# Ubuntu / Debian
sudo apt install lua5.4 liblua5.4-dev luarocks
```

Then install luallm:

```sh
make install   # installs Lua deps and symlinks to ~/.local/bin/luallm
```

Make sure `~/.local/bin` is in your `PATH` (it is by default on most systems).

For system-wide installation:

```sh
sudo make install PREFIX=/usr/local
```

Or install manually:

```sh
make deps   # install lua-cjson and luafilesystem via luarocks
chmod +x luallm.lua
mkdir -p ~/.local/bin
ln -sf $(pwd)/luallm.lua ~/.local/bin/luallm
```

Note: `luallm.lua` must be symlinked, not copied — it resolves `src/` module imports relative to the script location.
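The symlink requirement exists because the script locates itself at startup and loads `src/` modules relative to that location. A minimal sketch of the common Lua idiom for this (an illustration, not necessarily the exact code in `luallm.lua`):

```lua
-- Resolve the directory containing this script, following symlinks,
-- then prepend its src/ tree to package.path so require() finds modules.
local function script_dir()
  local path = arg and arg[0] or "./luallm.lua"
  -- readlink -f follows the ~/.local/bin symlink back to the checkout;
  -- a plain copy would resolve to ~/.local/bin itself, where src/ is missing
  local p = io.popen("readlink -f '" .. path .. "' 2>/dev/null")
  if p then
    local resolved = p:read("*l")
    p:close()
    if resolved and resolved ~= "" then path = resolved end
  end
  return path:match("(.*/)") or "./"
end

package.path = script_dir() .. "src/?.lua;" .. package.path
```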
Verify the installation:

```sh
make check      # checks that Lua dependencies are installed
luallm doctor   # validates config, binaries, and models directory
```

If you get "module not found" errors, add this to your shell config (`~/.bashrc` or `~/.zshrc`):

```sh
eval $(luarocks path --bin)
```

## Usage

```sh
luallm                                   # interactive picker (pinned + recent models)
luallm <model>                           # fuzzy-match a model name, run in foreground
luallm run <model>                       # explicit run command (same as above)
luallm run <model> --preset throughput   # run with saved optimized flags
luallm <model> --port 9090 -c 8192       # pass extra llama.cpp flags (override defaults)
```

Model names are fuzzy-matched — `mistral` matches `mistral-7b-instruct-v0.3.Q4_K_M.gguf`. If multiple models match, the picker appears.
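You can picture the matching as a case-insensitive substring test against each filename. This is a simplified sketch, not necessarily the exact algorithm luallm uses:

```lua
-- Return all model filenames containing the query (case-insensitive).
-- Plain find (4th arg = true) so "." in a query is not a pattern char.
local function fuzzy_match(query, filenames)
  local hits = {}
  for _, name in ipairs(filenames) do
    if name:lower():find(query:lower(), 1, true) then
      hits[#hits + 1] = name
    end
  end
  return hits
end

local models = {
  "mistral-7b-instruct-v0.3.Q4_K_M.gguf",
  "codellama-13b-v2.Q5_K_S.gguf",
}
print(table.concat(fuzzy_match("mistral", models), ", "))
-- → mistral-7b-instruct-v0.3.Q4_K_M.gguf
-- one hit: run it directly; several hits: show the picker
```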
### Background Servers

Run models as background servers that persist after you close the terminal:

```sh
luallm start <model>                       # launch in background, returns immediately
luallm start <model> --port 8081           # run on a specific port
luallm start <model> --preset throughput

luallm status                  # show running and recently stopped servers
luallm logs <model>            # view daemon output
luallm logs <model> --follow   # tail logs in real time

luallm stop <model>            # stop a server (SIGTERM, then SIGKILL)
luallm stop all                # stop everything
```

Multiple servers can run simultaneously on different ports. State is tracked in `~/.cache/luallm/state.json`.
### Model Management

```sh
luallm list                 # list all models (name, size, quantization, last run)
luallm list --json          # JSON output for scripting

luallm info <model>         # show cached metadata (context size, quant, rope, etc.)
luallm info <model> --kv    # show full GGUF key-value pairs
luallm info <model> --raw   # show raw captured llama.cpp output

luallm pin <model>          # pin a model (appears at top of picker)
luallm unpin <model>
luallm pinned               # list pinned models

luallm notes <model>        # view notes for a model
luallm notes add <model> "great for coding, fast on 8 threads"
luallm notes edit <model>   # open in $EDITOR
luallm notes list           # list all models with notes

luallm join                 # merge multi-part GGUF files (ModelName-00001-of-00003.gguf -> ModelName.gguf)
luallm join llama-405b      # merge a specific model
```

Metadata is captured automatically the first time you run a model and cached locally.
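Split files follow the `-00001-of-00003` naming scheme shown above, which a Lua pattern can detect and group by base name. A sketch of the idea (the `split_parts` helper is illustrative, not luallm's internal API):

```lua
-- Group llama-gguf-split part files like "Model-00001-of-00003.gguf"
-- by base name; each complete group is a candidate for joining.
local function split_parts(filenames)
  local groups = {}
  for _, name in ipairs(filenames) do
    local base, idx, total =
      name:match("^(.-)%-(%d%d%d%d%d)%-of%-(%d%d%d%d%d)%.gguf$")
    if base then
      groups[base] = groups[base] or { total = tonumber(total), parts = {} }
      groups[base].parts[tonumber(idx)] = name
    end
  end
  return groups
end

local files = {
  "llama-405b-00001-of-00003.gguf",
  "llama-405b-00002-of-00003.gguf",
  "llama-405b-00003-of-00003.gguf",
  "mistral-7b.Q4_K_M.gguf",   -- single-file model, ignored
}
local groups = split_parts(files)
-- groups["llama-405b"].total == 3; join once all parts are present
```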
### Benchmarking

```sh
luallm bench <model>                     # run benchmark sweep (5 runs, 1 warmup)
luallm bench <model> --n 10              # 10 measured runs
luallm bench <model> --warmup 2          # 2 warmup runs
luallm bench <model> --threads 8

luallm bench show                        # view saved results (picker)
luallm bench show <model>                # view results for a specific model
luallm bench compare <A> <B>             # side-by-side comparison with deltas
luallm bench compare <A> <B> --verbose   # include hardware and build details
luallm bench clear                       # delete all saved benchmark data
```

Preset recommendations:
```sh
luallm recommend throughput <model>   # benchmark sweep across threads, KV cache types, flash attention
                                      # finds the fastest flags and saves them as a preset
luallm recommend cold-start <model>   # generates a preset optimized for fast model loading
                                      # (small context, reduced GPU layers, no flash attention)

luallm run <model> --preset throughput     # use the saved preset
luallm start <model> --preset cold-start
```

The throughput profile tests combinations of thread counts, KV cache quantization (q8_0, q4_0), and flash attention, then saves the winner. The cold-start profile generates a static preset tuned for minimum time-to-first-token.
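The throughput sweep is essentially a grid search over flag combinations. A sketch of how that grid can be enumerated, where the candidate thread counts and KV cache types are illustrative assumptions, not luallm's exact search space:

```lua
-- Enumerate the benchmark grid; in the real sweep, each combination
-- would be handed to llama-bench and the fastest tokens/sec wins.
local threads    = { 4, 8, 16 }               -- candidate thread counts (assumed)
local kv_types   = { "f16", "q8_0", "q4_0" }  -- KV cache quantizations (assumed)
local flash_attn = { false, true }

local combos = {}
for _, t in ipairs(threads) do
  for _, kv in ipairs(kv_types) do
    for _, fa in ipairs(flash_attn) do
      combos[#combos + 1] = { threads = t, kv = kv, flash = fa }
    end
  end
end

print(#combos)   -- 3 threads * 3 KV types * 2 flash settings = 18 runs
```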
### Utilities

```sh
luallm doctor          # validate config, check binaries, count models
luallm config          # show config file path
luallm rebuild         # git pull + rebuild llama.cpp from source
luallm clear-history   # reset run history
luallm test            # run the test suite

luallm help            # overview of all commands
luallm help <command>  # detailed help for a specific command
luallm help --json     # structured JSON output (for tool integration)
```

## Configuration

On first run, a config file is created at `~/.config/luaLLM/config.json`. See `config.example.json` for a full example.
```json
{
  "llama_cpp_path": "/usr/local/bin/llama-server",
  "llama_bench_path": "/usr/local/bin/llama-bench",
  "llama_cli_path": "/usr/local/bin/llama-cli",
  "llama_cpp_source_dir": "/home/user/llama.cpp",
  "models_dir": "/home/user/models",
  "recent_models_count": 7,
  "default_port": 8080,
  "default_params": ["-c 4096", "--host 127.0.0.1", "--threads 8"],
  "bench": {
    "default_n": 5,
    "default_warmup": 1,
    "default_ctx": 2048,
    "default_gen": 256,
    "default_batch": 512,
    "default_threads": 8
  },
  "cmake_options": [
    "-DGGML_METAL=ON",
    "-DGGML_METAL_EMBED_LIBRARY=ON",
    "-DGGML_USE_FLASH_ATTENTION=ON",
    "-DLLAMA_BUILD_SERVER=ON",
    "-DCMAKE_BUILD_TYPE=Release"
  ],
  "model_overrides": {
    "codellama": ["-c 16384", "--gpu-layers 35"],
    "llama%-3.*70b": ["-c 8192", "--gpu-layers 40", "--threads 16"],
    "mistral": ["-c 8192"]
  }
}
```

| Key | Required | Description |
|---|---|---|
| `llama_cpp_path` | Yes | Path to `llama-server` binary |
| `models_dir` | Yes | Directory containing `.gguf` model files |
| `default_params` | No | Default flags passed to `llama-server` on every run |
| `default_port` | No | Default port for `llama-server` (default: 8080) |
| `recent_models_count` | No | Number of recent models shown in picker (default: 4) |
| `llama_bench_path` | No | Path to `llama-bench` (auto-resolved from `llama_cpp_path` if not set) |
| `llama_cli_path` | No | Path to `llama-cli` |
| `llama_cpp_source_dir` | No | Path to llama.cpp source for `rebuild` command |
| `cmake_options` | No | CMake flags used by `rebuild` |
| `bench.default_n` | No | Default number of benchmark runs |
| `bench.default_warmup` | No | Default warmup runs before measuring |
| `bench.default_threads` | No | Default thread count for benchmarking |
| `bench.default_ctx` | No | Default context size for benchmarks |
| `bench.default_gen` | No | Default generation length for benchmarks |
| `bench.default_batch` | No | Default batch size for benchmarks |
| `model_overrides` | No | Pattern-based per-model parameter overrides (see below) |
The `model_overrides` keys are Lua patterns matched (unanchored) against model filenames. Any model whose filename matches a pattern gets those additional flags:
```json
{
  "model_overrides": {
    "codellama": ["-c 16384"],
    "llama%-3": ["--gpu-layers 35"],
    "mistral.*7b": ["-c 8192"]
  }
}
```

Both `codellama-7b-v1.5.gguf` and `codellama-13b-v2.gguf` match `"codellama"`. Note the `%-` escape — `-` is a special character in Lua patterns.
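You can check a pattern's behavior directly in Lua with `string.find`, which performs the same unanchored pattern match described above (the `matches` helper is just for illustration):

```lua
-- Unanchored Lua pattern match, as used for model_overrides keys.
local function matches(filename, pattern)
  return filename:find(pattern) ~= nil
end

assert(matches("codellama-7b-v1.5.gguf", "codellama"))
assert(matches("codellama-13b-v2.gguf", "codellama"))
assert(matches("llama-3.1-8b.Q4_K_M.gguf", "llama%-3"))   -- %- matches a literal hyphen
assert(not matches("llama31-8b.gguf", "llama%-3"))        -- no hyphen, no match
assert(matches("mistral-7b-instruct.gguf", "mistral.*7b"))
```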
## Command Reference

| Command | Description |
|---|---|
| `luallm` | Interactive picker (pinned + recent models) |
| `luallm <model>` | Fuzzy-match and run a model |
| `luallm run <model>` | Run with optional `--preset` |
| `luallm start <model>` | Start as background daemon |
| `luallm stop <model\|all>` | Stop running server(s) |
| `luallm status` | Show server status |
| `luallm logs <model>` | View daemon logs (`--follow`) |
| `luallm list` | List all models (`--json`) |
| `luallm info <model>` | Show model metadata (`--kv`, `--raw`) |
| `luallm pin/unpin <model>` | Pin or unpin a model |
| `luallm pinned` | List pinned models |
| `luallm notes <model>` | View/add/edit model notes |
| `luallm bench <model>` | Run benchmarks |
| `luallm bench show/compare` | View or compare benchmark results |
| `luallm recommend <profile>` | Generate optimized presets |
| `luallm join` | Merge multi-part GGUF files |
| `luallm doctor` | Run diagnostics |
| `luallm config` | Show config file path |
| `luallm rebuild` | Rebuild llama.cpp from source |
| `luallm clear-history` | Clear run history |
| `luallm help [command]` | Show help (`--json`) |
| `luallm test` | Run test suite |
## File Locations

| Path | Contents |
|---|---|
| `~/.config/luaLLM/config.json` | Configuration |
| `~/.config/luaLLM/history.json` | Run history |
| `~/.config/luaLLM/pins.json` | Pinned models |
| `~/.config/luaLLM/model_info/` | Cached model metadata |
| `~/.config/luaLLM/notes/` | Per-model markdown notes |
| `~/.config/luaLLM/bench/` | Benchmark results |
| `~/.cache/luallm/state.json` | Server state (running/stopped) |
| `~/.cache/luallm/logs/` | Daemon log files |
| `~/.cache/luallm/pids/` | Daemon PID files |