luaLLM

Drop your .gguf model files in a folder, point luaLLM at it, and start using them immediately. No flag fiddling — sensible defaults work out of the box. Pick a model from the interactive TUI or type a partial name. When you need more speed, the built-in benchmark and recommendation engine finds your optimal settings automatically.

A fast, scriptable command-line tool for managing, running, benchmarking, and optimizing local LLMs. Built in Lua on top of llama.cpp.


Features

Running Models

  • Interactive TUI picker with pinned + recent model sections
  • Foreground and background (daemon) mode with multi-server support
  • Fuzzy model name matching — no need to type exact filenames
  • Saved presets (--preset throughput, --preset cold-start) for optimized launches

Model Management

  • Pin models for quick access at the top of the picker
  • Freeform markdown notes per model with timestamped entries
  • Cached model metadata (context size, quantization, rope settings)
  • Multi-part GGUF merging for split model files

Benchmarking & Optimization

  • Automated benchmark sweeps with llama-bench (configurable runs, warmup, threads)
  • Side-by-side model comparison with delta/percentage stats and comparability warnings
  • recommend throughput — sweeps thread counts, KV cache types, and flash attention to find the fastest flags
  • recommend cold-start — generates a preset optimized for minimum time-to-first-token

Utilities

  • --json output on list, status, notes, help for scripting and tool integration
  • Configuration diagnostics (doctor) to validate your setup
  • Rebuild llama.cpp from source with one command

Quick Start

# 1. Install
brew install lua luarocks        # macOS
make install

# 2. Configure (auto-created on first run)
luallm config                    # shows config file path — edit it to set your models_dir and llama_cpp_path

# 3. Run
luallm                           # interactive picker
luallm mistral                   # run by name (fuzzy match)
luallm start mistral             # launch as background daemon

Installation

Prerequisites

  • Lua 5.3+ and luarocks
  • llama.cpp binaries — at minimum llama-server; also llama-bench (for benchmarking), llama-gguf-split (for join), llama-cli (optional)
  • One or more .gguf model files

# macOS
brew install lua luarocks

# Ubuntu / Debian
sudo apt install lua5.4 liblua5.4-dev luarocks

Install with Make

make install                # installs Lua deps and symlinks to ~/.local/bin/luallm

Make sure ~/.local/bin is in your PATH (it is by default on most systems).

For system-wide installation:

sudo make install PREFIX=/usr/local

Manual Install

make deps                         # install lua-cjson and luafilesystem via luarocks
chmod +x luallm.lua
mkdir -p ~/.local/bin
ln -sf $(pwd)/luallm.lua ~/.local/bin/luallm

Note: luallm.lua must be symlinked, not copied — it resolves src/ module imports relative to the script location.
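For context, a symlink-aware bootstrap of this kind is usually done by resolving the script's real path before amending package.path. The sketch below is illustrative only (not luaLLM's actual code) and assumes readlink -f is available:

```lua
-- Illustrative sketch of symlink-aware module resolution (not luaLLM's
-- actual bootstrap). The script finds its own real location so that
-- require() can load modules from the adjacent src/ directory.
local function dirname(path)
  return path:match("^(.*)/[^/]+$") or "."
end

local function real_script_dir(argv0)
  -- readlink -f follows the ~/.local/bin symlink back to the repo checkout;
  -- a plain copy in ~/.local/bin would instead look for ~/.local/bin/src/.
  local p = io.popen("readlink -f '" .. argv0 .. "' 2>/dev/null")
  local resolved = p and p:read("*l")
  if p then p:close() end
  return dirname(resolved or argv0)
end

-- Prepend <repo>/src/?.lua so `require "modname"` finds src/modname.lua
package.path = real_script_dir(arg and arg[0] or "luallm.lua")
               .. "/src/?.lua;" .. package.path
```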

Verify

make check                       # checks that Lua dependencies are installed
luallm doctor                    # validates config, binaries, and models directory

Luarocks Path

If you get "module not found" errors, add this to your shell config (~/.bashrc or ~/.zshrc):

eval $(luarocks path --bin)

Usage

Running Models

luallm                           # interactive picker (pinned + recent models)
luallm <model>                   # fuzzy-match a model name, run in foreground
luallm run <model>               # explicit run command (same as above)
luallm run <model> --preset throughput   # run with saved optimized flags
luallm <model> --port 9090 -c 8192      # pass extra llama.cpp flags (override defaults)

Model names are fuzzy-matched — mistral matches mistral-7b-instruct-v0.3.Q4_K_M.gguf. If multiple models match, the picker appears.
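As an illustration, a matcher of this kind can be as simple as a case-insensitive substring check (a sketch only; luaLLM's actual matching logic may differ):

```lua
-- Illustrative fuzzy matcher: a query matches any filename that contains
-- it, case-insensitively. One hit runs directly; several open the picker.
local function fuzzy_matches(query, filenames)
  local hits = {}
  local q = query:lower()
  for _, name in ipairs(filenames) do
    if name:lower():find(q, 1, true) then  -- plain find, no pattern magic
      hits[#hits + 1] = name
    end
  end
  return hits
end

local models = {
  "mistral-7b-instruct-v0.3.Q4_K_M.gguf",
  "codellama-13b-v2.gguf",
}
local hits = fuzzy_matches("mistral", models)
-- hits contains only the mistral file; "llama" would match codellama too
```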

Daemon Mode

Run models as background servers that persist after you close the terminal:

luallm start <model>             # launch in background, returns immediately
luallm start <model> --port 8081 # run on a specific port
luallm start <model> --preset throughput

luallm status                    # show running and recently stopped servers
luallm logs <model>              # view daemon output
luallm logs <model> --follow     # tail logs in real time
luallm stop <model>              # stop a server (SIGTERM, then SIGKILL)
luallm stop all                  # stop everything

Multiple servers can run simultaneously on different ports. State is tracked in ~/.cache/luallm/state.json.

Model Management

luallm list                      # list all models (name, size, quantization, last run)
luallm list --json               # JSON output for scripting

luallm info <model>              # show cached metadata (context size, quant, rope, etc.)
luallm info <model> --kv         # show full GGUF key-value pairs
luallm info <model> --raw        # show raw captured llama.cpp output

luallm pin <model>               # pin a model (appears at top of picker)
luallm unpin <model>
luallm pinned                    # list pinned models

luallm notes <model>             # view notes for a model
luallm notes add <model> "great for coding, fast on 8 threads"
luallm notes edit <model>        # open in $EDITOR
luallm notes list                # list all models with notes

luallm join                      # merge multi-part GGUF files (ModelName-00001-of-00003.gguf -> ModelName.gguf)
luallm join llama-405b           # merge a specific model

Metadata is captured automatically the first time you run a model and cached locally.

Benchmarking & Optimization

luallm bench <model>             # run benchmark sweep (5 runs, 1 warmup)
luallm bench <model> --n 10      # 10 measured runs
luallm bench <model> --warmup 2  # 2 warmup runs
luallm bench <model> --threads 8

luallm bench show                # view saved results (picker)
luallm bench show <model>        # view results for a specific model
luallm bench compare <A> <B>     # side-by-side comparison with deltas
luallm bench compare <A> <B> --verbose   # include hardware and build details
luallm bench clear               # delete all saved benchmark data

Preset Recommendations:

luallm recommend throughput <model>   # benchmark sweep across threads, KV cache types, flash attention
                                      # finds the fastest flags and saves them as a preset

luallm recommend cold-start <model>   # generates a preset optimized for fast model loading
                                      # (small context, reduced GPU layers, no flash attention)

luallm run <model> --preset throughput # use the saved preset
luallm start <model> --preset cold-start

The throughput profile tests combinations of thread counts, KV cache quantization (q8_0, q4_0), and flash attention, then saves the winner. The cold-start profile generates a static preset tuned for minimum time-to-first-token.
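Conceptually, the sweep enumerates flag combinations, benchmarks each, and keeps the fastest. The option lists and bench stub below are illustrative, not luaLLM's actual search space:

```lua
-- Illustrative sketch of a throughput sweep (not luaLLM's actual code).
local threads  = { 4, 8, 16 }
local kv_types = { "f16", "q8_0", "q4_0" }
local flash    = { true, false }

-- Enumerate every combination of the candidate options.
local function combos()
  local out = {}
  for _, t in ipairs(threads) do
    for _, kv in ipairs(kv_types) do
      for _, fa in ipairs(flash) do
        out[#out + 1] = { threads = t, kv = kv, flash_attn = fa }
      end
    end
  end
  return out
end

-- Benchmark each combination and keep the one with the highest tokens/sec.
local function best(bench_fn)
  local winner, best_tps = nil, -1
  for _, c in ipairs(combos()) do
    local tps = bench_fn(c)  -- would shell out to llama-bench in practice
    if tps > best_tps then winner, best_tps = c, tps end
  end
  return winner, best_tps
end
```

The winning combination would then be written out as the saved preset.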

Maintenance

luallm doctor                    # validate config, check binaries, count models
luallm config                    # show config file path
luallm rebuild                   # git pull + rebuild llama.cpp from source
luallm clear-history             # reset run history
luallm test                      # run the test suite

Help

luallm help                      # overview of all commands
luallm help <command>            # detailed help for a specific command
luallm help --json               # structured JSON output (for tool integration)

Configuration

On first run, a config file is created at ~/.config/luaLLM/config.json. See config.example.json for a full example.

{
  "llama_cpp_path": "/usr/local/bin/llama-server",
  "llama_bench_path": "/usr/local/bin/llama-bench",
  "llama_cli_path": "/usr/local/bin/llama-cli",
  "llama_cpp_source_dir": "/home/user/llama.cpp",
  "models_dir": "/home/user/models",
  "recent_models_count": 7,
  "default_port": 8080,
  "default_params": ["-c 4096", "--host 127.0.0.1", "--threads 8"],
  "bench": {
    "default_n": 5,
    "default_warmup": 1,
    "default_ctx": 2048,
    "default_gen": 256,
    "default_batch": 512,
    "default_threads": 8
  },
  "cmake_options": [
    "-DGGML_METAL=ON",
    "-DGGML_METAL_EMBED_LIBRARY=ON",
    "-DGGML_USE_FLASH_ATTENTION=ON",
    "-DLLAMA_BUILD_SERVER=ON",
    "-DCMAKE_BUILD_TYPE=Release"
  ],
  "model_overrides": {
    "codellama": ["-c 16384", "--gpu-layers 35"],
    "llama%-3.*70b": ["-c 8192", "--gpu-layers 40", "--threads 16"],
    "mistral": ["-c 8192"]
  }
}

Configuration Reference

Key                     Required  Description
llama_cpp_path          Yes       Path to llama-server binary
models_dir              Yes       Directory containing .gguf model files
default_params          No        Default flags passed to llama-server on every run
default_port            No        Default port for llama-server (default: 8080)
recent_models_count     No        Number of recent models shown in picker (default: 4)
llama_bench_path        No        Path to llama-bench (auto-resolved from llama_cpp_path if not set)
llama_cli_path          No        Path to llama-cli
llama_cpp_source_dir    No        Path to llama.cpp source for the rebuild command
cmake_options           No        CMake flags used by rebuild
bench.default_n         No        Default number of benchmark runs
bench.default_warmup    No        Default warmup runs before measuring
bench.default_threads   No        Default thread count for benchmarking
bench.default_ctx       No        Default context size for benchmarks
bench.default_gen       No        Default generation length for benchmarks
bench.default_batch     No        Default batch size for benchmarks
model_overrides         No        Pattern-based per-model parameter overrides (see below)

Model Overrides

The model_overrides keys are Lua patterns matched against model filenames. Any model whose filename contains the pattern will use those additional flags:

{
  "model_overrides": {
    "codellama": ["-c 16384"],
    "llama%-3": ["--gpu-layers 35"],
    "mistral.*7b": ["-c 8192"]
  }
}

Both codellama-7b-v1.5.gguf and codellama-13b-v2.gguf match "codellama". Note the %- escape — - is a special character in Lua patterns.
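Since the keys are ordinary Lua patterns, you can check how one behaves against a filename directly in a Lua prompt. A few illustrative examples:

```lua
-- Quick demonstration of Lua pattern behavior against model filenames.
-- In Lua patterns, `-` is a magic character (a lazy quantifier), so a
-- literal hyphen must be written `%-`.
assert(("codellama-7b-v1.5.gguf"):find("codellama"))        -- plain substring
assert(("llama-3-8b.gguf"):find("llama%-3"))                -- escaped hyphen
assert(("mistral-7b-instruct.Q4.gguf"):find("mistral.*7b"))
-- An unescaped `-` quantifies the preceding item instead of matching "-":
assert(("llama3.gguf"):find("llama%-*3"))  -- `%-*` matches zero hyphens too
```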


Command Reference

Command                      Description
luallm                       Interactive picker (pinned + recent models)
luallm <model>               Fuzzy-match and run a model
luallm run <model>           Run with optional --preset
luallm start <model>         Start as background daemon
luallm stop <model|all>      Stop running server(s)
luallm status                Show server status
luallm logs <model>          View daemon logs (--follow)
luallm list                  List all models (--json)
luallm info <model>          Show model metadata (--kv, --raw)
luallm pin/unpin <model>     Pin or unpin a model
luallm pinned                List pinned models
luallm notes <model>         View/add/edit model notes
luallm bench <model>         Run benchmarks
luallm bench show/compare    View or compare benchmark results
luallm recommend <profile>   Generate optimized presets
luallm join                  Merge multi-part GGUF files
luallm doctor                Run diagnostics
luallm config                Show config file path
luallm rebuild               Rebuild llama.cpp from source
luallm clear-history         Clear run history
luallm help [command]        Show help (--json)
luallm test                  Run test suite

Data Locations

Path                            Contents
~/.config/luaLLM/config.json    Configuration
~/.config/luaLLM/history.json   Run history
~/.config/luaLLM/pins.json      Pinned models
~/.config/luaLLM/model_info/    Cached model metadata
~/.config/luaLLM/notes/         Per-model markdown notes
~/.config/luaLLM/bench/         Benchmark results
~/.cache/luallm/state.json      Server state (running/stopped)
~/.cache/luallm/logs/           Daemon log files
~/.cache/luallm/pids/           Daemon PID files

License

Apache-2.0
