# Web API and websocket for Large Language Models in C++
- Clone the repo and cd into it:

  ```sh
  git clone https://github.com/monatis/llm-api.git && cd llm-api
  ```
- Install `asio` for the web API:

  ```sh
  apt install libasio-dev
  ```

  Note: You can also run `scripts/install-dev.sh` to install `asio` (and additionally `websocat`, in order to test the websocket from the terminal; see the example below).
- Build with CMake and Make:

  ```sh
  mkdir build && cd build
  cmake -DLLM_NATIVE=ON ..
  make -j4
  ```

  Find the executable in `./bin/llm-api`.
- Download the gpt4all-j model if you haven't already:

  ```sh
  wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O ./bin/ggml-gpt4all-j.bin
  ```
- Run the executable:

  ```sh
  ./bin/llm-api
  ```

  Note: You can pass the model path with the `-m` argument if it's located elsewhere. See below for all options.
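With the server up, you can check the websocket from the terminal using `websocat` (installed by `scripts/install-dev.sh`). This is a minimal sketch; the endpoint path and message format are assumptions for illustration, since this README doesn't document the route:

```sh
# Connect to the websocket (path assumed; adjust to the server's actual route).
# Type a prompt and press Enter; generated tokens stream back on the same connection.
websocat ws://localhost:8080/
```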
```
./bin/llm-api -h
usage: ./bin/llm-api [options]

options:
  -h, --help            show this help message and exit
  -v, --verbose         log generation in stdout (default: disabled)
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  --port PORT           port to listen on (default: 8080)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 200)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 0.9)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: ggml-gpt4all-j.bin)
```
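For example, to serve a model from a custom location on a different port, with more threads and a lower temperature (the model path here is illustrative):

```sh
./bin/llm-api -m ./models/ggml-gpt4all-j.bin -t 8 --port 8081 --temp 0.7
```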
## Roadmap

- Improve the multi-user experience.
- Integrate the StableLM model.
- Add an embedding endpoint.
- Provide a chain mechanism.
- Integrate a chat UI.
- Add Docker support (see the sketch below).
- Extend the readme and docs.
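As a starting point for the Docker item, here is a minimal Dockerfile sketch that mirrors the build steps above; the base image, package set, and layout are untested assumptions, not a configuration shipped with this repo:

```dockerfile
# Assumed base image; any distro with g++, CMake, and an asio package works.
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake libasio-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

# Same build steps as in the setup section above. LLM_NATIVE=ON presumably
# tunes the build for the build machine's CPU, which may not suit a portable image.
RUN mkdir build && cd build && cmake -DLLM_NATIVE=ON .. && make -j4

# Default port from the --port option; mount or copy the model next to the
# binary, or point to it with -m at run time.
EXPOSE 8080
WORKDIR /app/build
CMD ["./bin/llm-api"]
```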