v0.2.0
Concurrency
Ollama 0.2.0 is now available with concurrency support. This unlocks 2 specific features:
Parallel requests
Ollama can now serve multiple requests at the same time, using only a little bit of additional memory for each request. This enables use cases such as:
- Handling multiple chat sessions at the same time
- Hosting a code completion LLM for your internal team
- Processing different parts of a document simultaneously
- Running several agents at the same time.
demo.mov
Multiple models
Ollama now supports loading different models at the same time, dramatically improving:
- Retrieval Augmented Generation (RAG): both the embedding and text completion models can be loaded into memory simultaneously.
- Agents: multiple different agents can now run simultaneously
- Running large and small models side-by-side
Models are automatically loaded and unloaded based on requests and how much GPU memory is available.
To see which models are loaded, run ollama ps
:
% ollama ps
NAME ID SIZE PROCESSOR UNTIL
gemma:2b 030ee63283b5 2.8 GB 100% GPU 4 minutes from now
all-minilm:latest 1b226e2802db 530 MB 100% GPU 4 minutes from now
llama3:latest 365c0bd3c000 6.7 GB 100% GPU 4 minutes from now
For more information on concurrency, see the FAQ
New models
- GLM-4: A strong multi-lingual general language model with competitive performance to Llama 3.
- CodeGeeX4: A versatile model for AI software development scenarios, including code completion.
- Gemma 2: Improved output quality and base text generation models now available
What's Changed
- Improved Gemma 2
- Fixed issue where model would generate invalid tokens after hitting context window
- Fixed inference output issues with
gemma2:27b
- Re-downloading the model may be required:
ollama pull gemma2
orollama pull gemma2:27b
- Ollama will now show a better error if a model architecture isn't supported
- Improved handling of quotes and spaces in Modelfile
FROM
lines - Ollama will now return an error if the system does not have enough memory to run a model on Linux
New Contributors
- @Muku784 made their first contribution in #5382
- @abitrolly made their first contribution in #4821
Full Changelog: v0.1.48...v0.2.0