Chat completion with LLMs via Ollama.
Spring AI provides ChatModel, a low-level abstraction for integrating with LLMs from several providers, including Ollama.
When you use the Spring AI Ollama Spring Boot Starter, a ChatModel object backed by Ollama is autoconfigured for you.
@Bean
CommandLineRunner chat(ChatModel chatModel) {
    return _ -> {
        var response = chatModel.call("What is the capital of Italy?");
        System.out.println(response);
    };
}
Spring AI also provides ChatClient, a higher-level abstraction for building more advanced LLM workflows.
A ChatClient.Builder object is autoconfigured for you to build a ChatClient object. Under the hood, it relies on a ChatModel.
@Bean
CommandLineRunner chat(ChatClient.Builder chatClientBuilder) {
    var chatClient = chatClientBuilder.build();
    return _ -> {
        var response = chatClient
                .prompt("What is the capital of Italy?")
                .call()
                .content();
        System.out.println(response);
    };
}
The application consumes models from an Ollama inference server. You can either run Ollama locally on your laptop or rely on the Testcontainers support in Spring Boot to spin up an Ollama service automatically. If you choose the first option, make sure you have Ollama installed and running on your laptop. Either way, Spring AI takes care of pulling the needed Ollama models when the application starts, if they are not yet available on your machine.
If you're using the native Ollama application, run the application as follows.
./gradlew bootRun
If you want to rely on the native Testcontainers support in Spring Boot to spin up an Ollama service at startup time, run the application as follows.
./gradlew bootTestRun
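The test-time wiring isn't shown in this section, but a typical arrangement is a @TestConfiguration that declares an Ollama container as a @ServiceConnection bean, which Spring Boot picks up when bootTestRun starts the application. The class, dependency names, and image tag below are assumptions for illustration, not the project's actual code.

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.ollama.OllamaContainer;

// Minimal sketch: assumes spring-boot-testcontainers, the Spring AI Testcontainers module,
// and org.testcontainers:ollama are on the test classpath.
@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    OllamaContainer ollama() {
        // The image tag is only an example; pick the Ollama version you want to test against.
        return new OllamaContainer("ollama/ollama:latest");
    }
}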
Note: These examples use the httpie CLI to send HTTP requests.
Call the application; it will use a chat model to answer your question.
http :8080/chat question=="What is the capital of Italy?" -b
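The controller backing this endpoint isn't shown in this section. As a rough sketch (class, method, and parameter names are illustrative), it could build a ChatClient from the autoconfigured builder and forward the question to the model:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Sketch only: a controller exposing the /chat endpoint used above.
@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/chat")
    String chat(@RequestParam String question) {
        return chatClient
                .prompt(question)
                .call()
                .content();
    }
}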
The next request is configured with generic portable options.
http :8080/chat/generic-options question=="Why is a raven like a writing desk? Give a short answer." -b
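Generic options are portable across model providers. A handler like the following could live in the controller sketched above; the option values and builder method names assume the Spring AI 1.0 ChatOptions API and are only illustrative.

// import org.springframework.ai.chat.prompt.ChatOptions;

// Illustrative handler: portable, provider-agnostic options set per request.
@GetMapping("/chat/generic-options")
String chatGenericOptions(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(ChatOptions.builder()
                    .temperature(1.3)
                    .build())
            .call()
            .content();
}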
The next request is configured with the provider's specific options.
http :8080/chat/provider-options question=="What can you see beyond what you can see? Give a short answer." -b
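Provider-specific options expose knobs that only Ollama understands. A possible handler, again sketched for the controller above with assumed method names and example values:

// import org.springframework.ai.ollama.api.OllamaOptions;

// Illustrative handler: OllamaOptions carries Ollama-specific settings such as the context window size.
@GetMapping("/chat/provider-options")
String chatProviderOptions(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(OllamaOptions.builder()
                    .temperature(1.3)
                    .numCtx(4096)
                    .build())
            .call()
            .content();
}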
The next request returns the model's answer as a stream.
http --stream :8080/chat/stream question=="Why is a raven like a writing desk? Answer in 3 paragraphs." -b
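On the server side, streaming could be handled by switching from call() to stream(), which returns a Project Reactor Flux of content chunks. The handler below is a sketch for the controller above, not the project's actual code.

// import reactor.core.publisher.Flux;

// Illustrative streaming handler: the response body is emitted chunk by chunk as the model generates it.
@GetMapping("/chat/stream")
Flux<String> chatStream(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .stream()
            .content();
}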
Ollama lets you run models directly from Hugging Face. Let's try that out.
http :8080/chat/huggingface question=="Why is a raven like a writing desk? Give a short answer." -b
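Ollama resolves such models through names of the form hf.co/{user}/{repository} pointing at GGUF repositories on Hugging Face. A handler for this endpoint might simply override the model name per request; the repository below is only an example.

// Illustrative handler: points Ollama at a GGUF model hosted on Hugging Face.
@GetMapping("/chat/huggingface")
String chatHuggingFace(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(ChatOptions.builder()
                    .model("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
                    .build())
            .call()
            .content();
}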