Chat Models: Ollama

Chat completion with LLMs via Ollama.

Description

Spring AI provides ChatModel, a low-level abstraction for integrating with LLMs from several providers, including Ollama.

When you use the Spring AI Ollama Spring Boot starter, a ChatModel object backed by Ollama is autoconfigured for you.

@Bean
CommandLineRunner chat(ChatModel chatModel) {
    return _ -> {
        var response = chatModel.call("What is the capital of Italy?");
        System.out.println(response);
    };
}

Spring AI also provides a higher-level abstraction for building more advanced LLM workflows: ChatClient. A ChatClient.Builder object is autoconfigured for you to build a ChatClient object. Under the hood, it relies on a ChatModel.

@Bean
CommandLineRunner chat(ChatClient.Builder chatClientBuilder) {
    var chatClient = chatClientBuilder.build();
    return _ -> {
        var response = chatClient
                .prompt("What is the capital of Italy?")
                .call()
                .content();
        System.out.println(response);
    };
}

Ollama

The application consumes models from an Ollama inference server. You can either run Ollama locally on your laptop or rely on the Testcontainers support in Spring Boot to spin up an Ollama service automatically. If you choose the first option, make sure Ollama is installed and running. Either way, Spring AI takes care of pulling the required Ollama models at application startup if they are not yet available on your machine.
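
The model and the pull behavior are driven by configuration. Below is a minimal sketch of the relevant properties in application.yml (the model name and exact values used by this repository may differ); with when_missing, a model is downloaded only if it is not already present in the local Ollama instance.

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: llama3.2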

Running the application

If you're using the native Ollama application, run the application as follows.

./gradlew bootRun

If you prefer to rely on the built-in Testcontainers support in Spring Boot to spin up an Ollama service at startup, run the application as follows.

./gradlew bootTestRun
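
Under the hood, bootTestRun relies on a test configuration that declares an Ollama container. Below is a minimal sketch of such a configuration, assuming Spring AI's Testcontainers integration is on the classpath; the class name and image tag are illustrative, and the repository's actual test setup may differ.

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.ollama.OllamaContainer;

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    // @ServiceConnection lets Spring Boot point the Ollama autoconfiguration
    // at the container started by Testcontainers, instead of localhost:11434.
    @Bean
    @ServiceConnection
    OllamaContainer ollama() {
        return new OllamaContainer("ollama/ollama");
    }
}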

Calling the application

Note

These examples use the httpie CLI to send HTTP requests.

Call the application, which will use a chat model to answer your question.

http :8080/chat question=="What is the capital of Italy?" -b
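
On the server side, an endpoint like this can be backed by the autoconfigured ChatClient.Builder. The repository's actual controller may differ; this is a minimal sketch with illustrative class and method names.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // GET /chat?question=... returns the model's answer as plain text.
    @GetMapping("/chat")
    String chat(@RequestParam String question) {
        return chatClient
                .prompt(question)
                .call()
                .content();
    }
}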

The next request is configured with generic portable options.

http :8080/chat/generic-options question=="Why is a raven like a writing desk? Give a short answer." -b
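
Portable options are expressed with the provider-agnostic ChatOptions type and work across model providers. Here is a sketch of such an endpoint, added to the controller above; the option values are illustrative.

import org.springframework.ai.chat.prompt.ChatOptions;

// Hypothetical endpoint: overrides the default request options with portable
// ones, such as the sampling temperature.
@GetMapping("/chat/generic-options")
String chatGenericOptions(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(ChatOptions.builder()
                    .temperature(0.9)
                    .build())
            .call()
            .content();
}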

The next request is configured with provider-specific options.

http :8080/chat/provider-options question=="What can you see beyond what you can see? Give a short answer." -b
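
Provider-specific options go through OllamaOptions, which exposes the full set of Ollama parameters. Another sketch on the same controller; the option values are illustrative.

import org.springframework.ai.ollama.api.OllamaOptions;

// Hypothetical endpoint: Ollama-specific options, e.g. the repeat penalty,
// which has no portable equivalent in ChatOptions.
@GetMapping("/chat/provider-options")
String chatProviderOptions(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(OllamaOptions.builder()
                    .repeatPenalty(1.5)
                    .build())
            .call()
            .content();
}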

The final request returns the model's answer as a stream.

http --stream :8080/chat/stream question=="Why is a raven like a writing desk? Answer in 3 paragraphs." -b
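
Streaming uses the stream() spec of ChatClient, which returns a reactive Flux instead of a fully materialized answer. A sketch of the corresponding endpoint:

import reactor.core.publisher.Flux;

// Hypothetical endpoint: tokens are emitted to the client as they are
// generated, rather than waiting for the complete response.
@GetMapping("/chat/stream")
Flux<String> chatStream(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .stream()
            .content();
}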

Ollama lets you run models directly from Hugging Face. Let's try that out.

http :8080/chat/huggingface question=="Why is a raven like a writing desk? Give a short answer." -b
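
Ollama can resolve GGUF models prefixed with hf.co/ directly from Hugging Face. Here is a sketch of an endpoint selecting such a model at request time; the model name below is only an example.

// Hypothetical endpoint: the hf.co/ prefix tells Ollama to pull a GGUF model
// straight from Hugging Face. The model name is an example.
@GetMapping("/chat/huggingface")
String chatHuggingFace(@RequestParam String question) {
    return chatClient
            .prompt(question)
            .options(OllamaOptions.builder()
                    .model("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
                    .build())
            .call()
            .content();
}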