The Rivet Ollama Plugin is a plugin for Rivet that lets you use Ollama to run and chat with LLMs locally and easily. It adds the following nodes:
- Ollama Chat
- Ollama Embedding
- Get Ollama Model
- List Ollama Models
- Pull Model to Ollama
To run Ollama so that Rivet's default browser executor can communicate with it, you will want to start it with the following command:

```bash
OLLAMA_ORIGINS=* ollama serve
```

If you are using the node executor, you can omit the `OLLAMA_ORIGINS` environment variable.
To use this plugin in Rivet:

- Open the plugins overlay at the top of the screen.
- Search for "rivet-plugin-ollama".
- Click the "Add" button to install the plugin into your current project.

To use this plugin with the SDK (`@ironclad/rivet-node`):

- Import the plugin and Rivet into your project:

  ```ts
  import * as Rivet from "@ironclad/rivet-node";
  import RivetPluginOllama from "rivet-plugin-ollama";
  ```

- Initialize the plugin and register the nodes with the `globalRivetNodeRegistry`:

  ```ts
  Rivet.globalRivetNodeRegistry.registerPlugin(RivetPluginOllama(Rivet));
  ```

  (You may also use your own node registry if you wish, instead of the global one.)

- The nodes will now work when run with `runGraphInFile` or `createProcessor`.
By default, the plugin will attempt to connect to Ollama at `http://localhost:11434`. If you would like to change this, open the Settings window, navigate to the Plugins area, and you will see a Host setting for Ollama. You can change this to the URL of your Ollama instance. For some users it works using `http://127.0.0.1:11434` instead.
When using the SDK, you can pass a `host` option to the plugin to configure the host. Using `createProcessor` or `runGraphInFile`, pass it in via `pluginSettings` in `RunGraphOptions`:
```ts
await createProcessor(project, {
  ...etc,
  pluginSettings: {
    ollama: {
      host: "http://localhost:11434",
    },
  },
});
```
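For reference, here is a minimal sketch of the same configuration using `runGraphInFile`. The project path and graph name below are hypothetical placeholders, not part of the plugin:

```ts
import * as Rivet from "@ironclad/rivet-node";
import RivetPluginOllama from "rivet-plugin-ollama";

// Register the plugin's nodes before running any graph that uses them.
Rivet.globalRivetNodeRegistry.registerPlugin(RivetPluginOllama(Rivet));

// Hypothetical project path and graph name; substitute your own.
await Rivet.runGraphInFile("./my-project.rivet-project", {
  graph: "My Graph",
  pluginSettings: {
    ollama: {
      host: "http://localhost:11434",
    },
  },
});
```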
The Ollama Chat node is the main node of the plugin. It functions similarly to the Chat node built into Rivet and uses the `/api/chat` route.
**Inputs**

| Title | Data Type | Description | Default Value | Notes |
| --- | --- | --- | --- | --- |
| System Prompt | `string` | The system prompt to prepend to the messages list. | (none) | Optional. |
| Messages | `chat-message[]` | The chat messages to use as the prompt for the LLM. | (none) | Chat messages are converted to the OpenAI message format using "role" and "content" keys. |
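To illustrate the note above, here is a rough sketch of how a Rivet chat message might map onto the role/content shape that Ollama's chat API expects. The objects below are hypothetical examples; the plugin's exact conversion logic may differ:

```ts
// A hypothetical Rivet-style chat message and the role/content shape
// sent to Ollama's /api/chat endpoint.
const rivetMessage = { type: "user", message: "Why is the sky blue?" };

const ollamaMessage = {
  role: rivetMessage.type,        // "system" | "user" | "assistant"
  content: rivetMessage.message,  // plain text content
};
// => { role: "user", content: "Why is the sky blue?" }
```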
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| Output | `string` | The response text from the LLM. | |
| Messages Sent | `chat-message[]` | The messages that were sent to Ollama. | |
| All Messages | `chat-message[]` | All messages, including the reply from the LLM. | |
**Editor Settings**

| Setting | Description | Default Value | Use Input Toggle | Input Data Type |
| --- | --- | --- | --- | --- |
| Model | The name of the LLM model to use in Ollama. | (Empty) | Yes | `string` |
| Prompt Format | The way to format chat messages for the prompt sent to the Ollama model. Raw means no formatting is applied. Llama 2 Instruct follows the Llama 2 prompt format. | Llama 2 Instruct | No | N/A |
| JSON Mode | Activates JSON output mode. | false | Yes | `boolean` |
| Parameters Group | | | | |
| Mirostat | Enable Mirostat sampling for controlling perplexity. (Default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | (unset) | Yes | `number` |
| Mirostat Eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | (unset) | Yes | `number` |
| Mirostat Tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | (unset) | Yes | `number` |
| Num Ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | (unset) | Yes | `number` |
| Num GQA | The number of GQA groups in the transformer layer. Required for some models; for example, it is 8 for llama2:70b. | (unset) | Yes | `number` |
| Num GPUs | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. | (unset) | Yes | `number` |
| Num Threads | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | (unset) | Yes | `number` |
| Repeat Last N | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | (unset) | Yes | `number` |
| Repeat Penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | (unset) | Yes | `number` |
| Temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | (unset) | Yes | `number` |
| Seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | (unset) | Yes | `number` |
| Stop | Sets the stop sequences to use. When this pattern is encountered, the LLM will stop generating text and return. | (unset) | Yes | `string` |
| TFS Z | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1) | (unset) | Yes | `number` |
| Num Predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | (unset) | Yes | `number` |
| Top K | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40) | (unset) | Yes | `number` |
| Top P | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | (unset) | Yes | `number` |
| Additional Parameters | Additional parameters to pass to Ollama. Numbers will be parsed and sent as numbers; otherwise they will be sent as strings. See the Ollama documentation for all supported parameters. | (none) | Yes | `object` |
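For intuition about how the Parameters Group settings relate to Ollama, here is a minimal sketch of a request to the `/api/chat` endpoint. It assumes the settings are forwarded as Ollama `options` keys (e.g. `num_ctx`, `top_k`), which is how Ollama's own API documents them; the plugin's exact payload may differ:

```ts
// Hedged sketch: how the editor settings roughly correspond to Ollama's
// /api/chat request body. Field names follow Ollama's API documentation.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama2",                 // "Model" setting
    format: "json",                  // only when "JSON Mode" is enabled
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Why is the sky blue?" },
    ],
    options: {
      temperature: 0.8,              // "Temperature"
      num_ctx: 2048,                 // "Num Ctx"
      top_k: 40,                     // "Top K"
      top_p: 0.9,                    // "Top P"
      stop: ["\n\nUser:"],           // "Stop"
    },
    stream: false,
  }),
});
const { message } = await response.json(); // { role: "assistant", content: "..." }
```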
The Ollama Embedding node generates a vector embedding for a given text using Ollama. Embedding models are models that are trained specifically to generate vector embeddings: long arrays of numbers that represent semantic meaning for a given sequence of text. The resulting vector embedding arrays can then be stored in a database, which will compare them as a way to search for data that is similar in meaning.
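As an illustration of how such a comparison typically works (this is standard practice, not something the plugin itself performs), embeddings are often compared with cosine similarity:

```ts
// Cosine similarity between two embedding vectors: 1 = same direction
// (very similar meaning), 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```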
See Editor Settings for all possible inputs.
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| Embedding | `vector` | Array of numbers that represent semantic meaning for a given sequence of text. | |
**Editor Settings**

| Setting | Description | Default Value | Use Input Toggle | Input Data Type |
| --- | --- | --- | --- | --- |
| Model Name | The name of the model to use for the embedding. | (Empty) | Yes (default off) | `string` |
| Text | The text to embed. | (Empty) | Yes (default off) | `string` |
Previously the main node of the plugin, this node allows you to send prompts to Ollama and receive responses from the installed LLMs, with deep customization options including custom prompt formats. It uses the `/api/generate` route.
**Inputs**

| Title | Data Type | Description | Default Value | Notes |
| --- | --- | --- | --- | --- |
| System Prompt | `string` | The system prompt to prepend to the messages list. | (none) | Optional. |
| Messages | `chat-message[]` | The chat messages to use as the prompt for the LLM. | (none) | Chat messages are converted to a prompt in Ollama based on the "Prompt Format" editor setting. If "Raw" is selected, no formatting is performed on the chat messages, and you are expected to have already formatted them in your Rivet graphs. |
Additional inputs are available with toggles in the editor.
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| Output | `string` | The response text from the LLM. | |
| Prompt | `string` | The full prompt, with formatting, that was sent to Ollama. | |
| Messages Sent | `chat-message[]` | The messages that were sent to Ollama. | |
| All Messages | `chat-message[]` | All messages, including the reply from the LLM. | |
| Total Duration | `number` | Time spent generating the response. | Only available if the "Advanced Outputs" toggle is enabled. |
| Load Duration | `number` | Time spent in nanoseconds loading the model. | Only available if the "Advanced Outputs" toggle is enabled. |
| Sample Count | `number` | Number of samples generated. | Only available if the "Advanced Outputs" toggle is enabled. |
| Sample Duration | `number` | Time spent in nanoseconds generating samples. | Only available if the "Advanced Outputs" toggle is enabled. |
| Prompt Eval Count | `number` | Number of tokens in the prompt. | Only available if the "Advanced Outputs" toggle is enabled. |
| Prompt Eval Duration | `number` | Time spent in nanoseconds evaluating the prompt. | Only available if the "Advanced Outputs" toggle is enabled. |
| Eval Count | `number` | Number of tokens in the response. | Only available if the "Advanced Outputs" toggle is enabled. |
| Eval Duration | `number` | Time spent in nanoseconds evaluating the response. | Only available if the "Advanced Outputs" toggle is enabled. |
| Tokens Per Second | `number` | Number of tokens generated per second. | Only available if the "Advanced Outputs" toggle is enabled. |
| Parameters | `object` | The parameters used to generate the response. | Only available if the "Advanced Outputs" toggle is enabled. |
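Since the duration outputs above are reported in nanoseconds, Tokens Per Second can presumably be derived from Eval Count and Eval Duration as shown below. The numbers are illustrative only, and this is a sketch of the arithmetic rather than the plugin's exact code:

```ts
// Eval Duration is in nanoseconds, so convert to seconds before dividing.
const evalCount = 290;                 // tokens in the response (example value)
const evalDurationNs = 4_709_213_000;  // ~4.7 seconds, in nanoseconds (example value)

const tokensPerSecond = evalCount / (evalDurationNs / 1e9);
// => roughly 61.6 tokens per second
```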
**Editor Settings**

| Setting | Description | Default Value | Use Input Toggle | Input Data Type |
| --- | --- | --- | --- | --- |
| Model | The name of the LLM model to use in Ollama. | (Empty) | Yes | `string` |
| Prompt Format | The way to format chat messages for the prompt sent to the Ollama model. Raw means no formatting is applied. Llama 2 Instruct follows the Llama 2 prompt format. | Llama 2 Instruct | No | N/A |
| JSON Mode | Activates JSON output mode. | false | Yes | `boolean` |
| Advanced Outputs | Add additional outputs with detailed information about the Ollama execution. | No | No | N/A |
| Parameters Group | | | | |
| Mirostat | Enable Mirostat sampling for controlling perplexity. (Default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | (unset) | Yes | `number` |
| Mirostat Eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | (unset) | Yes | `number` |
| Mirostat Tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | (unset) | Yes | `number` |
| Num Ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | (unset) | Yes | `number` |
| Num GQA | The number of GQA groups in the transformer layer. Required for some models; for example, it is 8 for llama2:70b. | (unset) | Yes | `number` |
| Num GPUs | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. | (unset) | Yes | `number` |
| Num Threads | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | (unset) | Yes | `number` |
| Repeat Last N | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | (unset) | Yes | `number` |
| Repeat Penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | (unset) | Yes | `number` |
| Temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | (unset) | Yes | `number` |
| Seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | (unset) | Yes | `number` |
| Stop | Sets the stop sequences to use. When this pattern is encountered, the LLM will stop generating text and return. | (unset) | Yes | `string` |
| TFS Z | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1) | (unset) | Yes | `number` |
| Num Predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | (unset) | Yes | `number` |
| Top K | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40) | (unset) | Yes | `number` |
| Top P | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | (unset) | Yes | `number` |
| Additional Parameters | Additional parameters to pass to Ollama. Numbers will be parsed and sent as numbers; otherwise they will be sent as strings. See the Ollama documentation for all supported parameters. | (none) | Yes | `object` |
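To make the Prompt Format setting more concrete, here is a sketch of the widely documented Llama 2 instruct format that the "Llama 2 Instruct" option refers to. The plugin's exact formatting may differ slightly; with "Raw" selected, you would build a string like this yourself in your graph:

```ts
// Roughly what a single-turn Llama 2 instruct prompt looks like.
const systemPrompt = "You are a helpful assistant.";
const userMessage = "Why is the sky blue?";

const llama2InstructPrompt =
  `[INST] <<SYS>>\n${systemPrompt}\n<</SYS>>\n\n${userMessage} [/INST]`;
```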
The List Ollama Models node lists the models installed in Ollama.
This node has no inputs.
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| Model Names | `string[]` | The names of the models installed in Ollama. | |
This node has no editor settings.
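For context, listing local models corresponds to Ollama's documented `/api/tags` endpoint. The sketch below fetches the same data directly; it is not necessarily the plugin's exact implementation:

```ts
// GET /api/tags returns the locally installed models.
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();

// Each entry includes a "name" such as "llama2:latest".
const modelNames: string[] = models.map((m: { name: string }) => m.name);
```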
The Get Ollama Model node gets the model with the given name from Ollama.
See Editor Settings for all possible inputs.
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| License | `string` | Contents of the license block of the model. | |
| Modelfile | `string` | The Ollama modelfile for the model. | |
| Parameters | `string` | The parameters for the model. | |
| Template | `string` | The template for the model. | |
**Editor Settings**

| Setting | Description | Default Value | Use Input Toggle | Input Data Type |
| --- | --- | --- | --- | --- |
| Model Name | The name of the model to get. | (Empty) | Yes (default on) | `string` |
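These outputs line up with what Ollama's documented `/api/show` endpoint returns, which this node presumably wraps. A sketch of the underlying request:

```ts
// POST /api/show returns details for a single installed model.
const res = await fetch("http://localhost:11434/api/show", {
  method: "POST",
  body: JSON.stringify({ name: "llama2" }),
});

// The response includes the fields surfaced by this node's outputs.
const { license, modelfile, parameters, template } = await res.json();
```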
The Pull Model to Ollama node downloads a model from the Ollama library to the Ollama server.
See Editor Settings for all possible inputs.
**Outputs**

| Title | Data Type | Description | Notes |
| --- | --- | --- | --- |
| Model Name | `string` | The name of the model that was pulled. | |
**Editor Settings**

| Setting | Description | Default Value | Use Input Toggle | Input Data Type |
| --- | --- | --- | --- | --- |
| Model Name | The name of the model to pull. | (Empty) | Yes (default on) | `string` |
| Insecure | Allow insecure connections to the library. Only use this if you are pulling from your own library during development. | No | No | N/A |
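For reference, pulling a model corresponds to Ollama's documented `/api/pull` endpoint. The sketch below shows that request directly; it is not necessarily the plugin's exact code:

```ts
// POST /api/pull downloads a model from the Ollama library.
// "insecure" mirrors the node's Insecure editor setting.
const res = await fetch("http://localhost:11434/api/pull", {
  method: "POST",
  body: JSON.stringify({ name: "llama2", insecure: false, stream: false }),
});
const { status } = await res.json(); // "success" once the pull completes
```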
- Run `yarn dev` to start the compiler and bundler in watch mode. This will automatically recompile and rebundle your changes into the `dist` folder, and will also copy the bundled files into the plugin install directory.
- After each change, you must restart Rivet to see the changes.