
Ollama

Let's set up a local LLM service to run inference.

NOTE: The LLM may use the GPU to improve performance; however, the available resources may not be enough to complete all requests.

Start Ollama

See the Ollama docs to start the service and obtain a model.
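On Linux the installer usually registers Ollama as a background service; if it is not running, you can start it manually in a terminal (a minimal sketch, assuming a default installation listening on port 11434):

ollama serve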

We will use phi3 for chat and nomic-embed-text for embeddings.

Download the models with

ollama pull phi3
ollama pull nomic-embed-text
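To verify the downloads and that the service is responding, you can list the local models and query the HTTP API (assuming Ollama listens on its default port 11434):

ollama list
curl http://localhost:11434/api/tags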

Configure the SERMAS Toolkit API

Locate the file ./config/api/.env and add the following configurations

OLLAMA_BASEURL=http://172.17.0.1:11434

OLLAMA_MODEL=phi3
OLLAMA_CHAT_MODELS=phi3:*

OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

LLM_SERVICE=ollama
LLM_EMBEDDINGS_SERVICE=ollama

Then restart the API with docker compose restart api
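To confirm the API picked up the new configuration, you can follow its logs while it restarts (the service name api matches the compose setup used above):

docker compose logs -f api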

NOTE: The OLLAMA_BASEURL above may change based on your operating system and Ollama installation.
The example above assumes a standard installation on a Linux machine and uses 172.17.0.1 as the default gateway between the Docker network and the host (use host.docker.internal on macOS and Windows). localhost and 127.0.0.1 will typically NOT work.
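To check that the configured OLLAMA_BASEURL is reachable from inside the Docker network, you can run a quick request from the api container (a sketch; it assumes curl is available in the container image, otherwise run the same URL from the host):

docker compose exec api curl -s http://172.17.0.1:11434/api/tags

A JSON list of the pulled models indicates the address is correct.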

IMPORTANT: Ollama must listen on all local interfaces. By default, a standard Ollama installation listens only on localhost (127.0.0.1), which is unreachable from a Docker container. To fix this, set OLLAMA_HOST=0.0.0.0. See the Ollama documentation for more details.
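On a Linux installation managed by systemd, one way to apply this setting is through a service override (a sketch; the unit name ollama.service matches the standard Linux installer):

sudo systemctl edit ollama.service

Add the following lines in the editor that opens:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then reload and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama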

Configuring the application settings

In the settings section of settings.yaml or app.yaml, add the following lines

llm:
  chat: ollama/phi3
  tools: ollama/phi3
  sentiment: ollama/phi3
  tasks: ollama/phi3
  intent: ollama/phi3
  translation: ollama/phi3

The pattern to follow is [provider]/[model]. The list of available models is visible in the kiosk UI by opening the left menu, under LLM settings.

This selects the phi3 model for all types of inference in the system. Depending on your setup, these entries can point to any other configured provider and model.
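As an example of the [provider]/[model] pattern, a single entry can point to a different provider; the openai/gpt-4o value below is only a placeholder for any provider and model listed in the kiosk UI under LLM settings.

llm:
  chat: ollama/phi3
  translation: openai/gpt-4o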

Updating the application

Reimport the app from the CLI with sermas-cli app save /apps/myapp

Reload the page at http://localhost:8080 to start using the configured Ollama model.