
Ollama

Let's set up a local LLM service to run inference.

NOTE: The LLM may use the GPU to improve performance; however, the available resources may not be enough to complete all requests.

Start Ollama

See the Ollama docs to start the service and obtain a model.
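On Linux the installer usually registers Ollama as a background service; if it is not running, you can start it manually in a terminal (a minimal sketch, assuming a default installation listening on port 11434):

ollama serve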

We will use phi3 for chat and nomic-embed-text for embeddings.

Download the models with

ollama pull phi3
ollama pull nomic-embed-text
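To verify the downloads and that the service is responding, you can list the local models and query the HTTP API (assuming Ollama listens on its default port 11434):

ollama list
curl http://localhost:11434/api/tags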

Configure the SERMAS Toolkit API

Locate the file ./config/api/.env and add the following configurations

OLLAMA_BASEURL=http://172.17.0.1:11434

OLLAMA_MODEL=phi3
OLLAMA_CHAT_MODELS=phi3:*

OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

LLM_SERVICE=ollama
LLM_EMBEDDINGS_SERVICE=ollama

Then restart the API with docker compose restart api
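To confirm the API picked up the new configuration, you can follow its logs while it restarts (the service name api matches the compose setup used above):

docker compose logs -f api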

NOTE: The OLLAMA_BASEURL above may change based on your operating system and Ollama installation.
The example above assumes a standard installation on a Linux machine and uses 172.17.0.1 as the default gateway between the Docker network and the host (use host.docker.internal on macOS and Windows). localhost and 127.0.0.1 will typically NOT work.
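To check that the configured OLLAMA_BASEURL is reachable from inside the Docker network, you can run a quick request from the api container (a sketch; it assumes curl is available in the container image, otherwise run the same URL from the host):

docker compose exec api curl -s http://172.17.0.1:11434/api/tags

A JSON list of the pulled models indicates the address is correct.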

IMPORTANT: Ollama must listen on all local interfaces. By default, a standard Ollama installation listens only on localhost (127.0.0.1), which is unreachable from a Docker container. To fix this, set OLLAMA_HOST=0.0.0.0. See the Ollama documentation for more details.
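On a Linux installation managed by systemd, one way to apply this setting is through a service override (a sketch; the unit name ollama.service matches the standard Linux installer):

sudo systemctl edit ollama.service

Add the following lines in the editor that opens:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then reload and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama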

Configuring the application settings

In the settings section of settings.yaml or app.yaml, add the following lines

llm:
  chat: ollama/phi3
  tools: ollama/phi3
  sentiment: ollama/phi3
  tasks: ollama/phi3
  intent: ollama/phi3
  translation: ollama/phi3

The pattern to follow is [provider]/[model]. The list of available models is visible in the kiosk UI by opening the left menu, under LLM settings.

This selects the phi3 model for all types of inference in the system. Depending on your setup, these entries can point to any other configured provider and model.
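As an example of the [provider]/[model] pattern, a single entry can point to a different provider; the openai/gpt-4o value below is only a placeholder for any provider and model listed in the kiosk UI under LLM settings.

llm:
  chat: ollama/phi3
  translation: openai/gpt-4o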

Updating the application

Reimport the app from the CLI with sermas-cli app save /apps/myapp

Reload the page at http://localhost:8080 to start using the configured Ollama model.