Open WebUI & Ollama
Open WebUI is a web-based frontend to LLM models that lets you run your own private chatbot, or AI models in general.
Ollama is a runtime that serves a collection of open AI / LLM models and can be used with Open WebUI.
Both can easily be installed as containers.
A good NVIDIA GPU is strongly recommended, as inference will be significantly faster (up to 12 times in my experience) compared to just using CPU.
Intel and AMD GPUs are supposed to be supported as well, but I have an NVIDIA GPU, so that is what I will describe in this page.
Installation
To install Open WebUI you need, of course, its dedicated user, and you will also need some persistent folders to map as volumes in the containers. I chose to put these folders under /data/llm.
So add the user and create the folders:
useradd -d /data/daemons/openwebui -m openwebui
usermod -G video openwebui
mkdir /data/llm
chown openwebui:openwebui /data/llm
su - openwebui
cd /data/llm
mkdir webui-data
mkdir ollama
mkdir ollama/code
mkdir ollama/ollama
Adding the user to the video group is required for GPU access, whether you use a container or not.
Open WebUI can be installed on bare metal, without containers, using pip, but due to its strict Python requirement (3.11 at the time of writing), this is not recommended (Gentoo already ships Python 3.13).
Let's go with the containers way, using of course Podman compose.
From this page, select “docker compose” as the installation method.
This is the compose file I am using; adapt it to your needs:
- docker-compose.yaml
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3080:8080"
    volumes:
      - /data/llm/webui-data:/app/backend/data
    networks:
      - openwebui-net
  ollama:
    image: docker.io/ollama/ollama:latest
    ports:
      - 3081:11434
    devices:
      - nvidia.com/gpu=all # required for GPU acceleration
    volumes:
      - /data/llm/ollama/code:/code
      - /data/llm/ollama/ollama:/root/.ollama
    container_name: ollama
    # pull_policy: always
    tty: true
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_HOST=0.0.0.0
    networks:
      - openwebui-net
networks:
  openwebui-net:
    dns_enabled: true
This setup pulls both Ollama and Open WebUI into the same compose stack, which allows for seamless integration and neat organization on the server itself.
It also exposes your Ollama instance outside the container on port 3081, which should NOT be forwarded on the proxy server, because it is meant for home access only. The Open WebUI instance will instead be available on port 3080 and accessible through the web proxy, see below.
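Once the containers are up, you can sanity-check both endpoints from the host. A hedged sketch: ports 3080 and 3081 match the compose file above, and /api/version is Ollama's cheap liveness endpoint; adjust if you changed the mappings.

```shell
# Quick reachability checks for the two services (ports as mapped above).
ollama_api() {
  # Build an Ollama endpoint URL; defaults to the /api/version liveness probe.
  printf 'http://%s:%s%s' "${1:-localhost}" "${2:-3081}" "${3:-/api/version}"
}

# Ollama: /api/version returns a small JSON document when the server is up
curl -sf "$(ollama_api)" || echo "Ollama not reachable (yet?)"
# Open WebUI: the web interface answers on port 3080 once startup completes
curl -sf -o /dev/null http://localhost:3080/ || echo "Open WebUI not reachable (yet?)"
```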
Reverse Proxy
Open WebUI can be hosted on a subdomain; let's assume you choose ai.mydomain.com.
As usual you want it protected by the Reverse Proxy, so create the ai.conf file:
- ai.conf
server {
    server_name ai.mydomain.com;
    listen 443 ssl;
    listen 8443 ssl;
    http2 on;

    access_log /var/log/nginx/ai.mydomain.com_access_log main;
    error_log /var/log/nginx/ai.mydomain.com_error_log info;

    location / {
        proxy_pass http://127.0.0.1:3080/; # The trailing / is important!
        proxy_set_header X-Script-Name /;
        proxy_set_header Host $http_host;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Accel-Internal /internal-nginx-static-location;
        access_log off;
    }

    include com.mydomain/certbot.conf;
}
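Note that this server block references $connection_upgrade for WebSocket upgrades; if your nginx.conf does not already define that variable, add a map like this one (a common sketch, placed in the http context):

```nginx
# Map the client's Upgrade header to the Connection header value:
# "upgrade" for WebSocket handshakes, "close" otherwise.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
```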
Add this config file to NGINX (see The Reverse Proxy concept for more details) and restart nginx.
Now go with your browser to https://ai.mydomain.com to finish the setup.
GPU acceleration support
While you can run models using only your CPU, the end result will be a fairly annoying experience with very long response times. Basic interactions can take up to minutes (not kidding!), and having any kind of quick reply, or even a conversation, can be frustrating.
Luckily, you can improve your response times by a factor of 10x (or more!) by using a GPU. On my server I have an NVIDIA GA104GL [RTX A4000], which provides 16GB of VRAM and decent acceleration for AI tasks. I didn't purchase this card on purpose; I happened to have it from an existing gaming PC.
To enable GPU acceleration, you first need to install the drivers, then allow the container user to access the GPU.
So, let's do it.
Install NVIDIA stuff
Enable the NVIDIA card by adding this line:
- /etc/portage/make.conf
VIDEO_CARDS="intel nvidia"
(of course, list the cards you actually have; I have both an Intel and an NVIDIA)
Then disable the NVIDIA GUI tools, since the server is headless. Put this into /etc/portage/package.use/nvidia:
- nvidia
x11-drivers/nvidia-drivers -tools
Now emerge the required packages:
emerge -av x11-drivers/nvidia-drivers app-containers/nvidia-container-toolkit
Now, check that the GPU is detected:
nvidia-smi
Mon Mar  2 16:34:45 2026
[ ... lots of output with your GPU info ... ]
Disable cgroups (they don't seem to work with rootless Podman) by editing the file /etc/nvidia-container-runtime/config.toml and setting the property no-cgroups to true:
[nvidia-container-cli]
...
no-cgroups = true
...
Leave the rest of the file untouched.
You need to generate a Container Device Interface (CDI) file, which Podman will use to talk to the GPU:
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
You will need to run the above command again every time the NVIDIA drivers are updated.
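To avoid forgetting this step after a driver update, you can regenerate the CDI spec at every boot with an OpenRC local.d script. This is just a sketch; the file name is arbitrary:

```shell
# /etc/local.d/nvidia-cdi.start -- regenerate the CDI spec at boot,
# so it always matches the currently installed NVIDIA driver.
# (Make it executable: chmod +x /etc/local.d/nvidia-cdi.start)
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```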
At this point you should check the CDI is in place and working:
> nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-45d0f042-fc48-a32f-ee51-e515f6db9551
nvidia.com/gpu=all
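As the list shows, the CDI also names individual GPUs. The compose file above passes nvidia.com/gpu=all; on a multi-GPU machine you could instead pin the container to a single card, for example:

```yaml
# docker-compose.yaml excerpt: pass only the first GPU to the container,
# using one of the device names reported by `nvidia-ctk cdi list`
    devices:
      - nvidia.com/gpu=0
```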
Cool! Now, with the devices block in the above docker-compose.yaml, you should only need to restart the Open WebUI container to get the GPU enabled.
BUT… there is a caveat! The NVIDIA device nodes are writable only by root and the video group, so add your user to that group:
usermod -G video openwebui
Then log out and back in on your terminal and restart the container.
After restarting the container, this command (as the openwebui user) will tell you that all is well:
podman exec -it ollama nvidia-smi
[ ... output similar to above ... ]
Configuration
After you start the containers, be ready to wait a good ten minutes or more until the web GUI is operative. YMMV of course, depending on your server capabilities.
You can find your Ollama public key under /data/llm/ollama/ollama/id_ed25519.pub
To start using your own offline LLM:
- Log in to the Open WebUI page (ai.mydomain.com)
- At first login, you will be prompted to create the admin user; do so.
- Before chatting, you need to set up a model in Ollama
- Go to Admin Panel / Settings / Connections
- Under Ollama, edit the URL to http://ollama:11434 and paste your Ollama key (see above)
- Now tap the small download-like icon to the right of the URL
- Enter a model name (e.g. deepseek-r1) and download it
- There will be no notification when the download finishes, but the model(s) will be displayed under the Models page in the Admin Panel
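The download can also be driven from the command line through the Ollama API exposed on port 3081 (the /api/pull and /api/tags endpoints are part of Ollama's REST API; the model name below is just an example):

```shell
# Build the JSON body for /api/pull; $1 is the model tag, e.g. deepseek-r1.
pull_payload() {
  printf '{"model": "%s"}' "$1"
}

# Trigger the download (Ollama streams progress back as JSON lines)...
curl -s http://localhost:3081/api/pull -d "$(pull_payload deepseek-r1)" || true
# ...and list the installed models once it is done:
curl -s http://localhost:3081/api/tags || true
```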
At this point, your LLM is ready and operative!
Autostart
To start it, and set it up at boot, follow as usual my indications in Using Containers on Gentoo, and link the user-containers init script:
ln -s /etc/init.d/user-containers /etc/init.d/user-containers.openwebui
and create the following config file:
- /etc/conf.d/user-containers.openwebui
USER=openwebui
DESCRIPTION="Open web AI interface"
Add the service to the default runlevel and start it now:
rc-update add user-containers.openwebui default
rc-service user-containers.openwebui start