====== Open WebUI & Ollama ======

[[https://github.com/open-webui/open-webui|Open WebUI]] is a web-based frontend to LLMs that lets you run your own private chatbot, or AI models in general. [[https://ollama.com/|Ollama]] is a tool to download and run open AI / LLM models, and it can be used with Open WebUI. Both can easily be installed as containers.

A good NVIDIA GPU is strongly recommended, as inference will be significantly faster (up to 12 times in my experience) compared to using the CPU alone. While CPU-only operation works, the user experience is pretty slow and not real time, and you cannot really hold a dialogue. If you don't have a GPU, go ahead anyway: everything will still work, just slower. Intel and AMD GPUs are supposed to be supported as well, but I have an NVIDIA GPU, so that is what I describe in this page.

===== Installation =====

To install Open WebUI you need, of course, its dedicated user, and you will also need some persistent folders to map as volumes into the containers. I chose to put these folders under **/data/llm**.

Add the user and create the folders:
<code bash>
useradd -d /data/daemons/openwebui -m openwebui
usermod -aG video openwebui
mkdir /data/llm
chown openwebui:openwebui /data/llm
su - openwebui
cd /data/llm
mkdir webui-data
mkdir ollama
mkdir ollama/code
mkdir ollama/ollama
</code>

Adding the user to the **video** group is required for accessing the GPU, whether you use a container or not.

Open WebUI can be installed on bare metal, without containers, using //pip//, but due to its strict Python requirement (3.11 at the time of writing) this is not recommended (Gentoo already ships Python 3.13), and maintenance would be a pain since updates land almost daily.

Let's go with the containers way, using, of course, rootless podman compose. On [[https://docs.openwebui.com/getting-started/quick-start/|this page]], select "docker compose" for reference. This is the compose file I am using; adapt it to your needs:
<file - docker-compose.yaml>
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3080:8080"
    volumes:
      - /data/llm/webui-data:/app/backend/data
    networks:
      - openwebui-net

  ollama:
    image: docker.io/ollama/ollama:latest
    ports:
      - 3081:11434
    devices:
      - nvidia.com/gpu=all # required for GPU acceleration
    annotations:
      run.oci.keep_original_groups: "true" # required for GPU acceleration
    volumes:
      - /data/llm/ollama/code:/code
      - /data/llm/ollama/ollama:/root/.ollama
    container_name: ollama
    # pull_policy: always
    tty: true
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_HOST=0.0.0.0
    networks:
      - openwebui-net

networks:
  openwebui-net:
    dns_enabled: true
</file>

This compose file runs both Ollama and Open WebUI in the same stack, which allows for a seamless integration and a neat organization on the server itself.

This setup lets you access your Ollama instance from //outside// the container on port 3081, which should **NOT** be forwarded on the proxy server, because it is only for home access. The Open WebUI instance will instead be available on port 3080 and accessible through the web proxy, see below. You can still use Ollama on the server for other services; just do not export it through the proxy for external use, as it would be unprotected.
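Before wiring anything else up, it is worth checking that the stack actually runs. This is a minimal sketch, assuming you keep **docker-compose.yaml** in **/data/llm** and left the port mappings above unchanged; ''/api/version'' is part of the standard Ollama REST API:
<code bash>
su - openwebui
cd /data/llm
podman compose up -d

# both containers should show as "Up":
podman ps

# Ollama should answer on the host port mapped above (3081):
curl http://127.0.0.1:3081/api/version
</code>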
===== GPU acceleration support =====

=== Install NVIDIA drivers & tools ===

Enable the NVIDIA card by adding this line:
<file - /etc/portage/make.conf>
VIDEO_CARDS="intel nvidia"
</file>
(of course, put the cards you actually have; I have both an Intel and an NVIDIA one). This step is probably not needed on a headless server, but having it defined ensures the card can be used in the future.

Then disable the NVIDIA GUI tools, since the server is headless; put this into **/etc/portage/package.use/nvidia**:
<file - nvidia>
x11-drivers/nvidia-drivers -tools
</file>

Now emerge the required packages (**-p** only pretends: review the list, then run the command again without it to actually install):
<code bash>
emerge -vp x11-drivers/nvidia-drivers app-containers/nvidia-container-toolkit
</code>

**nvidia-drivers** is the actual driver, while **nvidia-container-toolkit** contains all the files required to pass the GPU through to the container. More info can be found [[https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html|here]].

Now check that the GPU is detected:
<code bash>
nvidia-smi
Mon Mar  2 16:34:45 2026
[ ... lots of output with your GPU info, VRAM, etc... ]
</code>

=== Configure NVIDIA tools ===

Disable cgroups (they won't work with rootless podman) by editing the file **/etc/nvidia-container-runtime/config.toml** and setting the property **no-cgroups** to **true**:
<file>
[nvidia-container-cli]
...
no-cgroups = true
...
</file>
Leave the rest of the file untouched.

You need to generate a Container Device Interface (CDI) file, which Podman will use to talk to the GPU (see [[https://podman-desktop.io/docs/podman/gpu|here]]):
<code bash>
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
</code>
You will need to **run the above command again** every time the NVIDIA drivers are updated.

At this point, check that the CDI is in place and working:
<code bash>
> nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
nvidia.com/gpu=all
</code>

=== Configure podman passthrough ===

To support GPU acceleration you need the two lines indicated in the compose file above. This one:
<code>
    devices:
      - nvidia.com/gpu=all # required for GPU acceleration
</code>
tells podman to pass all the GPUs to the container. You can actually select a specific one (if you have more than one) by picking the appropriate entry from the output of:
<code bash>
nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
nvidia.com/gpu=all
</code>

This line instead:
<code>
    annotations:
      run.oci.keep_original_groups: "true" # required for GPU acceleration
</code>
is required because the container would otherwise drop the supplementary groups (of which **video** is needed to access the GPU); this annotation passes the additional groups on to the container as well.

=== Test GPU in container ===

After restarting the container, this command (as the openwebui user) will tell you that all is well:
<code bash>
su - openwebui
podman exec -it ollama nvidia-smi
[ ... output similar to above ... ]
</code>

===== Reverse Proxy =====

Open WebUI can be hosted on a subdomain; let's assume you chose **ai.mydomain.com**. As usual, you want it protected by the Reverse Proxy, so create the **ai.conf** file:
<file - ai.conf>
server {
    server_name ai.mydomain.com;
    listen 443 ssl;
    listen 8443 ssl;
    http2 on;

    access_log /var/log/nginx/ai.mydomain.com_access_log main;
    error_log /var/log/nginx/ai.mydomain.com_error_log info;

    location / { # The trailing / is important!
        proxy_pass http://127.0.0.1:3080/; # The / is important!
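        # $connection_upgrade (used below for WebSocket support) is not a
        # built-in nginx variable: if your main config does not already
        # define it (see the Reverse Proxy concept page), add a map block
        # in the http{} context, for example:
        #   map $http_upgrade $connection_upgrade {
        #       default upgrade;
        #       ''      close;
        #   }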
        proxy_set_header X-Script-Name /;
        proxy_set_header Host $http_host;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Accel-Internal /internal-nginx-static-location;
        access_log off;
    }

    include com.mydomain/certbot.conf;
}
</file>

Add this config file to NGINX (see [[selfhost:nginx|The Reverse Proxy concept]] for more details) and restart nginx. Now go with your browser to **https://ai.mydomain.com** to finish the setup.

===== Configuration =====

After you start the containers, be ready to wait a good ten minutes or more until the web GUI is operative. YMMV, of course, depending on your server capabilities.

You can find your Ollama public key under **/data/llm/ollama/ollama/id_ed25519.pub** (the host side of the volume mapped to **/root/.ollama** in the compose file above).

To start using your own offline LLM:
  * Log in to the Open WebUI page (ai.mydomain.com)
  * At first login you will be prompted to create the admin user; do so
  * Before chatting, you need to set up a model in Ollama:
  * Go to //admin panel / settings / connections//
  * Under Ollama, edit the URL to **http://ollama:11434** and paste your Ollama key (see above)
  * Now tap the small download-like icon on the right of the URL
  * Write a model name (ex: deepseek-r1) and download it
  * There will be no notification when the download is finished, but the model(s) will be displayed under the //models// page in the admin panel

At this point, your LLM is ready and operative!

===== Autostart =====

To start it, and set it up on boot, as usual follow my indications in [[gentoo:containers|Using Containers on Gentoo]]: link the **user-containers** init script:
<code>
ln -s /etc/init.d/user-containers /etc/init.d/user-containers.openwebui
</code>

and create the following config file:
<file - /etc/conf.d/user-containers.openwebui>
USER=openwebui
DESCRIPTION="Open web AI interface"
</file>

Add the service to the default runlevel and start it now:
<code bash>
rc-update add user-containers.openwebui default
rc-service user-containers.openwebui start
</code>
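To verify the autostart setup without waiting for the next reboot, a quick sanity check like this should do; it is a sketch that only uses the service name, container setup and ports defined above:
<code bash>
# OpenRC should report the service as started:
rc-service user-containers.openwebui status

# the containers should be running under the openwebui user:
su - openwebui -c "podman ps"

# and the models you downloaded should be listed by the Ollama API:
curl -s http://127.0.0.1:3081/api/tags
</code>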