services:open-webui (last modified 2026/03/03 08:56 by willy)
both can be easily installed with a container.
  
A good NVIDIA GPU is strongly recommended: inference will be significantly faster (up to 12 times in my experience) than on CPU alone. CPU-only operation works, but the experience is slow, far from real time, and you cannot really hold a dialogue. If you don't have a GPU, go ahead anyway: everything will still work, just slower.
  
Intel GPUs and AMD GPUs are supposed to be supported as well, but I have an NVIDIA GPU, so that is what I will be describing on this page.
Adding the user to the **video** group is required for accessing the GPU, whether you use a container or not.
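As a sketch (assuming the dedicated service user is called **openwebui**, the name used later on this page), adding and verifying the group membership looks like:
<code bash>
# append the video group to the user's supplementary groups
usermod -aG video openwebui

# verify: "video" should appear in the printed list
id -nG openwebui
</code>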
  
Open WebUI can be installed on bare metal, without containers, using //pip//, but due to its strict Python requirement (3.11 at the time of writing), this is not recommended (Gentoo already ships Python 3.13), and maintenance would be a pain since updates land almost daily.
  
Let's go with the container way, using of course rootless Podman compose.
  
From [[https://docs.openwebui.com/getting-started/quick-start/|this page]], select "docker compose" and
    devices:
      - nvidia.com/gpu=all # required for GPU acceleration
    annotations:
      run.oci.keep_original_groups: "true" # required for GPU acceleration
    volumes:
      - /data/llm/ollama/code:/code
</file>
  
This setup will pull both Ollama and Open WebUI into the same container setup, which allows for seamless integration and neat organization on the server itself.
  
This setup will let you access your Ollama instance from //outside// the container, on port 3081, which should **NOT** be forwarded on the proxy server, because it's only for home access. The Open WebUI instance will instead be available on port 3080 and accessible through the web proxy, see below. You can still use Ollama on the server for other services, just do not export it through the proxy for external use: it would be unprotected.
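For example, another service on your home network can talk to Ollama directly on port 3081; a quick check from the server itself (the Ollama //api/tags// endpoint returns the installed models as JSON):
<code bash>
curl http://127.0.0.1:3081/api/tags
</code>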
  
===== GPU acceleration support =====
  
=== Install NVIDIA drivers & tools ===
  
Enable the NVIDIA card by adding this line:
VIDEO_CARDS="intel nvidia"
</file>
(Of course, put the cards you have; I have both an Intel and an NVIDIA.) This step is probably not needed on a headless server, but having it defined ensures it could be used in the future.
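If in doubt about which GPUs are present, you can list them before setting this variable:
<code bash>
# list all display controllers (VGA and 3D)
lspci | grep -Ei 'vga|3d'
</code>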
  
Then disable the NVIDIA GUI tools, since the server is headless; put this into **/etc/portage/package.use/nvidia**:
emerge -vp x11-drivers/nvidia-drivers app-containers/nvidia-container-toolkit
</code>

**nvidia-drivers** is the actual driver, while **nvidia-container-toolkit** contains the files needed to pass the GPU through to the container. More info can be found [[https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html|here]].
  
Now, check that the GPU is detected:
nvidia-smi
Mon Mar  2 16:34:45 2026
[ ... lots of output with your GPU info, VRAM, etc. ... ]
</code>
  
=== Configure NVIDIA tools ===

Disable cgroups (they don't work with rootless Podman) by editing the file /etc/nvidia-container-runtime/config.toml and setting the property no-cgroups to true:
<file>
[nvidia-container-cli]
...
</file>
Leave the rest of the file untouched.
  
You need to generate a Container Device Interface (CDI) file, which Podman will use to talk to the GPU (see [[https://podman-desktop.io/docs/podman/gpu|here]]):
<code bash>
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
nvidia.com/gpu=all
</code>
  
  
=== Configure Podman passthrough ===

To support GPU acceleration you need the two lines indicated in the compose file above.

This one:
<code>
    devices:
      - nvidia.com/gpu=all # required for GPU acceleration
</code>
tells Podman to pass all the GPUs to the container. If you have more than one, you can select a specific GPU by picking the appropriate entry from the output of:
<code bash>
nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
nvidia.com/gpu=all
</code>
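For example, to expose only the first GPU instead of all of them, the **devices** entry in the compose file could be changed to (a sketch, using the index-0 device name from the list above):
<code>
    devices:
      - nvidia.com/gpu=0 # pass only the first GPU
</code>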

This line instead:
<code>
    annotations:
      run.oci.keep_original_groups: "true" # required for GPU acceleration
</code>
is required because the container would otherwise drop the user's supplementary groups (**video** among them, which is needed to access the GPU); this annotation passes the additional groups on to the container as well.


=== Test GPU in container ===

After restarting the container, this command (run as the openwebui user) will tell you that all is well:
<code bash>
su - openwebui
podman exec -it ollama nvidia-smi
[ ... output similar to above ... ]
</code>
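Once a model is loaded, you can also confirm that inference actually runs on the GPU; //ollama ps// prints a PROCESSOR column that should read "100% GPU":
<code bash>
podman exec -it ollama ollama ps
</code>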


===== Reverse Proxy =====

Open WebUI can be hosted on a subdomain; let's assume you choose **ai.mydomain.com**.

As usual you want it protected by the Reverse Proxy, so create the **ai.conf** file:
<file - ai.conf>
server {
        server_name ai.mydomain.com;
        listen 443 ssl;
        listen 8443 ssl;
        http2 on;

        access_log /var/log/nginx/ai.mydomain.com_access_log main;
        error_log /var/log/nginx/ai.mydomain.com_error_log info;

        location / { # The trailing / is important!
                proxy_pass        http://127.0.0.1:3080/; # The / is important!
                proxy_set_header  X-Script-Name /;
                proxy_set_header  Host $http_host;
                proxy_http_version 1.1;
                proxy_buffering off;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection $connection_upgrade;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Accel-Internal /internal-nginx-static-location;
                access_log off;
        }
        include com.mydomain/certbot.conf;
}
</file>
Add this config file to NGINX (see [[selfhost:nginx|The Reverse Proxy concept]] for more details) and restart nginx.

Now point your browser to **https://ai.mydomain.com** to finish the setup.
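Assuming DNS and certificates are already in place, a quick smoke test from the command line:
<code bash>
# fetch only the response headers; expect an HTTP 200 from Open WebUI
curl -I https://ai.mydomain.com
</code>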
  
===== Configuration =====