first commit

2025-11-10 16:02:57 +01:00
commit 1e2567fd59
2 changed files with 645 additions and 0 deletions
@@ -0,0 +1,268 @@
+# GPU Resource Manager & Proxy for Ollama  
+*A smart GPU-aware proxy for Proxmox VE that dynamically manages GPU resources between Ollama and GPU-intensive background processes.*
+
+---
+
+## ✅ Overview
+
+The **GPU Resource Manager & Proxy for Ollama** is a lightweight Python service that sits between your applications and the Ollama server. It intelligently supervises NVIDIA GPU usage on **Proxmox VE** hosts and ensures that GPU-intensive background tasks (e.g. mining) are temporarily suspended whenever Ollama requires GPU power.
+
+This enables smooth coexistence of AI workloads and other GPU idle tasks on the same host.
+
+---
+
+## ✨ Features
+
+### 🔍 Intelligent GPU Monitoring  
+- Detects GPU usage patterns and active processes via `nvidia-smi`.
+- Differentiates essential system processes from idle GPU workloads.
+
+### ⚙️ Dynamic Resource Allocation  
+- Automatically pauses idle/non-critical GPU processes when Ollama becomes active.  
+- Automatically resumes them after configurable inactivity timeouts.
+
+### 🗓️ Scheduled Blackout Window  
+- By Default Automatically stops idle GPU processes between **2:15 AM – 3:30 AM** for maintenance.
+
+### 🖥️ Proxmox LXC Integration  
+- Direct container control using Proxmox's `pct` command.  
+- Ideal for GPU-passthrough LXC containers (miners, renderers, etc.).
+
+### ⚡ Real-Time Process Detection  
+- Inspects NVIDIA GPU processes continuously.  
+- Supports customizable allow-lists and idle-process lists.
+
+---
+
+## 📦 Requirements
+
+- NVIDIA GPU + drivers  
+- `nvidia-smi`  
+- Python **3.x**  
+- `python3-requests`  
+- Proxmox VE host  
+- At least one GPU-passthrough LXC container  
+- Optional: Ollama server running inside LXC
+
+Install required packages:
+
+```bash
+sudo apt update
+sudo apt install python3 python3-requests
+```
+
+---
+
+## ⚙️ Configuration
+
+These are the primary configuration variables inside the script:
+
+```python
+# Basic Configuration
+OLLAMA_HOST = "localhost"      # Ollama container IP
+OLLAMA_PORT = 11434            # Ollama API port
+PROXY_PORT = 11435             # Proxy server port
+GPU_CHECK_INTERVAL = 10        # Seconds between GPU checks
+
+# GPU Process Management
+IDLE_NvGPU_PROCESSES = ['t-rex', 'trex', 'miner', 'xmrig', 'lolminer', 'nbminer']
+KNOWN_NvidiaGPU_PROCESSES = ['Xorg']  
+IDLE_CONTAINER_ID = "120"      # LXC container ID of idle GPU workload
+Blackout_schedule_Start = 2, 15 #when to start stopping the idle NvGPU container. Hour, Minute.
+Blackout_schedule_End = 3, 30 #when to allow starting the idle NvGPU container again. Hour, Minute.
+
+```
+
+---
+
+## 🛠️ Installation (systemd service)
+
+Create the service file:
+
+`/etc/systemd/system/gpu-proxy.service`
+
+```ini
+[Unit]
+Description=GPU Resource Manager and Proxy for Ollama
+After=network.target
+
+[Service]
+Type=simple
+User=root
+ExecStart=/usr/bin/python3 /usr/local/bin/gpu-proxy.py
+Restart=always
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start the service:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable gpu-proxy
+sudo systemctl start gpu-proxy
+```
+
+---
+
+## 🚀 Usage
+
+Forward requests to the **proxy port** instead of directly to Ollama.
+
+Example:
+
+```bash
+curl http://proxmox-host:11435/api/generate -d '{
+  "model": "llama2",
+  "prompt": "Why is the sky blue?",
+  "stream": false
+}'
+```
+
+### API Endpoints Automatically Managed
+- `/api/generate`
+- `/api/chat`
+- `/api/embeddings`
+- `/api/load`
+- `/api/pull`
+
+Anything GPU-intensive triggers resource management logic.
+
+---
+
+## 🔧 How It Works
+
+### 🔄 Resource Management Flow
+
+1. **Request received** by proxy  
+2. Proxy detects GPU-intensive Ollama endpoint  
+3. Proxy checks for non-idle GPU processes / idle container  
+4. If necessary → **stops idle GPU container**  
+5. Forwards request to Ollama
+6. Waits for Ollama to Finish (default 120s timeout)
+7. Watches GPU activity for non-idle processes
+8. Once GPU is idle → **starts idle container**
+
+---
+
+## 📜 Logging
+
+Logs stored in:
+
+```
+/var/log/gpu_proxy.log
+```
+
+Examples:
+
+```
+2025-11-10 15:36:41,882 - INFO - Starting GPU Resource Manager And Proxy for ollama.
+2025-11-10 15:36:42,695 - INFO - pct command available (this is the host)
+2025-11-10 15:36:42,717 - INFO - Current GPU processes: [{'pid': '2381690', 'name': '/var/lib/cudo-miner/registry/aaf375fd4c7b39548121985bce1e7b64/t-rex', 'memory': '5478 MiB'}]
+2025-11-10 15:36:43,524 - INFO - Idle NvGPU container 120 running: True
+2025-11-10 15:36:43,525 - INFO - GPU monitoring thread started
+2025-11-10 15:36:43,525 - INFO - Proxy server running on port 11435
+2025-11-10 15:36:43,525 - INFO - Forwarding to Ollama at localhost:11434
+2025-11-10 15:36:43,525 - INFO - Managing idle NvGPU container: 120
+2025-11-10 15:36:43,525 - INFO - Monitoring GPU usage and scheduled maintenance windows
+2025-11-10 15:36:43,525 - INFO - Idle NvGPU process patterns: ['t-rex', 'trex', 'miner', 'xmrig', 'lolminer', 'nbminer']
+2025-11-10 15:36:55,310 - INFO - Localhost - "GET /api/tags HTTP/1.1" 200 -
+2025-11-10 15:36:55,332 - INFO - Localhost - "GET /api/ps HTTP/1.1" 200 -
+2025-11-10 15:37:01,223 - INFO - GPU-intensive operation detected: /api/chat
+2025-11-10 15:37:02,040 - INFO - Force stopping idle NvGPU container for Ollama GPU operation
+2025-11-10 15:37:02,859 - INFO - Stopping container 120
+2025-11-10 15:37:06,391 - INFO - Container 120 stopped successfully
+2025-11-10 15:37:29,546 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
+2025-11-10 15:37:29,551 - INFO - Ollama request completed, activity timestamp updated
+2025-11-10 15:37:29,664 - INFO - GPU-intensive operation detected: /api/chat
+2025-11-10 15:37:39,896 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
+2025-11-10 15:37:39,896 - INFO - Ollama request completed, activity timestamp updated
+2025-11-10 15:37:39,903 - INFO - GPU-intensive operation detected: /api/chat
+2025-11-10 15:38:17,616 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
+2025-11-10 15:38:17,616 - INFO - Ollama request completed, activity timestamp updated
+2025-11-10 15:38:17,631 - INFO - GPU-intensive operation detected: /api/chat
+2025-11-10 15:38:53,068 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
+2025-11-10 15:38:53,069 - INFO - Ollama request completed, activity timestamp updated
+2025-11-10 15:39:59,855 - INFO - Ollama activity timeout reached
+2025-11-10 15:43:58,186 - INFO - GPU idle, starting idle NvGPU container
+2025-11-10 15:43:59,001 - INFO - Starting container 120
+2025-11-10 15:44:02,739 - INFO - Container 120 started successfully
+```
+
+Enable debug mode:
+
+```python
+logging.basicConfig(
+    level=logging.DEBUG,
+    format='%(asctime)s - %(levelname)s - %(message)s',
+    handlers=[
+        logging.FileHandler('/var/log/gpu_proxy.log'),
+        logging.StreamHandler()
+    ]
+)
+```
+
+---
+
+## 🧪 Monitoring & Testing
+
+Check service:
+
+```bash
+systemctl status gpu-proxy
+```
+
+Tail logs:
+
+```bash
+tail -f /var/log/gpu_proxy.log
+```
+
+Test GPU processes:
+
+```bash
+nvidia-smi --query-compute-apps=pid,process_name,used_memory   --format=csv,noheader,nounits
+```
+
+---
+
+## ❗ Troubleshooting
+
+### `pct` command not found  
+→ Script must run **on the Proxmox host**, not inside an LXC.
+
+### GPU processes not detected  
+- Verify NVIDIA drivers  
+- Run `nvidia-smi` manually  
+- Ensure GPU passthrough is configured
+
+### Idle container not managed  
+- Check the LXC exists  
+- Run `pct list`  
+- Ensure root permissions
+
+### Proxy connection refused  
+- Ensure Ollama is running  
+- Check firewall rules  
+- Check via curl inside Proxmox:
+
+```bash
+curl http://<OLLAMA_HOST>:11434/api/version
+```
+
+---
+
+## 🤝 Contributing
+
+Pull requests and issues are welcome!  
+If you’d like to contribute, please open an issue first to discuss your idea.
+
+---
+
+## 📄 License
+
+This project is licensed under the **MIT License**. See the `LICENSE` file for details.
+
+---