Files
GPU-Recource-Manager-and-pr…/README.md
T
mrmarcus007 1e2567fd59 first commit
2025-11-10 16:02:57 +01:00

7.6 KiB
Raw Blame History

GPU Resource Manager & Proxy for Ollama

A smart GPU-aware proxy for Proxmox VE that dynamically manages GPU resources between Ollama and GPU-intensive background processes.


Overview

The GPU Resource Manager & Proxy for Ollama is a lightweight Python service that sits between your applications and the Ollama server. It intelligently supervises NVIDIA GPU usage on Proxmox VE hosts and ensures that GPU-intensive background tasks (e.g. mining) are temporarily suspended whenever Ollama requires GPU power.

This enables smooth coexistence of AI workloads and other GPU idle tasks on the same host.


Features

🔍 Intelligent GPU Monitoring

  • Detects GPU usage patterns and active processes via nvidia-smi.
  • Differentiates essential system processes from idle GPU workloads.

⚙️ Dynamic Resource Allocation

  • Automatically pauses idle/non-critical GPU processes when Ollama becomes active.
  • Automatically resumes them after configurable inactivity timeouts.

🗓️ Scheduled Blackout Window

  • By Default Automatically stops idle GPU processes between 2:15 AM 3:30 AM for maintenance.

🖥️ Proxmox LXC Integration

  • Direct container control using Proxmox's pct command.
  • Ideal for GPU-passthrough LXC containers (miners, renderers, etc.).

Real-Time Process Detection

  • Inspects NVIDIA GPU processes continuously.
  • Supports customizable allow-lists and idle-process lists.

📦 Requirements

  • NVIDIA GPU + drivers
  • nvidia-smi
  • Python 3.x
  • python3-requests
  • Proxmox VE host
  • At least one GPU-passthrough LXC container
  • Optional: Ollama server running inside LXC

Install required packages:

sudo apt update
sudo apt install python3 python3-requests

⚙️ Configuration

These are the primary configuration variables inside the script:

# Basic Configuration
OLLAMA_HOST = "localhost"      # Ollama container IP
OLLAMA_PORT = 11434            # Ollama API port
PROXY_PORT = 11435             # Proxy server port
GPU_CHECK_INTERVAL = 10        # Seconds between GPU checks

# GPU Process Management
IDLE_NvGPU_PROCESSES = ['t-rex', 'trex', 'miner', 'xmrig', 'lolminer', 'nbminer']
KNOWN_NvidiaGPU_PROCESSES = ['Xorg']  
IDLE_CONTAINER_ID = "120"      # LXC container ID of idle GPU workload
Blackout_schedule_Start = 2, 15 #when to start stopping the idle NvGPU container. Hour, Minute.
Blackout_schedule_End = 3, 30 #when to allow starting the idle NvGPU container again. Hour, Minute.


🛠️ Installation (systemd service)

Create the service file:

/etc/systemd/system/gpu-proxy.service

[Unit]
Description=GPU Resource Manager and Proxy for Ollama
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/bin/python3 /usr/local/bin/gpu-proxy.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable gpu-proxy
sudo systemctl start gpu-proxy

🚀 Usage

Forward requests to the proxy port instead of directly to Ollama.

Example:

curl http://proxmox-host:11435/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

API Endpoints Automatically Managed

  • /api/generate
  • /api/chat
  • /api/embeddings
  • /api/load
  • /api/pull

Anything GPU-intensive triggers resource management logic.


🔧 How It Works

🔄 Resource Management Flow

  1. Request received by proxy
  2. Proxy detects GPU-intensive Ollama endpoint
  3. Proxy checks for non-idle GPU processes / idle container
  4. If necessary → stops idle GPU container
  5. Forwards request to Ollama
  6. Waits for Ollama to Finish (default 120s timeout)
  7. Watches GPU activity for non-idle processes
  8. Once GPU is idle → starts idle container

📜 Logging

Logs stored in:

/var/log/gpu_proxy.log

Examples:

2025-11-10 15:36:41,882 - INFO - Starting GPU Resource Manager And Proxy for ollama.
2025-11-10 15:36:42,695 - INFO - pct command available (this is the host)
2025-11-10 15:36:42,717 - INFO - Current GPU processes: [{'pid': '2381690', 'name': '/var/lib/cudo-miner/registry/aaf375fd4c7b39548121985bce1e7b64/t-rex', 'memory': '5478 MiB'}]
2025-11-10 15:36:43,524 - INFO - Idle NvGPU container 120 running: True
2025-11-10 15:36:43,525 - INFO - GPU monitoring thread started
2025-11-10 15:36:43,525 - INFO - Proxy server running on port 11435
2025-11-10 15:36:43,525 - INFO - Forwarding to Ollama at localhost:11434
2025-11-10 15:36:43,525 - INFO - Managing idle NvGPU container: 120
2025-11-10 15:36:43,525 - INFO - Monitoring GPU usage and scheduled maintenance windows
2025-11-10 15:36:43,525 - INFO - Idle NvGPU process patterns: ['t-rex', 'trex', 'miner', 'xmrig', 'lolminer', 'nbminer']
2025-11-10 15:36:55,310 - INFO - Localhost - "GET /api/tags HTTP/1.1" 200 -
2025-11-10 15:36:55,332 - INFO - Localhost - "GET /api/ps HTTP/1.1" 200 -
2025-11-10 15:37:01,223 - INFO - GPU-intensive operation detected: /api/chat
2025-11-10 15:37:02,040 - INFO - Force stopping idle NvGPU container for Ollama GPU operation
2025-11-10 15:37:02,859 - INFO - Stopping container 120
2025-11-10 15:37:06,391 - INFO - Container 120 stopped successfully
2025-11-10 15:37:29,546 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
2025-11-10 15:37:29,551 - INFO - Ollama request completed, activity timestamp updated
2025-11-10 15:37:29,664 - INFO - GPU-intensive operation detected: /api/chat
2025-11-10 15:37:39,896 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
2025-11-10 15:37:39,896 - INFO - Ollama request completed, activity timestamp updated
2025-11-10 15:37:39,903 - INFO - GPU-intensive operation detected: /api/chat
2025-11-10 15:38:17,616 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
2025-11-10 15:38:17,616 - INFO - Ollama request completed, activity timestamp updated
2025-11-10 15:38:17,631 - INFO - GPU-intensive operation detected: /api/chat
2025-11-10 15:38:53,068 - INFO - Localhost - "POST /api/chat HTTP/1.1" 200 -
2025-11-10 15:38:53,069 - INFO - Ollama request completed, activity timestamp updated
2025-11-10 15:39:59,855 - INFO - Ollama activity timeout reached
2025-11-10 15:43:58,186 - INFO - GPU idle, starting idle NvGPU container
2025-11-10 15:43:59,001 - INFO - Starting container 120
2025-11-10 15:44:02,739 - INFO - Container 120 started successfully

Enable debug mode:

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/gpu_proxy.log'),
        logging.StreamHandler()
    ]
)

🧪 Monitoring & Testing

Check service:

systemctl status gpu-proxy

Tail logs:

tail -f /var/log/gpu_proxy.log

Test GPU processes:

nvidia-smi --query-compute-apps=pid,process_name,used_memory   --format=csv,noheader,nounits

Troubleshooting

pct command not found

→ Script must run on the Proxmox host, not inside an LXC.

GPU processes not detected

  • Verify NVIDIA drivers
  • Run nvidia-smi manually
  • Ensure GPU passthrough is configured

Idle container not managed

  • Check the LXC exists
  • Run pct list
  • Ensure root permissions

Proxy connection refused

  • Ensure Ollama is running
  • Check firewall rules
  • Check via curl inside Proxmox:
curl http://<OLLAMA_HOST>:11434/api/version

🤝 Contributing

Pull requests and issues are welcome!
If youd like to contribute, please open an issue first to discuss your idea.


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.