# GPU Hot

Real-Time NVIDIA GPU Monitoring Dashboard. Single-container web dashboard for NVIDIA GPU monitoring with real-time charts.

## Overview

Self-contained dashboard for monitoring NVIDIA GPUs on remote servers. Access utilization and health metrics from a browser without SSH. Runs in a single container on one port. No configuration required: start the container and open a browser.

## Quick Start

```bash
docker-compose up --build
```

Open http://localhost:1312

Requirements: Docker, NVIDIA Container Toolkit (install guide)

## Why Not Just Use...

**nvidia-smi CLI:**
- Requires SSH access
- No historical data or charts
- Manual refresh only
- Hard to compare multiple GPUs

**Prometheus/Grafana:**
- Complex setup (exporters, databases, dashboard configs)
- Overkill for simple monitoring needs
- Higher resource usage

This is the middle ground: a web interface with charts, zero configuration.

## Features

**7 Charts per GPU:**
- Utilization, Temperature, Memory, Power Draw
- Fan Speed, Clock Speeds (graphics/SM/memory), Power Efficiency

**Monitoring:**
- Automatic multi-GPU detection
- GPU process tracking (PID, memory usage)
- System CPU/RAM monitoring
- Threshold indicators (temp: 75°C/85°C, util: 80%, memory: 90%)

**Metrics Collected:**

*Core Metrics*
- GPU & Memory Utilization (%)
- Temperature: GPU core & memory (°C)
- Memory: used/free/total (MB)
- Power: draw & limits (W)
- Fan Speed (%)
- Clock Speeds: graphics, SM, memory, video (MHz)

*Advanced Metrics*
- PCIe Generation & Lane Width (current/max)
- Performance State (P-State)
- Compute Mode
- Encoder/Decoder sessions & statistics
- Driver & VBIOS versions
- Throttle status

## Installation

### Docker (Recommended)

```bash
git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build
```

### Local Development

```bash
pip install -r requirements.txt
python app.py
```

### Verify GPU Access

```bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

If this fails, install the NVIDIA Container Toolkit first.

## Configuration

None required.
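No configuration is needed because the metrics listed above are read straight from `nvidia-smi`'s CSV query interface. As a rough sketch of that collection approach (the field list and function names below are illustrative, not the exact ones `app.py` uses):

```python
import subprocess

# Illustrative field list; the real collector queries many more fields.
FIELDS = "index,name,utilization.gpu,temperature.gpu,memory.used,memory.total,power.draw"

def parse_gpu_csv(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    keys = FIELDS.split(",")
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(keys, values)))
    return gpus

def query_gpus():
    """Run nvidia-smi and parse its output (requires NVIDIA drivers)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Splitting on commas is fine here only because none of the queried fields contain embedded commas in `nounits` output.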
Optional customization:

**Environment Variables:**

```bash
NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs (default: all)
```

**Application (`app.py`):**

```python
eventlet.sleep(2)             # Update interval (seconds)
socketio.run(app, port=1312)  # Port
```

**Charts (`static/js/charts.js`):**

```javascript
if (data.labels.length > 30)  // History length (data points)
```

## API

**HTTP**

```
GET /              # Dashboard UI
GET /api/gpu-data  # JSON metrics
```

**WebSocket**

```javascript
socket.on('gpu_data', (data) => {
  // Real-time updates every 2s
  // data.gpus, data.processes, data.system
});
```

## Extending

### Add New Metric

1. **Backend (`app.py`):**

```python
def parse_nvidia_smi(self):
    result = subprocess.run([
        'nvidia-smi',
        '--query-gpu=index,name,your.new.metric',
        '--format=csv,noheader,nounits'
    ], ...)
```

2. **Frontend (`static/js/gpu-cards.js`):**

```javascript
// Add to createGPUCard() template
<div class="metric-value" id="new-metric-${gpuId}">${gpuInfo.new_metric}</div>
```

3. **Chart (optional, `static/js/charts.js`):**

```javascript
chartConfigs.newMetric = { type: 'line', data: { ... }, options: { ... } };
```

## Project Structure

```
gpu-hot/
├── app.py                    # Flask + WebSocket server
├── static/js/
│   ├── charts.js             # Chart configuration
│   ├── gpu-cards.js          # UI rendering
│   ├── socket-handlers.js    # WebSocket events
│   ├── ui.js                 # View switching
│   └── app.js                # Bootstrap
├── templates/index.html      # Dashboard
├── Dockerfile                # nvidia/cuda:12.1-devel-ubuntu22.04
└── docker-compose.yml
```

## Troubleshooting

**GPU not detected:**

```bash
# Verify drivers
nvidia-smi

# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# Restart Docker daemon
sudo systemctl restart docker
```

**Debug logging:**

```python
# app.py
socketio.run(app, debug=True)
```

## Contributing

Pull requests welcome. For major changes, open an issue first.

```bash
git checkout -b feature/NewFeature
git commit -m 'Add NewFeature'
git push origin feature/NewFeature
```

## License

MIT. See LICENSE.
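For scripted access outside the browser, the `GET /api/gpu-data` endpoint described in the API section can be polled with any HTTP client. A minimal sketch; the payload field names `index`, `utilization`, and `temperature` are assumptions about the JSON shape, not a documented schema:

```python
import json
from urllib.request import urlopen

def summarize(payload):
    """Turn a /api/gpu-data payload into one status line per GPU.

    Assumes payload has a "gpus" list whose entries carry "index",
    "utilization", and "temperature" keys (hypothetical field names).
    """
    return [
        f"GPU {gpu['index']}: {gpu['utilization']}% util, {gpu['temperature']}°C"
        for gpu in payload.get("gpus", [])
    ]

def poll(url="http://localhost:1312/api/gpu-data"):
    """Fetch and summarize current metrics (requires the dashboard running)."""
    with urlopen(url) as resp:
        return summarize(json.load(resp))
```

With the container running, `print("\n".join(poll()))` would emit one line per detected GPU; adjust the field names in `summarize` to match the actual JSON the endpoint returns.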