GPU Hot: Real-Time NVIDIA GPU Monitoring Dashboard

Single-container web dashboard for NVIDIA GPU monitoring with real-time charts.
Overview
Self-contained dashboard for monitoring NVIDIA GPUs on remote servers. Access utilization and health metrics from a browser without SSH.
Runs in a single container on one port. No configuration required - start the container and open a browser.
Quick Start
```
docker-compose up --build
```
Open http://localhost:1312 in a browser.
Requirements: Docker, NVIDIA Container Toolkit (install guide)
Why Not Just Use...
nvidia-smi CLI:
Requires SSH access
No historical data or charts
Manual refresh only
Hard to compare multiple GPUs
prometheus/grafana:
Complex setup (exporters, databases, dashboard configs)
Overkill for simple monitoring needs
Higher resource usage
This is the middle ground: web interface with charts, zero configuration.
Features
7 Charts per GPU:
Utilization, Temperature, Memory, Power Draw
Fan Speed, Clock Speeds (graphics/SM/memory), Power Efficiency
Monitoring:
Automatic multi-GPU detection
GPU process tracking (PID, memory usage)
System CPU/RAM monitoring
Threshold indicators (temp: 75°C/85°C, util: 80%, memory: 90%)
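
The threshold indicators above can be sketched in Python. Function names and the returned status labels here are illustrative, not the dashboard's actual code; the numeric thresholds are the ones listed above:

```python
def classify_temperature(temp_c, warn=75, crit=85):
    """Map a GPU core temperature (°C) to a status label."""
    if temp_c >= crit:
        return "critical"
    if temp_c >= warn:
        return "warning"
    return "ok"

def classify_percent(value, limit):
    """Generic percentage threshold, e.g. utilization at 80% or memory at 90%."""
    return "warning" if value >= limit else "ok"
```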
Metrics Collected:
Core Metrics:
GPU & Memory Utilization (%)
Temperature - GPU core & memory (°C)
Memory - used/free/total (MB)
Power - draw & limits (W)
Fan Speed (%)
Clock Speeds - graphics, SM, memory, video (MHz)
Advanced Metrics:
PCIe Generation & Lane Width (current/max)
Performance State (P-State)
Compute Mode
Encoder/Decoder sessions & statistics
Driver & VBIOS versions
Throttle status
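
Metrics like these are typically gathered by querying nvidia-smi in CSV mode and splitting each row. A minimal parsing sketch — the field list and helper name are illustrative, not the project's exact query:

```python
# Illustrative subset of nvidia-smi query fields (order must match the query string).
FIELDS = ["index", "name", "utilization.gpu", "temperature.gpu",
          "memory.used", "memory.total", "power.draw"]

def parse_gpu_row(line):
    """Parse one CSV row from `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))

# Example row in the shape nvidia-smi prints with `nounits`:
row = parse_gpu_row("0, NVIDIA GeForce RTX 3090, 37, 62, 1024, 24576, 151.23")
```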
Installation
Docker (Recommended)
```
git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build
```
Local Development
```
pip install -r requirements.txt
python app.py
```
Verify GPU Access
```
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
If this fails, install NVIDIA Container Toolkit first.
Configuration
None required. Optional customization:
Environment Variables:
```
NVIDIA_VISIBLE_DEVICES=0,1   # Specific GPUs (default: all)
```
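If you need the same GPU-selection semantics elsewhere, they are easy to mirror in Python. A small sketch, with an illustrative function name:

```python
import os

def selected_gpus(env=os.environ):
    """Return None when all GPUs are visible, else a list of GPU indices."""
    raw = env.get("NVIDIA_VISIBLE_DEVICES", "all")
    if raw == "all":
        return None
    return [int(i) for i in raw.split(",")]
```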
Application (app.py):
```python
eventlet.sleep(2)                # Update interval (seconds)
socketio.run(app, port=1312)     # Port
```
Charts (static/js/charts.js):
```javascript
if (data.labels.length > 30)  // History length (data points)
```
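
The chart trims its history to 30 points; server-side, the same bounded history is easy to keep with a `collections.deque`. A sketch, not the project's code:

```python
from collections import deque

HISTORY_LEN = 30  # matches the chart's data-point limit

history = deque(maxlen=HISTORY_LEN)
for sample in range(100):   # pretend these are utilization readings
    history.append(sample)

# Only the most recent 30 samples remain; older ones are dropped automatically.
```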
API
HTTP
```
GET /               # Dashboard UI
GET /api/gpu-data   # JSON metrics
```
WebSocket
```javascript
socket.on('gpu_data', (data) => {
  // Real-time updates every 2s
  // data.gpus, data.processes, data.system
});
```
Extending
Add New Metric
1. Backend (app.py):
```python
def parse_nvidia_smi(self):
    result = subprocess.run([
        'nvidia-smi',
        '--query-gpu=index,name,your.new.metric',
        '--format=csv,noheader,nounits'
    ], ...)
```
2. Frontend (static/js/gpu-cards.js):
```javascript
// Add to createGPUCard() template
<div class="metric-value" id="new-metric-${gpuId}">${gpuInfo.new_metric}</div>
```
3. Chart (optional, static/js/charts.js):
```javascript
chartConfigs.newMetric = { type: 'line', data: { ... }, options: { ... } };
```
Project Structure
```
gpu-hot/
├── app.py                      # Flask + WebSocket server
├── static/js/
│   ├── charts.js               # Chart configuration
│   ├── gpu-cards.js            # UI rendering
│   ├── socket-handlers.js      # WebSocket events
│   ├── ui.js                   # View switching
│   └── app.js                  # Bootstrap
├── templates/index.html        # Dashboard
├── Dockerfile                  # nvidia/cuda:12.1-devel-ubuntu22.04
└── docker-compose.yml
```
Troubleshooting
GPU not detected:
```
# Verify drivers
nvidia-smi

# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# Restart Docker daemon
sudo systemctl restart docker
```
Debug logging:
```python
# app.py
socketio.run(app, debug=True)
```
Contributing
Pull requests welcome. For major changes, open an issue first.
```
git checkout -b feature/NewFeature
git commit -m 'Add NewFeature'
git push origin feature/NewFeature
```
License
MIT - see LICENSE