GPU Kill
A CLI tool for managing GPUs across NVIDIA, AMD, Intel, and Apple Silicon systems. Monitor, control, and secure your GPU infrastructure with ease.
Community & Support
Join our Discord community for discussions, support, and updates:
Features
Monitor GPUs : Real-time usage, memory, temperature, and processes
: Real-time usage, memory, temperature, and processes Kill Processes : Gracefully terminate stuck GPU processes
: Gracefully terminate stuck GPU processes Security : Detect crypto miners and suspicious activity
: Detect crypto miners and suspicious activity Guard Mode : Policy enforcement to prevent resource abuse
: Policy enforcement to prevent resource abuse Dashboard : Web interface for cluster monitoring
: Web interface for cluster monitoring Remote : Manage GPUs across multiple servers
: Manage GPUs across multiple servers Multi-Vendor : Works with NVIDIA, AMD, Intel, and Apple Silicon
: Works with NVIDIA, AMD, Intel, and Apple Silicon AI Integration: MCP server for AI assistant integration
Requirements
Build Performance
For faster development builds:
# Fast release build (recommended for development) cargo build --profile release-fast # Standard release build (optimized for production) cargo build --release # Maximum optimization (slowest, best performance) cargo build --profile release-max
Build times on typical hardware:
Debug build: ~3 seconds
Release-fast: ~28 seconds
Release: ~28 seconds (improved from 76 seconds)
Release-max: ~60+ seconds (maximum optimization)
System Dependencies
Linux (Ubuntu/Debian):
sudo apt install build-essential libssl-dev pkg-config
Linux (Fedora/RHEL/CentOS):
sudo dnf install gcc gcc-c++ pkg-config openssl-devel # or for older systems: # sudo yum install gcc gcc-c++ pkg-config openssl-devel
macOS:
# Install Xcode command line tools xcode-select --install # OpenSSL is included with macOS
Windows:
Install Visual Studio Build Tools
OpenSSL is handled automatically by vcpkg
GPU Drivers
NVIDIA : NVIDIA drivers installed
: NVIDIA drivers installed AMD : ROCm drivers installed
: ROCm drivers installed Intel : intel-gpu-tools package installed
: intel-gpu-tools package installed Apple Silicon: macOS with Apple Silicon (M1/M2/M3/M4)
Build Requirements
OS : Linux, macOS, or Windows
: Linux, macOS, or Windows Rust: 1.70+ (for building from source)
Quick Start
Install & Run
# Build from source (first build may take 2-3 minutes) git clone https://github.com/kagehq/gpu-kill.git cd gpu-kill cargo build --release # Or install via Cargo cargo install gpukill # List your GPUs gpukill --list # Watch GPU usage in real-time gpukill --list --watch
Common Tasks
# Kill a stuck process gpukill --kill --pid 12345 --force # Reset a crashed GPU gpukill --reset --gpu 0 --force # Start the web dashboard (backend only) gpukill --server --server-port 8080
Dashboard
Start the web interface for cluster monitoring:
# 1. Start the backend API server gpukill --server --server-port 8080 # 2. Start the dashboard UI (in a new terminal) cd dashboard npm install # First time only npm run dev # 3. Access the dashboard open http://localhost:3000
Note: You need both the backend server (port 8080) and frontend UI (port 3000) running for the dashboard to work.
The dashboard provides:
Real-time monitoring of all GPUs
of all GPUs Security detection with threat analysis
with threat analysis Policy management for resource control
for resource control Cluster overview with Magic Moment insights
MCP Server
GPU Kill includes a MCP server that enables AI assistants to interact with GPU management functionality:
Resources : Read GPU status, processes, audit data, policies, and security scans
: Read GPU status, processes, audit data, policies, and security scans Tools: Kill processes, reset GPUs, scan for threats, create policies
# Start the MCP server cargo run --release -p gpukill-mcp # Server runs on http://localhost:3001/mcp
Usage
Ask your AI to use the tools.
What GPUs do I have and what's their current usage?
Kill the Python process that's stuck on GPU 0
Kill all training processes that are using too much GPU memory
Show me GPU usage and kill any stuck processes
Scan for crypto miners and suspicious activity
Create a policy to limit user memory usage to 8GB
Reset GPU 1 because it's not responding
What processes are currently using my GPUs?
See mcp/README.md for detailed MCP server documentation.
Security & Policies
Detect Threats
# Scan for crypto miners and suspicious activity gpukill --audit --rogue # Configure detection rules gpukill --audit --rogue-config
Policy Enforcement
# Enable Guard Mode gpukill --guard --guard-enable # Test policies safely gpukill --guard --guard-test-policies
For detailed security and policy documentation, see DETAILED.md.
Remote Management
Manage GPUs across multiple servers via SSH:
# List GPUs on remote server gpukill --remote staging-server --list # Kill process on remote server gpukill --remote prod-gpu-01 --kill --pid 1234 # Reset GPU on remote server gpukill --remote gpu-cluster --reset --gpu 0
Troubleshooting
Build Issues
OpenSSL not found:
# Ubuntu/Debian sudo apt install build-essential libssl-dev pkg-config # Fedora/RHEL/CentOS sudo dnf install gcc gcc-c++ pkg-config openssl-devel
Other common build issues:
Ensure you have the latest Rust toolchain: rustup update
Clean and rebuild: cargo clean && cargo build --release
Check system dependencies are installed (see Requirements section)
Need Help?
gpukill --help # Show all options gpukill --version # Show version
CI/CD and Testing
GPU Kill uses a CI/CD pipeline with automatic GPU testing:
✅ Conditional GPU testing - Runs automatically when GPU hardware is available
- Runs automatically when GPU hardware is available ✅ Multi-vendor GPU testing on real hardware (NVIDIA, AMD, Intel, Apple Silicon)
on real hardware (NVIDIA, AMD, Intel, Apple Silicon) ✅ Cross-platform compatibility testing
testing ✅ Performance benchmarking and profiling
and profiling ✅ Security auditing and compliance checks
and compliance checks ✅ Stress testing for reliability validation
How GPU Testing Works
On GitHub hosted runners : GPU tests skip gracefully (no GPU hardware)
: GPU tests skip gracefully (no GPU hardware) On self-hosted runners : GPU tests run automatically when GPU hardware is detected
: GPU tests run automatically when GPU hardware is detected On cloud instances : GPU tests run automatically when GPU hardware is available
: GPU tests run automatically when GPU hardware is available On developer machines: GPU tests run automatically when GPU hardware is detected
Quick Setup
Option 1: Test Locally (Already Working)
cargo test --test gpu_hardware_tests # Runs on your GPU hardware
Option 2: Set Up Cloud GPU (5 minutes)
# On any cloud GPU instance: curl -sSL https://raw.githubusercontent.com/kagehq/gpu-kill/main/scripts/setup-gpu-runner.sh | bash
Option 3: Self-Hosted Runner See CI_CD.md for detailed information about our testing infrastructure and how to set up self-hosted runners with GPU hardware.
Option 4: Cloud GPU Setup See docs/CLOUD_GPU_SETUP.md for AWS, GCP, and Azure GPU instance setup.
Documentation
DETAILED.md - Complete documentation, API reference, and advanced features
- Complete documentation, API reference, and advanced features Dashboard README - Web interface documentation
- Web interface documentation CI_CD.md - CI/CD pipeline and testing infrastructure
- CI/CD pipeline and testing infrastructure docs/CLOUD_GPU_SETUP.md - Cloud GPU setup guide (AWS, GCP, Azure)
License
This project is licensed under the FSL-1.1-MIT License. See the LICENSE file for details.