Tech News

AutoKernel: Autoresearch for GPU Kernels


AutoKernel

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Inspired by @karpathy/autoresearch, which demonstrated autonomous AI agents doing LLM training research. AutoKernel applies the same philosophy to GPU kernel optimization: the agent modifies one file, runs a fixed evaluation, keeps or reverts the change, and repeats forever.

How It Works

Give AutoKernel any PyTorch model. It will:

1. Profile the model to find which GPU kernels are bottlenecks
2. Extract each bottleneck as a standalone Triton kernel
3. Optimize each kernel autonomously (edit, benchmark, keep/revert -- forever)
4. Verify end-to-end correctness and report the total speedup
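Step 1 amounts to ranking kernels by their share of total runtime and keeping the ones that dominate. A minimal, dependency-free sketch of that selection logic (the kernel names, timings, and `top_frac` threshold here are hypothetical, not AutoKernel's actual profiler output):

```python
def rank_bottlenecks(kernel_times, top_frac=0.9):
    """Given measured per-kernel times (hypothetical profiler output),
    return the kernels covering `top_frac` of total runtime, biggest
    first -- the ones worth extracting and optimizing."""
    total = sum(kernel_times.values())
    picked, covered = [], 0.0
    for name, t in sorted(kernel_times.items(), key=lambda kv: -kv[1]):
        picked.append(name)
        covered += t
        if covered / total >= top_frac:
            break
    return picked

# Illustrative timings in milliseconds:
times = {"attention": 4.0, "layernorm": 0.5, "matmul": 3.0, "softmax": 0.5}
print(rank_bottlenecks(times))  # -> ['attention', 'matmul', 'layernorm']
```

In a real run the timings would come from a GPU profiler; the point is that a handful of kernels usually covers most of the runtime, so only those get extracted.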

The agent reads program.md -- the "research org code" -- which contains comprehensive instructions for autonomous operation. It edits kernel.py one kernel at a time, runs bench.py (fixed benchmark with 5-stage correctness checks + roofline analysis), and either keeps or reverts the change. The orchestrator decides when to move to the next kernel using Amdahl's law.
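The core loop is simple: try an edit, benchmark it against the fixed harness, and keep it only if it is both correct and faster. A hedged sketch of that keep/revert policy, with `run_benchmark` standing in for bench.py and simulated candidates standing in for agent edits (none of this is AutoKernel's actual code):

```python
def run_benchmark(kernel_variant):
    """Stand-in for bench.py: returns runtime in seconds, or None when the
    variant fails the correctness checks. Purely illustrative."""
    time, passes_correctness = kernel_variant
    return time if passes_correctness else None

def optimize(baseline_time, candidates):
    """Edit -> benchmark -> keep/revert loop (sketch, not AutoKernel's API)."""
    best, kept = baseline_time, 0
    for cand in candidates:
        t = run_benchmark(cand)
        if t is not None and t < best:   # correct and faster: keep the edit
            best, kept = t, kept + 1
        # otherwise: revert (discard the candidate, keep the current kernel)
    return best, kept

# Simulated agent edits: (runtime_seconds, passes_correctness)
cands = [(0.08, True), (0.05, False), (0.06, True), (0.04, True)]
print(optimize(0.10, cands))  # -> (0.04, 3): the incorrect variant is reverted
```

Because the evaluation is fixed and the revert is unconditional on failure, the kernel can only monotonically improve, no matter how many bad edits the agent proposes.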

Each experiment takes ~90 seconds. That's ~40 experiments/hour, ~320 overnight, across all kernels.
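The Amdahl's-law scheduling mentioned above can be made concrete: a kernel's end-to-end payoff is capped by its share of total runtime, so once further local speedup barely moves the overall number, the orchestrator should switch kernels. A small worked example (the fractions and speedups are illustrative):

```python
def amdahl_speedup(fraction, kernel_speedup):
    """End-to-end speedup when a kernel taking `fraction` of total runtime
    is made `kernel_speedup`x faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

# A kernel that is 40% of runtime, made 4x faster:
print(round(amdahl_speedup(0.40, 4.0), 2))  # 1/(0.6 + 0.1) -> 1.43x end-to-end

# Even infinite speedup on that kernel caps out at 1/0.6 = ~1.67x,
# so at some point the next-biggest kernel is the better target.
print(round(amdahl_speedup(0.40, 1e9), 2))  # -> 1.67
```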

Quick Start

Requirements: NVIDIA GPU (tested on H100/A100/RTX 4090), Python 3.10+, uv.
