This has real consequences. New hardware takes months to acquire, energy costs are climbing alongside AI's surging electricity demand, and every dollar spent on unnecessary GPUs is a dollar not spent on the models themselves. Every percentage point of real throughput recovered from existing hardware is money not spent, a server rack not built, and a kilowatt-hour not consumed. Accurate measurement is the foundation, and Systalyze is the optimization platform built on top of it, enabling you to close the gap between where your deployment is and where it could be.
Systalyze is open-sourcing Utilyze, a free, production-ready monitoring and debugging tool that shows how efficiently your GPUs are actually doing useful work, and how close you are to the realistic maximum for your specific workload. Utilyze runs alongside any AI workload in real time with negligible overhead. In production deployments, it has revealed orders-of-magnitude performance headroom in settings that standard tools declared fully saturated.
The standard GPU utilization metric, the one reported by nvidia-smi, nvtop, rocm-smi, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor, does not measure how hard your GPU is actually working. It only tells you whether the GPU is doing anything at all. Real compute throughput can be as low as 1% while dashboards read 100%. That single misleading number drives enormous amounts of wasted spend, wasted energy, and unnecessary hardware purchases across the AI industry.
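For concreteness, the number in question is the utilization.gpu counter exposed through NVML, which is what these tools and dashboards ultimately poll. A minimal sketch of reading it directly, assuming the pynvml Python bindings for NVML are installed:

```python
# Minimal sketch: read the same "GPU utilization" counter that nvidia-smi and
# the dashboards built on it surface. Assumes an NVIDIA driver and the pynvml
# bindings (the NVML Python wrapper) are installed.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# utilization.gpu is the percentage of the recent sample period during which
# at least one kernel was executing -- not how much compute was actually used.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"utilization.gpu = {util.gpu}%, utilization.memory = {util.memory}%")

pynvml.nvmlShutdown()
```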
The AI Industry Has a Measurement Problem

The world is on the brink of an AI compute crisis. The electricity demand of AI clusters is rising quickly, lead times for acquiring GPUs stretch into months, and NVIDIA H100 one-year rental contract pricing rose almost 40% from October 2025 to March 2026. Getting more hardware is slow, expensive, and for many organizations, simply not an option.

Such scarcity in AI compute puts “optimization” at the center of what every enterprise is trying to achieve. That is the core of what Systalyze does: diagnosing and optimizing AI systems to improve the end-to-end performance of AI workloads. But working across many production AI deployments, we kept encountering the same surprising reality: most teams had no idea how inefficiently their GPUs were actually running. They assumed high utilization because their dashboards said so.

Before anyone can close a performance gap, they have to be able to see it. Accurate GPU utilization measurement is not just useful, it is the prerequisite for any meaningful optimization. And it turns out the measurement tool that most organizations depend on for that is wrong.

In particular, the GPU utilization metric, reported by nvidia-smi, nvtop, rocm-smi, Weights & Biases (gpu.{i}.gpu), Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor, does not measure how hard your GPU is working. It measures whether your GPU is doing anything at all. If at least one kernel is executing throughout the sampling window, the metric can read 100%, regardless of whether the GPU is using a fraction of a percent of its actual compute capacity or saturating it. This lack of insight into utilization drives bad decisions, like purchasing more GPUs under the belief that existing ones are at capacity, and makes it harder to identify where workloads can be optimized.

We’re introducing Utilyze (utlz), a free, open-source tool that fills this gap. Unlike existing monitoring tools, Utilyze measures how efficiently your GPU is actually doing useful work, not just whether it’s running, and shows you this live, without slowing down your workload. It also tells you the realistic ceiling for your specific hardware and model combination, so you know whether you’re close to maximum performance or leaving capacity on the table.
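To see the failure mode concretely, the sketch below (illustrative only, not part of Utilyze; it assumes PyTorch with CUDA) keeps a GPU busy with back-to-back tiny matrix multiplies. Run it and watch nvidia-smi, or any dashboard fed by it, in another terminal: the utilization column will typically read near 100% even though these kernels touch only a sliver of the chip's compute capacity.

```python
# Illustrative sketch (not part of Utilyze): keep the GPU "busy" with tiny
# kernels. Assumes PyTorch with CUDA support.
import torch

a = torch.randn(64, 64, device="cuda")
b = torch.randn(64, 64, device="cuda")

print("Launching tiny kernels back-to-back; watch nvidia-smi, Ctrl+C to stop.")
while True:
    # Queue a burst of kernels so at least one is always in flight on the GPU;
    # each 64x64 matmul uses a negligible fraction of the device's cores.
    for _ in range(1000):
        a = a @ b
    torch.cuda.synchronize()
```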
"The gap isn't awareness — engineers who write CUDA kernels know what accurate utilization looks like. The gap is tooling. There has never been a way to see true GPU efficiency continuously, in production, without slowing down the workload."
— Manya Ghobadi, MIT Professor & CEO, Systalyze
What nvidia-smi Is Actually Telling You

To understand why the standard metric falls short, it helps to look at how the number is actually computed. The mechanics are simple, which is part of the problem. The GPU samples a binary signal ("is at least one kernel scheduled on the GPU right now?") and averages it over a sampling window (typically 1 second). The reported percentage is the fraction of that window where the answer was “yes.”

If a single kernel ran for 400 ms of a 1-second window, nvidia-smi reports 40% total GPU utilization. If a kernel ran the entire window, even a single tiny kernel on one of the Streaming Multiprocessors (SMs), it reports 100%. For perspective, an H100 GPU has 132 SMs, each containing 128 CUDA cores and 4 Tensor Cores: 17,424 cores total. Essentially, nvidia-smi treats one busy CUDA core and thousands of busy CUDA cores identically.

The metric was designed for an earlier era, when GPUs were running graphics pipelines and knowing whether the GPU was idle or active was useful. A graphics workload either has frames to render or it doesn’t. That binary distinction made sense then. It has not been updated for AI workloads, where a model can run continuously on the GPU while using a small fraction of its actual compute capacity.

This limitation propagates through the entire monitoring stack. Weights & Biases automatically logs GPU utilization as gpu.{i}.gpu in its system metrics, sourced directly from nvidia-smi. The same is true of the major cloud monitoring dashboards: Amazon CloudWatch’s GPU monitoring, for example, uses nvidia-smi as its data source (the metric is literally named nvidia_smi_utilization_gpu), and the equivalent GPU utilization surfaces in GCP Monitoring and Azure Monitor are built on the same underlying driver counter. The situation on AMD GPUs is the same: rocm-smi reports the identical “any kernel scheduled” metric as nvidia-smi,
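The sampling-and-averaging mechanics described above can be made concrete with a small model. The sketch below is an illustration of the mechanism, not the driver's actual implementation: only the binary "was any kernel scheduled?" flag enters the reported average, while the fraction of compute those kernels actually used is discarded.

```python
# Illustrative model of the standard "GPU utilization" computation -- not the
# driver's actual code. Each sample records whether any kernel was scheduled
# and (hypothetically) what fraction of the GPU's compute it was using.
from dataclasses import dataclass

@dataclass
class Sample:
    any_kernel_scheduled: bool  # the only signal the standard metric sees
    compute_fraction: float     # fraction of cores doing work (ignored by it)

def reported_utilization(samples):
    """Fraction of the sampling window with at least one kernel scheduled."""
    busy = sum(1 for s in samples if s.any_kernel_scheduled)
    return 100.0 * busy / len(samples)

def compute_utilization(samples):
    """What a compute-weighted metric would report for the same window."""
    return 100.0 * sum(s.compute_fraction for s in samples) / len(samples)

# A 1-second window in which one tiny kernel, occupying roughly 1 of the
# H100's 132 SMs, runs the entire time.
window = [Sample(any_kernel_scheduled=True, compute_fraction=1 / 132)] * 100

print(f"standard metric:       {reported_utilization(window):.0f}%")  # 100%
print(f"compute-weighted view: {compute_utilization(window):.1f}%")   # ~0.8%
```

On this window the standard metric saturates at 100% while a compute-weighted view sits below 1%, which is exactly the gap between "the GPU is doing something" and "the GPU is doing useful work" that Utilyze is built to expose.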