
Windows ML is generally available


The future of AI is hybrid, utilizing the respective strengths of cloud and client while harnessing every Windows device to achieve more. At Microsoft, we are reimagining what’s possible by bringing powerful AI compute directly to Windows devices, unlocking a new era of intelligence that runs where you are. With groundbreaking advancements in silicon, a modernized software stack and deep OS integration, Windows 11 is transforming into the world’s most open and capable platform for local AI.

Today we are excited to share that Windows ML is now generally available for production use, giving developers a supported path to ship on-device AI experiences in the evolving AI landscape. First introduced at Build 2025, Windows ML is the built-in AI inferencing runtime, optimized for on-device model inference with streamlined model dependency management across CPUs, GPUs and NPUs. It serves as the foundation for Windows AI Foundry and is used by Foundry Local to enable the expanded silicon support being released today.

By harnessing the power of CPUs, GPUs and NPUs from our vibrant silicon partner ecosystem and building on ONNX’s strong momentum, Windows ML empowers developers to deliver real-time, secure and efficient AI workloads — right on the device. This ability to run models locally enables developers to build AI experiences that are more responsive, private and cost-effective, reaching users across the broadest range of Windows hardware.

Bring your own model and deploy efficiently across silicon – securely and locally on Windows

Windows ML is compatible with ONNX Runtime (ORT), so developers can use familiar ORT APIs and transition existing production workloads with little friction. Windows handles distribution and maintenance of ORT and the execution providers, taking that responsibility off the app developer. Execution providers (EPs) are the bridge between the core runtime and a diverse silicon ecosystem, enabling independent optimization of model execution on the different chips from AMD, Intel, NVIDIA and Qualcomm. With ONNX as its model format, Windows ML integrates smoothly with current models and workflows: developers can use their existing ONNX models, or convert and optimize source PyTorch models through the AI Toolkit for VS Code, and deploy across Windows 11 PCs.
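To make that workflow concrete, here is a minimal sketch of the ORT API surface Windows ML is compatible with, using the Python bindings for brevity (a shipping Windows app would more likely use the C# or C++ ORT APIs, and would convert its model through the AI Toolkit rather than inline). The model, file path, tensor names and shapes below are illustrative placeholders.

```python
import numpy as np
import onnxruntime as ort
import torch

# Illustrative stand-in for a developer's source PyTorch model.
model = torch.nn.Sequential(torch.nn.Linear(16, 4), torch.nn.ReLU()).eval()

# Export to ONNX, the model format Windows ML builds on.
torch.onnx.export(
    model,
    torch.randn(1, 16),        # example input used to trace the graph
    "model.onnx",              # placeholder path
    input_names=["input"],
    output_names=["output"],
)

# Load and run the model through the familiar ORT session API. Execution
# providers are tried in the order listed; CPU is the universal fallback.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
result = session.run(None, {"input": np.random.rand(1, 16).astype(np.float32)})
print(result[0].shape)  # (1, 4)
```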

While AI developers work with various models, Windows ML acts as a hardware abstraction layer offering several benefits:

Simplified Deployment: Our infrastructure APIs allow developers to support various hardware architectures without shipping multiple app builds, by leveraging execution providers already available on the device or by dynamically downloading them. Developers also have the flexibility to precompile their models ahead-of-time (AOT) for a streamlined end-user experience.

Reduced App Overhead: Windows ML automatically detects the user’s hardware and downloads the appropriate execution providers, eliminating the need to bundle the runtime or EPs with the application. This streamlined approach saves developers tens to hundreds of megabytes in app size when targeting a broad range of devices.

Compatibility: Through collaboration with our silicon partners, Windows ML aims to maintain conformance and compatibility, supporting ongoing updates while ensuring model accuracy across different builds through a certification process.

Advanced Silicon Targeting: Developers can assign device policies to optimize for low power (NPU) or high performance (GPU), or specify the exact silicon a model should run on (see the sketch below).
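As a rough illustration of the device-policy idea, the sketch below maps a policy name to an ordered execution-provider preference list and falls back to whatever is actually available on the machine. The policy names and mapping are hypothetical; QNNExecutionProvider (Qualcomm NPU), DmlExecutionProvider (DirectML GPU) and CPUExecutionProvider are real ONNX Runtime providers, but the actual Windows ML device-policy API is richer than a static list like this.

```python
import onnxruntime as ort

# Hypothetical policy -> ordered provider preference. Real Windows ML device
# policies are configured through the runtime; this only illustrates the idea.
POLICY_PROVIDERS = {
    "low_power": ["QNNExecutionProvider", "CPUExecutionProvider"],         # prefer NPU
    "high_performance": ["DmlExecutionProvider", "CPUExecutionProvider"],  # prefer GPU
}

def create_session(model_path: str, policy: str) -> ort.InferenceSession:
    """Open an ORT session on the first preferred provider present on this device."""
    available = set(ort.get_available_providers())
    preferred = [p for p in POLICY_PROVIDERS[policy] if p in available]
    return ort.InferenceSession(model_path, providers=preferred or ["CPUExecutionProvider"])

# Usage (placeholder path): run a model on the NPU when present, else the CPU.
# session = create_session("model.onnx", policy="low_power")
```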
