Tech News

CUDA-oxide: Nvidia's official Rust to CUDA compiler

Why This Matters

cuda-oxide is an early-stage Rust-to-CUDA compiler that lets developers write GPU kernels in mostly safe, idiomatic Rust. Because it compiles Rust directly to PTX without DSLs or foreign-language bindings, it could streamline GPU programming workflows and make GPU acceleration more accessible to Rust developers. Although the project is still in alpha, it is a step toward more integrated and safer GPU programming in the Rust ecosystem and may influence future tooling.


The cuda-oxide Book

cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign language bindings, just Rust.

Note: This book assumes familiarity with the Rust programming language, including ownership, traits, and generics. Later chapters on async GPU programming also assume working knowledge of async/.await and runtimes like tokio. For a refresher, see The Rust Programming Language, Rust by Example, or the Async Book.

Project Status

The v0.1.0 release is an early-stage alpha: expect bugs, incomplete features, and API breakage as we work to improve it. We hope you’ll try it and help shape its direction by sharing feedback on your experience.

🚀 Quick start

```rust
use cuda_device::{cuda_module, kernel, thread, DisjointSlice};
use cuda_core::{CudaContext, DeviceBuffer, LaunchConfig};

#[cuda_module]
mod kernels {
    use super::*;

    #[kernel]
    fn vecadd(a: &[f32], b: &[f32], mut c: DisjointSlice<f32>) {
        let idx = thread::index_1d();
        let i = idx.get();
        if let Some(c_elem) = c.get_mut(idx) {
            *c_elem = a[i] + b[i];
        }
    }
}

fn main() {
    let ctx = CudaContext::new(0).unwrap();
    let stream = ctx.default_stream();
    let module = kernels::load(&ctx).unwrap();

    let a = DeviceBuffer::from_host(&stream, &[1.0f32; 1024]).unwrap();
    let b = DeviceBuffer::from_host(&stream, &[2.0f32; 1024]).unwrap();
    let mut c = DeviceBuffer::<f32>::zeroed(&stream, 1024).unwrap();

    module
        .vecadd(&stream, LaunchConfig::for_num_elems(1024), &a, &b, &mut c)
        .unwrap();

    let result = c.to_host_vec(&stream).unwrap();
    assert_eq!(result[0], 3.0);
}
```

Build and run with cargo oxide run vecadd upon installing the prerequisites.

Note: #[cuda_module] embeds the generated device artifact into the host binary and generates a typed kernels::load function plus one launch method per kernel. The lower-level load_kernel_module and cuda_launch! APIs remain available when you need to load a specific sidecar artifact or build custom launch code.
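To see why the vecadd kernel wraps its write in if let Some(c_elem) = c.get_mut(idx), it can help to emulate the launch on the CPU. The sketch below uses plain Rust only and none of the cuda-oxide crates. The names blocks_for and vecadd_emulated are hypothetical, and the block size is a free parameter chosen for illustration. It assumes the usual CUDA convention that a 1D launch rounds the thread count up to whole blocks, so some threads can fall past the end of the output, and it assumes the None branch of get_mut corresponds to exactly those surplus threads. Neither assumption is confirmed by the text above.

```rust
/// Number of blocks needed to cover `n` elements with `block_dim` threads
/// per block (ceiling division). Hypothetical helper, not a cuda-oxide API.
fn blocks_for(n: usize, block_dim: usize) -> usize {
    assert!(block_dim > 0, "block_dim must be non-zero");
    (n + block_dim - 1) / block_dim
}

/// CPU emulation of the `vecadd` kernel: every (block, thread) pair plays
/// the role of one GPU thread. Hypothetical function for illustration only.
fn vecadd_emulated(a: &[f32], b: &[f32], c: &mut [f32], block_dim: usize) {
    assert!(a.len() >= c.len() && b.len() >= c.len(), "inputs too short");
    let grid_dim = blocks_for(c.len(), block_dim);
    for block in 0..grid_dim {
        for thread in 0..block_dim {
            // Global 1D index, standing in for `thread::index_1d()`.
            let i = block * block_dim + thread;
            // `slice::get_mut` yields `None` for surplus threads past the
            // end, mirroring the `if let Some(..)` guard in the kernel
            // (assumed behavior of `DisjointSlice::get_mut`).
            if let Some(c_elem) = c.get_mut(i) {
                *c_elem = a[i] + b[i];
            }
        }
    }
}
```

For example, with 1000 elements and a block size of 256, blocks_for(1000, 256) gives 4 blocks, or 1024 emulated threads. The last 24 threads hit the None branch and write nothing, while every element of the output still receives a[i] + b[i].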