Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: cuda

Apple’s machine learning framework is getting support for NVIDIA’s CUDA platform

Apple’s MLX machine learning framework, originally designed for Apple Silicon, is getting a CUDA backend, which is a pretty big deal. Here’s why. The work is being led by developer @zcbenz on GitHub (via AppleInsider), who started prototyping CUDA support a few months ago. Since then, he has split the project into smaller pieces and gradually merged them into MLX’s main branch. The backend is still a work in progress, but several core operations, like matrix multiplication, softmax, reduc

Asynchronous Error Handling Is Hard

(Ed. note: This article was originally published on The CUDA Handbook blog on November 2, 2023.) Every API designer has struggled with the question of how best to propagate errors to their callers, since before the term “API” was invented. Even decades ago (say 30+ years), interface designers knew to separate the error return from the payload, in functions that return other results to their caller. Since it is sometimes useful to know what not to do: My favorite example of an antipattern in th

CUDA Ray Tracing 2x Faster Than RTX: My CUDA Ray Tracing Journey

Welcome! This article is a deep dive into how I made a CUDA-based ray tracer that outperforms a Vulkan/RTX implementation—sometimes by more than 3x—on the same hardware. If you're interested in GPU programming, performance optimization, or just want to see how far you can push a path tracer, you're in the right place. The comparison is with RayTracingInVulkan by GPSnoopy, a well-known Vulkan/RTX renderer. My goal wasn't just to port Ray Tracing in One Weekend to CUDA, but to squeeze every last

Topics: cuda float memory ray std

Show HN: I built a tensor library from scratch in C++/CUDA

DSC is a PyTorch-compatible tensor library and inference framework for machine learning models. It features a C-compatible low-level API that is wrapped in a modern Python API very similar to NumPy / PyTorch but with some nice usability improvements. Some key features of DSC include: Intuitive API: DSC Python API closely resembles NumPy / PyTorch. Built-in neural networks support: DSC comes with nn.Module built-in. Porting a model from PyTorch to DSC is trivial (check out the ex

Topics: cuda dsc level make tests