Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: uint

Speeding up PyTorch inference on Apple devices with AI-generated Metal kernels

tl;dr: Our lab investigated whether frontier models can write optimized GPU kernels for Apple devices to speed up inference. We found that they can: our AI-generated Metal kernels were 1.87x faster across 215 PyTorch modules, with some workloads running hundreds of times faster than baseline. Why use AI to generate kernels for Apple devices? AI models execute on hardware via GPU kernels that define each oper

The 10 Best Moments in ‘Jaws’

One of the greatest films ever made, Jaws, celebrates its 50th anniversary this year, and to commemorate the occasion, it returns to theaters this weekend. And not just regular theaters. Jaws is being re-released in 3D, IMAX, and even 4DX. Yes, you can ride along in your theater seat and feel the watery mist alongside Brody, Hooper, and Quint as the Orca sets sail. You can find showtimes and buy tickets for all of those at this link. But, to get even more excited about seeing the Steven Spielbe

Efficiently Generating a Number in a Range (2018)

The vast majority of my posts about random number generation have focused on looking at the properties of different generation schemes. But, perhaps surprisingly, the performance of your randomized algorithm may hinge not on the generation scheme you chose, but on other factors. In this post (inspired by and building on an excellent recent paper by Daniel Lemire), we'll explore a common source of overhead in random number generation that frequently outweighs PRNG engine performance. Imagine thi

Writing a Game Boy Emulator in OCaml

Introduction For the past few months, I have been working on a project called CAMLBOY, a Game Boy emulator written in OCaml that runs in the browser. You can try it out on the following demo page: Demo Page I included several homebrew ROMs in the demo, so please try them out (I recommend Bouncing ball and Rocket Man Demo). You can also play with it in your mobile browser as it runs at 60 FPS on recent smartphones. Repository You can find the repository here: https://github.com/linoscope/CA

Topics: arg bit type uint16 uint8