Published on: 2025-06-04 10:03:12
TL;DR We have some very fast AI-generated kernels in pure CUDA-C without using libraries and DSLs such as CUTLASS and Triton. They are performing close to or in some cases even beating the standard expert-optimized production kernels shipped in PyTorch. Some of our highlighted results: Matmul (FP32): 101.3% performance of FP32 torch.matmul; problem size: 4096x4096 square matrices. Conv2D: 179.9% performance of FP32 torch
Keywords: kernel memory optimization performance reference
Published on: 2025-06-09 05:08:01
Linus Torvalds officially announced the stable release of the Linux kernel 6.15 on May 25, 2025. Its arrival was delayed for a few hours, Torvalds said, "because of a last-minute bug report resulting in one new feature being disabled at the eleventh hour," but Linux 6.15 is here and ready for you to download and tinker with. The newest feature that caught my eye was that, for the first time, we have
Keywords: hardware kernel linux new nvidia
Published on: 2025-06-11 19:01:20
There are some applications that benefit from running LLMs really, really fast. This low-latency regime encompasses applications like chatbots and human-in-the-loop workflows, where users care a lot about seeing responses come back immediately. Given the importance of these low-latency workloads, we wanted to explore just how fast we can run open-source models on modern GPUs. To really stress-test existing systems, we consider an aggressive low-latency scenario where we generate a single sequen
Keywords: gpu instruction kernel megakernel memory
Published on: 2025-06-16 10:33:16
Unfortunately, an embedded system is not free of crashes. To analyze and log such crashes it is useful to have a file system where we can store such information between reboots. One interface which is meant to do that is pstore and its only current implementation, ramoops. Ramoops can store log messages inside a reserved memory area in RAM. The nice thing about RAM is that it should almost always be available when the CPU is still running. Flash memory on the other hand might not be available
Keywords: 220 64 apalis kernel ramoops
Published on: 2025-06-27 18:46:29
To implement a seamless Linux integration into Starina, I decided to go with a Linux lightweight VM approach similar to WSL2. This means I need to implement a hypervisor that can run Linux. I had implemented an Intel VT-x based hypervisor before, but this time I wanted to try something different: RISC-V H-extension based hypervisor! This post is a diary of my journey of writing a RISC-V hypervisor incrementally. RISC-V H-extension RISC-V H-extension introduces new CPU modes and some more CSR
Keywords: guest hypervisor kernel linux risc
Published on: 2025-07-10 00:55:56
It's not the first time Torvalds has suggested dropping support for 32-bit processors and relieving kernel developers from implementing archaic emulation and work-around solutions. "We got rid of i386 support back in 2012. Maybe it's time to get rid of i486 support in 2022," Torvalds wrote in October 2022. Failing major changes to the 6.15 kernel, which will likely arrive late this month, i486 support will be dropped. Where does that leave people running a 486 system for whatever reason? They c
Keywords: 486 kernel processors run support
Published on: 2025-07-31 18:51:41
About The Project A 32-bit hobby operating system with graphics, multitasking and networking! Started: 12.05.2022 Login There are 3 default users: system, admin and guest. The password for admin is 'admin', while guest has no password. Currently there is no difference between admin and guest. You can create a user with the 'admin' command: admin create <username> <password> Built With This project is built with C & Assembly for the kernel, utilities and build system. C++ f
Keywords: docker git kernel make qemu
Published on: 2025-08-03 06:46:45
Some __nonstring__ turbulence -Wunterminated-string-initialization New compiler releases often bring with them new warnings; those warnings are usually welco
Keywords: 15 gcc kernel string torvalds
Published on: 2025-08-05 05:00:00
A significant security gap in Linux runtime security caused by the 'io_uring' interface allows rootkits to operate undetected on systems while bypassing advanced enterprise security software. The flaw was discovered by ARMO security researchers who developed a proof-of-concept rootkit called "Curing" to demonstrate the practicality and feasibility of attacks leveraging io_uring for evasion. io_uring is a Linux kernel interface for efficient, asynchronous I/O operations. It was introduced in 20
Keywords: armo io_uring kernel linux security
Published on: 2025-08-05 17:19:32
Multi-platform high-performance compute language extension for Rust. With CubeCL, you can program your GPU using Rust, taking advantage of zero-cost abstractions to develop maintainable, flexible, and efficient compute kernels. CubeCL currently fully supports functions, generics, and structs, with partial support for traits, methods and type inference. As the project evolves, we anticipate even broader support for Rust language primitives, all while maintaining optimal performance. Exa
Keywords: cube cubecl kernels runtime vectorization
Published on: 2025-08-07 18:17:16
In a perfect world, everyone’s systems would be fully updated, patched regularly, and running the latest kernel. But let’s be real—that’s rarely the case. Some environments still rely on legacy versions of Ubuntu or Fedora, while others don't even have their kernels compiled with BTF (BPF Type Format) support. And if you’re maintaining any open-source tools, things get even messier. You have zero control over what kind of system your users will run your program on. All of this makes it trick
Keywords: btf ebpf kernel program struct
Published on: 2025-08-07 11:43:08
An update on pahole Pahole (originally "Poke-a-hole") is a Swiss Army knife for exploring and editing debug information. Pahole is also currently involved in the kernel's build process to rearrange the information produced by various compilers into a form useful to the BPF verifier, although there are plans to render it unnecessary. Pahole maintainer Arnaldo Carvalho de Melo shared some status updates about the project at the 2025 Linux Storage, Filesystem, Memory-Management, and BPF Summit. In
Keywords: btf information kernel melo pahole
Published on: 2025-08-12 00:34:07
Much of the world’s web traffic is routed through data centers, which also fuel power-guzzling artificial intelligence (AI) applications. In the U.S. alone, data centers consumed around 4 percent of the country’s total electricity in 2023, and that number is projected to rise up to 12 percent by 2028. Some researchers are thinking big, formulating innovative schemes to make data centers more sustainable. Others, like Martin Karsten, a professor of systems and networking at the University of
Keywords: data kernel linux network traffic
Published on: 2025-08-11 23:42:48
Introduction This post details my recent efforts to write an optimized matrix multiplication kernel in CUDA using tensor cores on an NVIDIA Tesla T4 GPU. The goal is to compute $D = \alpha * A * B + \beta * C$, as fast as possible. In this equation $D,A,B$ and $C$ are large matrices full of half precision floating point numbers, and $\alpha$, $\beta$ are constants. This problem is usually referred to as a Half-precision Generalized Matrix Multiply, or HGEMM for short. Tensor Cores are specializ
Keywords: data kernel matrix memory shared
Published on: 2025-08-18 12:19:46
Deployment-ready browsers. Run 'em anywhere. Overview: Kernel provides sandboxed, ready-to-use Chrome browser environments for agentic workflows that need to access the Internet. containers/docker/Dockerfile and unikernels/unikraft-cu are the core infra that powers our hosted services. Key Features: Pre-configured Chrome browser that Chrome DevTools-based browser frameworks (Playwright, Puppeteer) can connect to GUI access for v
Keywords: browser chrome docker unikernel use
Published on: 2025-08-20 09:13:31
A customer asked for help with a longstanding but low-frequency hang that they have never been able to figure out. From what they could tell, their UI thread was calling into the kernel, and the call simply hung for no apparent reason. Unfortunately, the kernel dump couldn’t show a stack from user mode because the stack had been paged out. (Which makes sense, because a hung thread isn’t using its stack, so once the system is under some memory pressure, that stack gets paged out.) 0: kd> !thread
Keywords: kernel stack suspended thread ui
Published on: 2025-08-22 09:29:48
Peering into the Linux Kernel with trace June 04, 2020 Recently, I was working on a patch for a popular open-source project, and discovered that the test suite was failing intermittently. A closer look revealed that the last access times for some files in the project folder were changing unexpectedly, and this was causing a test to fail. (The failing test was not related to my patch.) Looking at the project code, it seemed impossible for it to be unexpectedly accessing those files during the te
Keywords: function kernel linux touch_atime trace
Published on: 2025-08-24 00:45:22
PDEATHSIG is almost never what you want It was a fine Sunday evening. I had just landed in LA for our company offsite, and I met the whole team in person for the first time. A little later, my phone buzzed with a notification, “Antonio assigned you an issue on Linear.” Antonio is an engineering team lead at Recall.ai, and he tasked me with optimizing Output Media start latency so that our customers’ AI agents would launch faster. I thought it was going to be the quickest, most straightforward
Keywords: bubblewrap kernel parent process thread
Published on: 2025-08-26 12:29:46
My Own Private Binary An Idiosyncratic Introduction to Linux Kernel Modules How This Began Several years ago, I spent a serious chunk of time figuring out how to make really teensy ELF executable files. I started down this path because I was annoyed that all of my programs, no matter how short they were, never got smaller than 4k or so. I felt that was excessive, for C, and so I started looking at what ELF files contained, and how much of that actually needed to be there. (And then, after a w
Keywords: com file format kernel linux
Published on: 2025-08-27 16:23:17
Precise GPU observability and programmability are essential for optimizing performance in AI workloads and other computationally intensive high-performance computing (HPC) applications. In this paper, we introduce eGPU, the first framework and eBPF runtime that dynamically offloads eBPF bytecode onto GPUs via dynamic PTX injection. Designed primarily for observability, our system leverages real-time GPU telemetry, eBPF-based dynamic instrumentation, and automated performance analysis to pinpoint
Keywords: ebpf gpu instrumentation kernel memory
Published on: 2025-08-27 11:16:54
Linux Kernel Defence Map Linux kernel security is a very complex topic. There are many concepts that have interesting relationships with each other: Vulnerability classes Exploitation techniques Bug detection mechanisms Defence technologies Some defence technologies are provided by the Linux kernel mainline. Others are going out‑of‑tree for various reasons (some of them are commercial, for example). Moreover, there are kernel defences that depend on special hardware features. It would be
Keywords: defence kernel linux map security
Published on: 2025-09-03 01:57:00
Start of the journey We started our journey with iOS emulation by looking at existing open-source solutions. We had successfully run alephsecurity/xnu-qemu-arm64 before, but the project being read-only was concerning. Then we tried TrungNguyen1909/qemu-t8030 and it had quite a few interesting features: the ability to actually restore iOS (using a second "companion" QEMU for USB connectivity) running iOS 14 a more recent version of QEMU a nice wiki on how to bring up the emulator With that
Keywords: ios kernel patching qemu using
Published on: 2025-09-03 14:46:19
This post is the result of me going down a several-week-long XNU rabbit hole after reading this post by Thomas Claburn on Exclaves; more on that later. I’ve tried my best to condense all the information into a single blog post. I’ve also tried to keep sections self-contained so you can skip around using the table of contents. This does come at the cost of repeating myself in some places, so thanks in advance for your patience. While I’m confident of my understanding on this topic, some errors ar
Keywords: apple kernel mach memory xnu
Published on: 2025-09-06 21:46:03
LONDON -- When I met up with my open-source buddy Dustin Kirkland, VP of engineering at Chainguard, at KubeCon Europe, he said he had me to thank for his company's new Linux distribution, Chainguard OS. Why? In my May 2024 story about kernel security, I'd said all distros had been doing Linux security wrong. (That was the conclusion of a CIQ study, Linux stable kernel maintainer Greg Kroah-Hartman, and top Linux developer Kees Cook.) "A light bulb went off," Kirkla
Keywords: chainguard kernel linux os secure
Published on: 2025-09-08 20:59:25
About social.kernel.org Terms of service Please do not use this service in violation of the Linux Kernel Code of Conduct. Doing so will result in your account suspension with the referral of the matter to the CoC committee. "Repeating"/"boosting" someone else's status on this platform will be treated as endorsement and will fall under rule #1. You are encouraged to use this platform to promote your work on the Linux Kernel, but there is no restriction on permitted topics (with the exception of
Keywords: account kernel linux platform service
Published on: 2025-09-12 10:55:21
Introduction Hi everyone! In this post, I will share with you all the steps to write an optimized FP32 matrix multiplication on an AMD RDNA3 GPU, outperforming rocBLAS by 60%. I will cover some basics and explain all the optimizations I have implemented. This will be done in an iterative way across 8 different kernels. Figure 1: sneak peek of the performance results I primarily intended to work on this to deepen my understanding of RDNA3 and try out HIP, and I felt like I needed to share what I learne
Keywords: figure kernel lds performance v_dual_fmac_f32
Published on: 2025-09-19 13:28:28
The Linux Foundation offers many Linux and open-source classes. Many of these courses provide helpful certifications for Linux/open-source job hunters. What you may not know, though, is that Shuah Khan, the renowned Linux kernel developer and Linux Foundation Fellow, directs the elite-level Linux Kernel Mentorship Program (LKMP). Khan's involvement with the Linux kernel dates back to 20
Keywords: kernel linux lkmp open program
Published on: 2025-09-21 04:54:28
It's nice to know I'm not the only one who can blow a deadline. Linus Torvalds confessed that he'd love to have had "some good excuse for why I didn't do the 6.14 release yesterday on my regular Sunday afternoon release schedule. … But no. It's just pure incompetence. Because absolutely nothing last-minute happened yesterday, and I was just clearing up some unrelated things in order to be ready for the merge window. And in the process just entirely forgot to actually ever
Keywords: 14 kernel linux release rust
Go K’awiil is a project by nerdhub.co that curates technology news from a variety of trusted sources. We built this site because, although news aggregation is incredibly useful, many platforms are cluttered with intrusive ads and heavy JavaScript that can make mobile browsing a hassle. By hand-selecting our favorite tech news outlets, we’ve created a cleaner, more mobile-friendly experience.
Your privacy is important to us. Go K’awiil does not use analytics tools such as Facebook Pixel or Google Analytics. The only tracking occurs through affiliate links to amazon.com, which are tagged with our Amazon affiliate code, helping us earn a small commission.
We are not currently offering ad space. However, if you’re interested in advertising with us, please get in touch at [email protected] and we’ll be happy to review your submission.