Vibe-Coded Ext4 for OpenBSD

Why This Matters

This development highlights the growing influence of AI-generated code in the open-source community and raises important questions about licensing, originality, and security. For consumers and the tech industry, it underscores the need for careful scrutiny of AI-produced software to ensure compliance and reliability.

Vibe-coded ext4 for OpenBSD [LWN subscriber-only content]

A number of projects have been struggling with the question of which submissions created by large language models (LLMs), if any, should be accepted into their code base. This discussion has been further muddied by efforts to use LLM-driven reimplementation as a way to remove copyleft restrictions from a body of existing code, as recently happened with the Python chardet module. In this context, an attempt to introduce an LLM-generated implementation of the Linux ext4 filesystem into OpenBSD was always going to create some fireworks, but that project has its own, clearly defined reasons for looking askance at such submissions.

It all started on March 17, when Thomas de Grivel posted an ext4 implementation to the openbsd-tech mailing list. This implementation, he said, provides full read and write access and passes the e2fsck filesystem checker; it does not support journaling, however. The code includes a number of copyright assertions, but says nothing about how it was written. In a blog post, though, de Grivel was more forthcoming about the code's provenance:

No Linux source files were ever read to build this driver. It's pure AI (ChatGPT and Claude-code) and careful code reviews and error checking and building kernel and rebooting/testing from my part.

There were a number of predictable concerns raised about this code, many having to do with the possibility that it could be considered to be a derived product of the (GPL-licensed) Linux implementation. The fact that the LLMs in question were almost certainly trained on the Linux ext4 code and documentation does not help. Bringing GPL-licensed code into OpenBSD is, to put it lightly, not appreciated; Christian Schulte was concerned about license contamination:

I searched for documentation about that ext4 filesystem in question. I found some GPL licensed wiki pages. The majority of available documentation either directly or indirectly points at GPL licensed code. In my understanding of the issue discussed in this thread this already introduces licensing issues. Even if you would write an ext4 filesystem driver from scratch for base, you would almost always need to incorporate knowledge carrying an illiberal license.

Theo de Raadt, however, pointed out that reimplementation of structures and algorithms is allowed by copyright law; that is how interoperability happens. One should not conclude that De Raadt was in favor of merging this contribution, though.

From the OpenBSD point of view, the copyright status of LLM-generated code is indeed problematic, for the simple reason that nobody knows what that status is, or even if a copyright can exist on that code at all. Without copyright, it is not possible to grant the project the rights it needs to redistribute the code. As De Raadt explained:

At present, the software community and the legal community are unwilling to accept that the product of a (commercial, hah) AI system produces is Copyrightable by the person who merely directed the AI. And the AI, or AI companies, are not recognized as being able to do this under Copyright treaties or laws, either. Even before we get to the point that the AI's are corpus-blenders and Copyright-blenders. So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.
