The FSF considers large language models
The Free Software Foundation's Licensing and Compliance Lab concerns itself with many aspects of software licensing, Krzysztof Siewicz said at the beginning of his 2025 GNU Tools Cauldron session. These include supporting projects that are facing licensing challenges, collecting copyright assignments, and addressing GPL violations. In this session, though, there was really only one topic that the audience wanted to know about: the interaction between free-software licensing and large language models (LLMs).
Anybody hoping to exit the session with clear answers about the status of LLM-created code was bound to be disappointed; the FSF, too, is trying to figure out what this landscape looks like. The organization is currently running a survey of free-software projects with the intent of gathering information about what position those projects are taking with regard to LLM-authored code. From that information (and more), the FSF eventually hopes to come up with guidance of its own.
Nick Clifton asked whether the FSF is working on a new version of the GNU General Public License — a GPLv4 — that takes LLM-generated code into account. No license changes are under consideration now, Siewicz answered; instead, the FSF is considering adjustments to the Free Software Definition first.
Siewicz continued that LLM-generated code is problematic from a free-software point of view because, among other reasons, the models themselves are usually non-free, as is the software used to train them. Clifton asked why the training code mattered; Siewicz said that at this point he was just highlighting the concern that some feel. There are people who want to avoid proprietary software even when it is being run by others.
Siewicz went on to say that one of the key questions is whether code that is created by an LLM is copyrightable and, if not, if there is some way to make it copyrightable. It was never said explicitly, but the driving issue seems to be whether this software can be credibly put under a copyleft license. Equally important is whether such code infringes on the rights of others. With regard to copyrightability, the question is still open; there are some cases working their way through the courts now. Regardless, though, he said that it seems possible to ensure that LLM output can be copyrighted by applying some human effort to enhance the resulting code. The use of a "creative prompt" might also make the code copyrightable.
Many years ago, he said, photographs were not generally seen as being copyrightable. That changed over time as people figured out what could be done with that technology and the creativity it enabled. Photography may be a good analogy for LLMs, he suggested.
There is also, of course, the question of copyright infringements in code produced by LLMs, usually in the form of training data leaking into the model's output. Prompting an LLM for output "in the style of" some producer may be more likely to cause that to happen. Clifton suggested that LLM-generated code should be submitted with the prompt used to create it so that the potential for copyright infringement can be evaluated by others.
Siewicz said that he does not know of any model that says explicitly whether it incorporates licensed data. As some have suggested, it could be possible to train a model exclusively on permissively licensed material so that its output would have to be distributable, but even permissive licenses require the preservation of copyright notices, which LLMs do not do. A related concern is that some LLMs come with terms of service that assert copyright over the model's output; incorporating such code into a free-software project could expose that project to copyright claims.