
Debian decides not to decide on AI-generated contributions




Debian is the latest in an ever-growing list of projects to wrestle (again) with the question of LLM-generated contributions; the latest debate started in mid-February, after Lucas Nussbaum opened a discussion with a draft general resolution (GR) on whether Debian should accept AI-assisted contributions. It seems to have mostly subsided without a GR being put forward or any decisions being made, but the conversation was illuminating nonetheless.

Nussbaum said that Debian probably needed to have a discussion "to understand where we stand regarding AI-assisted contributions to Debian" based on some recent discussions, though it was not clear what discussions he was referring to. Whatever the spark was, Nussbaum put forward the draft GR to clarify Debian's stance on allowing AI-assisted contributions. He said that he would wait a couple of days to collect feedback before formally submitting the GR.

His proposal would allow "AI-assisted contributions (partially or fully generated by an LLM)" if a number of conditions were met. For example, it would require explicit disclosure if "a significant portion of the contribution is taken from a tool without manual modification", and labeling of such contributions with "a clear disclaimer or a machine-readable tag like '[AI-Generated]'." It also spells out that contributors should "fully understand" their submissions and would be accountable for the contributions, "including vouching for the technical merit, security, license compliance, and utility of their submissions". The GR would also prohibit using generative-AI tools with non-public or sensitive project information, including private mailing lists or embargoed security reports.

AI is a marketing term

It is fair to say that it is difficult to have an effective conversation about a technology when pinning down accurate terminology is like trying to nail Jell-O to a tree. AI is the catch-all term, but much (not all) of the technology in question is actually tooling around large language models (LLMs). When participants have differing ideas of what is being discussed, deciding whether the thing should be allowed may pose something of a problem.

Russ Allbery asked for people to be more precise in their descriptions of the technologies that their proposals might affect. He asserted that it has become common for AI, as a term, "to be so amorphously and sloppily defined that it could encompass every physical object in the universe". If the project is going to make policy, he said, it needed to be very specific about what it was making policy about:

An LLM has some level of defined meaning, although even there it would be nice if people were specific. Reinforcement learning is a specific technique with some interesting implications, such as the existence of labeled test data used to train the algorithm. "AI" just means whatever the person writing a given message wants it to mean and often changes meaning from one message to the next, which makes it not useful for writing any sort of durable policy.

Gunnar Wolf agreed with Allbery, but Nussbaum claimed that the specific technology did not matter. The proposal boiled down to the use of automated tools for code analysis and generation:
