D should follow: "Gentoo Linux Begins Codeberg Migration In Moving Away From GitHub, Avoiding Copilot"

Indraj Gandham newsgroups at indraj.net
Thu Feb 19 21:23:11 UTC 2026


There are two primary questions when it comes to LLMs and copyright:

(1) Can the training of models on copyrighted works constitute infringement?
(2) Can the output of a model constitute infringement?

The answer cannot always be "no" to both of these questions, because 
that would allow models to be developed specifically to launder 
copyrighted works.

Even if a court rules that (1) is fair use, considering that it has been 
shown that LLMs can reproduce portions of copyrighted works verbatim, I 
would speculate that the ordinary threshold test will apply in (2).

The problem is that it is not at all obvious whether a given output 
meets the threshold of originality. A simple textual comparison between 
the output and the training data is not sufficient to show the absence 
of infringement, because non-literal elements such as structure and 
organization can also be copied. The test applied by courts in such 
cases is known as Abstraction-Filtration-Comparison (AFC).

To help mitigate this risk, I would suggest the following:

(a) Reject PRs with the "AI Generated" label if the contribution meets 
the threshold of originality; and
(b) Require all contributors to assert that they have the appropriate 
legal rights to make the copyright assignment to the D Language 
Foundation (DLF).

To determine whether a contribution meets the threshold, you can use the 
guidelines set out by the FSF:

https://www.gnu.org/prep/maintain/maintain.html#Legally-Significant

The purpose of (b) is to shift liability from DLF to the contributor 
should any concerns regarding provenance arise.
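
As a rough illustration only, (a) and (b) could be expressed as a triage 
rule like the one below. This is a sketch, not a proposal for actual 
tooling: the PullRequest record, its field names and the triage function 
are hypothetical, and the 15-line cutoff is my reading of the FSF's 
"fewer than 15 or so lines" rule of thumb. Line count is a crude proxy; 
the FSF guidelines also note that a series of minor changes by the same 
person can add up to a significant contribution, which this sketch does 
not model.

```python
from dataclasses import dataclass, field

# Assumption: the FSF maintainer guidelines treat a change of fewer than
# roughly 15 lines as not legally significant. The exact cutoff is a
# rule of thumb, not a legal test.
SIGNIFICANT_LINES = 15


@dataclass
class PullRequest:
    """Hypothetical record of a contribution; not a real forge API."""
    changed_lines: int
    labels: set = field(default_factory=set)
    rights_asserted: bool = False  # contributor made the assertion in (b)


def triage(pr: PullRequest) -> str:
    """Return 'reject', 'needs-assertion', or 'review'."""
    significant = pr.changed_lines >= SIGNIFICANT_LINES
    # Suggestion (a): labelled as AI generated and above the threshold.
    if "AI Generated" in pr.labels and significant:
        return "reject"
    # Suggestion (b): every contributor must assert their legal rights.
    if not pr.rights_asserted:
        return "needs-assertion"
    # Everything else goes to ordinary human review.
    return "review"
```

A human reviewer would still need to make the final call, since whether 
a change is legally significant cannot be reduced to a line count.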

Hope to see you all at BeerConf!

Indraj


More information about the Digitalmars-d mailing list