As legal battles over AI training continue to unfold, two rulings from the Northern District of California—Bartz v. Anthropic and Kadrey v. Meta—offer differing takes on one of the thorniest issues in copyright law today: whether making intermediate copies of copyrighted works to train large language models (LLMs) constitutes fair use. While both cases involve similar facts—tech companies acquiring and copying books without authorization and using them to train generative AI systems—their legal analyses diverge, especially on the significance of intermediate copying.
In Bartz v. Anthropic, Judge Alsup took a granular approach, dissecting each stage of Anthropic’s pipeline—from downloading what are described as pirated ebooks to scanning purchased books, and finally to tokenizing those texts for ingestion into Anthropic’s Claude LLM. Judge Alsup explicitly recognized that every step in the pipeline involved copying, but framed these acts as part of a transformative process. He likened the training of LLMs to teaching students to write: using books to instill pattern recognition, not to reproduce them. Although the Court denied summary judgment on fair use as to certain pirated works that were retained but never used in training, it declined to treat that retention as fatal, noting that keeping materials in a “generalized data area” for future research use was not inherently disqualifying. For the Court, the crucial point was that the copies were not substitutes for the original works in the market, and that both the use and the LLM’s output were fundamentally different in purpose.
By contrast, in Kadrey v. Meta, Judge Chhabria focused less on the mechanical process of acquiring works and having the LLM digest them, and more on the evidentiary gaps in the plaintiffs’ case. He acknowledged that Meta had downloaded the plaintiffs’ books from shadow libraries and incorporated them into training datasets, but emphasized that fair use hinges on actual or likely market substitution. Because the plaintiffs failed to show that Meta’s models could reproduce their books in any meaningful or competitive way—and failed to establish a viable licensing market that Meta had harmed—the court granted summary judgment for Meta. While the Court accepted that intermediate copies are technically acts of reproduction, it concluded that, absent meaningful output reproduction or demonstrable economic harm, such copying did not violate copyright law.
The divide between the two decisions is stark. Bartz treats the intermediate copying steps as integral to a transformative process, with the Court willing to treat tokenization and model training as sufficiently divorced from expressive use to qualify for fair use. Kadrey, by contrast, views the fair use defense as turning primarily on market harm and output substitution, giving minimal weight to the mechanics of the copying unless those acts have measurable downstream effects.
Critically, and consistent with established fair use precedent, both Courts declined to adopt a bright-line rule that intermediate copying is either categorically infringing or categorically fair. Instead, each focused on context: Bartz on how transformative the internal uses were, Kadrey on whether plaintiffs could prove economic injury from the resulting public-facing product. Together, these cases underscore the fact-intensive, purpose-driven nature of fair use in the AI era—and the strategic necessity for plaintiffs to build detailed evidentiary records that trace the full lifecycle of their works through the training pipeline.
The decisions are Bartz et al. v. Anthropic PBC, Case No. C 24-05417 WHA (N.D. Cal. June 23, 2025) and Kadrey et al. v. Meta Platforms, Inc., Case No. 23-cv-03417 VC (N.D. Cal. June 25, 2025).