Salesforce Sued for ‘Stolen Books’ in AI Copyright Lawsuit
In what could become a defining moment for AI copyright law, bestselling authors Molly Tanzer and Jennifer Gilmore have filed a class action lawsuit against Salesforce Inc., accusing the tech giant of secretly using thousands of copyrighted books to train its xGen AI models without consent or payment.
Filed in the Northern District of California in October 2025, the case Tanzer et al. v. Salesforce asks one provocative question now echoing across creative and legal circles:
When AI learns from your words, does that count as inspiration, or theft?
The complaint goes far beyond one company’s conduct. It challenges the very foundations of how modern generative AI systems are built, monetized, and justified under the legal shield of “fair use.”
And for Salesforce, a brand long associated with ethical innovation, the optics could be devastating.
How the Lawsuit Unfolded
The 46-page complaint alleges that Salesforce trained its xGen AI models on the so-called Books3 corpus, a massive dataset containing hundreds of thousands of novels, essays, and literary works scraped from the internet, many of them under active copyright.
According to the plaintiffs, these texts were downloaded, stored, and copied in full, forming the linguistic backbone of xGen’s capabilities.
Such acts, they argue, violate the exclusive reproduction rights granted to authors under Section 106 of the U.S. Copyright Act, while giving Salesforce an enormous commercial advantage over creators who received nothing.
Adding to the controversy, the suit highlights public statements by Salesforce’s CEO Marc Benioff, who previously condemned other AI firms for using “stolen data.”
That rhetorical reversal adds a powerful emotional undercurrent and makes this case as much about corporate credibility as copyright law.
The Legal Heart: Fair Use vs. Copyright Protection
To many observers, Tanzer v. Salesforce feels like a sequel to Authors Guild v. Google, the 2015 landmark that allowed Google to digitize books for its search index under the doctrine of transformative fair use.
But the similarities stop there.
What is a copyright lawsuit involving AI?
A copyright lawsuit involving artificial intelligence occurs when creators allege that an AI system used their protected works (such as books, music, or images) without permission during model training.
These cases test whether machine learning qualifies as fair use under U.S. law or constitutes unauthorized copying of original content.
Where Google displayed only brief, non-substitutive snippets, Salesforce’s AI training allegedly ingested entire books, creating machine-learning weights that could be used to generate new text in similar style or tone.
The authors claim this process erases the line between study and reproduction, turning human creativity into raw machine fuel.
Salesforce, for its part, is expected to argue that:
- Model training is transformative, producing data representations, not creative copies.
- The process doesn’t compete with the original market, satisfying the fourth fair-use factor.
- Limiting AI training would stifle innovation across industries relying on machine learning.
Understanding what courts mean by “transformative” is key here. As explored in Transformative Fair Use Explained: How to Legally Reuse Works in U.S. Copyright Law, the doctrine allows some reuse, but only when new meaning, message, or purpose is added.
The question now is whether teaching a machine to imitate writing styles qualifies.
Recent rulings such as Court Rules AI Cannot Be Copyrighted: Landmark Ruling on Human Authorship also underscore that copyright demands human input. The Salesforce case now tests the reverse: whether AI can legally consume human works without infringing them.
Regulation, Ethics, and the Coming AI Accountability Era
This lawsuit lands amid a broader regulatory awakening. Legislators in Washington are drafting bills that would:
- Require transparency in AI training datasets,
- Create licensing frameworks for copyrighted material, and
- Establish royalty systems compensating creators for data use.
The U.S. Copyright Office is simultaneously reviewing whether AI training qualifies as “reproduction,” potentially setting a new legal threshold for compliance.
If courts act before lawmakers do, Tanzer v. Salesforce could set de facto national policy dictating how AI companies license data in the years ahead.
Salesforce’s case also carries a strong ethical dimension. Benioff’s vocal support for ethical capitalism and responsible tech use may amplify scrutiny.
In an era when investors and consumers value authenticity, perceived hypocrisy in AI ethics could become a reputational liability far greater than the lawsuit’s financial risk.
A Landmark Test for the Future of AI and Copyright
The plaintiffs seek class certification covering thousands of authors whose works were allegedly used in Salesforce’s datasets.
If granted, the financial exposure could reach hundreds of millions of dollars.
Discovery will likely reveal how Salesforce sourced its training data and whether internal discussions acknowledged copyright risks.
Beyond Salesforce, this lawsuit tests whether AI model training equals copying under U.S. law. A plaintiff victory could force developers to license creative content, spawning a new ecosystem for AI data rights management.
Conversely, a Salesforce win might cement fair use as a shield for large-scale training, leaving creators sidelined from the digital economy built on their words.
This debate isn’t confined to literature. Similar disputes are unfolding across industries, including film and design, as seen in Disney & Universal vs. Midjourney: Inside the AI Copyright Battle That Could Rewrite Hollywood Law.
Together, these cases mark a global turning point for how law defines creativity in the age of algorithms.
Final Thought
The Tanzer v. Salesforce case goes beyond legal arguments; it’s part of a larger conversation about what creativity means in the age of machines.
If the authors win, it could mark the start of a new era where writers, artists, and creators are finally recognized and compensated for the value their work brings to artificial intelligence.
If Salesforce prevails, it may set a precedent that blurs the line between inspiration and imitation, raising uncomfortable questions about who truly owns creative expression in a digital world.
Whatever the outcome, the decision will ripple far beyond Silicon Valley, shaping how society balances innovation, ownership, and the human voice within AI’s expanding reach.
People Also Ask (PAA)
What is the Salesforce AI copyright lawsuit about?
The case involves authors accusing Salesforce of using their copyrighted books without permission to train its xGen AI model. They claim this violates the U.S. Copyright Act and undermines creative ownership in the age of artificial intelligence.
Why are authors suing Salesforce?
Writers Molly Tanzer and Jennifer Gilmore filed a class action alleging that Salesforce’s AI learned from pirated or unlicensed works. Their lawsuit seeks damages and stronger legal protection for creative content used in AI training.
Is it legal to use copyrighted books to train AI models?
The legality depends on fair use — a doctrine that allows limited use of copyrighted material for transformative purposes. Courts must now decide whether teaching AI to generate new text counts as transformation or infringement.
What could happen if Salesforce loses the lawsuit?
If the authors prevail, Salesforce may face major financial penalties and be forced to license copyrighted data. The decision could also set a national precedent requiring all AI developers to pay for the creative works they use.
How could this case impact future AI laws?
A ruling against Salesforce could shape how lawmakers regulate data transparency and copyright licensing in AI development. It may redefine fair use, forcing companies to rethink how they train large language models.