Nvidia NeMo Copyright Class Action Lawsuit, Judge Refuses to Dismiss Authors’ Claims Over 197,000 Pirated Books

U.S. District Judge Jon Tigar denied most of Nvidia’s motion to dismiss a proposed class action in the Northern District of California, allowing claims for direct and contributory copyright infringement to proceed. The court concluded that, as alleged, the scripts Nvidia distributed to clients so they could automatically download and preprocess The Pile dataset had no purpose other than enabling infringement. No settlement exists. No claim form is open. This is an active AI copyright lawsuit moving into discovery.

Quick Facts: Nazemian v. Nvidia Corporation

Field | Detail
Case Name & Number | Nazemian et al. v. NVIDIA Corporation, Case No. 4:24-cv-01454-JST
Court | U.S. District Court, Northern District of California, Oakland Division
Judge | U.S. District Judge Jon S. Tigar
Filed | March 8, 2024
Defendant | Nvidia Corporation
Lead Plaintiffs | Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III, Susan Orlean
Alleged Violation | Direct copyright infringement (17 U.S.C. § 501); contributory copyright infringement
Books at Issue | Approximately 196,640 books from the Books3/Bibliotik shadow library dataset
AI Models Named | NeMo Megatron (345M, GPT-3 10B, InstructRetro-48B, Retro-48B); Nemotron-4 15B
Current Status | Motion to dismiss mostly denied (May 2026); discovery underway
Key Ruling Date | May 2026 (motion to dismiss mostly denied)
Plaintiffs’ Law Firm | Joseph Saveri Law Firm
Settlement / Claim Form | None; active litigation, no settlement reached
Last Updated | May 8, 2026

What Is the Nvidia NeMo Copyright Lawsuit About? Nazemian et al. v. NVIDIA Corporation, No. 4:24-cv-01454-JST

The authors say Nvidia trained several of its large language models on datasets and so-called shadow libraries — online repositories hosting pirated books and other copyrighted works — that contained their copyrighted books without permission.

A key focus of the lawsuit is a dataset known as “The Pile,” which included a subcollection of nearly 200,000 pirated books called Books3, itself sourced from the shadow library Bibliotik. The authors say Nvidia used The Pile to train multiple models in its Megatron line, including Megatron 345M, NeMo GPT-3 10B, InstructRetro-48B, Retro-48B and Nemotron-4 15B.

Here is how the chain works in plain terms. Bibliotik is a website that hosts pirated ebooks without authorization from the authors or publishers who own those books. Books3 is a dataset of approximately 196,640 books assembled from Bibliotik’s collection and made available through the AI research platform Hugging Face. The Pile is a larger dataset that incorporated Books3 as a subcollection. According to the complaint, Nvidia used The Pile to train its NeMo Megatron language models. Books3 remained available from Hugging Face until October 2023, when it was removed with a message stating it was “defunct and no longer accessible due to reported copyright infringement.” Plaintiffs argue that because Nvidia trained its NeMo Megatron models on The Pile, it necessarily trained them on Books3, directly infringing authors’ copyrights.

The case expanded significantly after filing. Five authors — Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III and Susan Orlean — filed an expanded class action alleging Nvidia used their copyrighted works without permission to train AI models including NeMo Megatron and Nemotron-4. The lawsuit also alleges Nvidia downloaded copyrighted material from other shadow libraries including LibGen, Sci-Hub, and Z-Library.

Internal records that emerged during discovery raised the stakes further. According to those documents, one of the most direct reasons Nvidia sought out pirated books was competitive pressure in the AI industry. Nvidia released its NeMo Megatron series of large models in September 2022. Over the following year, the success of OpenAI’s ChatGPT sharpened investor attention on artificial intelligence. Nvidia regarded its annual developer conference in the fall of 2023 as an important milestone, and releasing a large language model with leading performance at that conference was seen as a way to answer the intensifying competition.

What the Judge Actually Ruled — and Why It Matters

This is the most legally significant development in the case so far, and it is worth understanding precisely what the judge did and did not decide.

Nvidia filed a motion to dismiss in January 2026, asking the court to throw out several of the claims before the case proceeded further. Nvidia argued that Cox v. Sony, a recent U.S. Supreme Court ruling, had tightened the standard for contributory copyright infringement, requiring “active encouragement through specific acts.” Nvidia also stressed that the NeMo Megatron Framework as a whole had substantial non-infringing uses, and argued that plaintiffs would need to show it marketed or promoted the framework as a piracy tool to prove the contributory claim.

Judge Tigar rejected that framing. Instead of analyzing the Megatron framework as a whole, he zeroed in on the specific scripts that Nvidia distributed to clients so they could automatically download and preprocess The Pile dataset. As alleged, those scripts have no purpose other than enabling infringement, the court concluded. “The scripts are alleged to have no other purpose than to speed up the process of infringement, unlike the digital video recorder systems at issue in Sony Corp. or the internet service provided in Cox,” Judge Tigar wrote.

On the question of whether Nvidia knew what its customers were doing with those tools, Tigar again sided with the authors. Their complaint did not rest on suspicion, but identified concrete instances of infringement by named customers, the judge found. “Plaintiffs have alleged that NVIDIA knew that its scripts and other assistance were directly contributing to infringement by third parties,” he wrote.

Nvidia also tried to get all references to BitTorrent stripped from the case. The court refused, with Judge Tigar noting that “BitTorrent is merely a tool, not a library or dataset.” He offered a pointed analogy: “Asking to dismiss allegations concerning BitTorrent is like asking to dismiss allegations concerning paintbrushes in a case about a dolphin painting.”

One claim that did not survive was vicarious copyright infringement, which requires showing the defendant had both the right to control the infringing conduct and a direct financial interest in it. Tigar found neither was adequately pleaded, but allowed the authors 21 days to address the deficiencies and refile.

This appears to be the first AI training case to apply the new Cox v. Sony standard, and the result did not go the way Nvidia hoped.

Are You Part of the Nvidia NeMo Copyright Class Action?

This is a copyright class action brought by authors, not a consumer or data breach case. If you are an author whose published work may have been included in the Books3/Bibliotik dataset, you may be a potential class member.

You may be part of this class if:

  • You are a U.S.-based author who holds a registered copyright in a published book
  • Your book was published in or before October 2023, when Books3 was taken down from Hugging Face
  • Your book was part of the Bibliotik ebook collection or the Books3 dataset (you can check by searching the Books3 dataset index, which has been made available through various academic and legal research sources)
  • Your copyrighted work may have been used without your permission to train Nvidia’s NeMo Megatron, NeMo GPT-3, InstructRetro, Retro-48B, or Nemotron-4 models

You are likely NOT included if:

  • You are a reader or user of Nvidia’s AI products — this class action is brought by copyright holders, not consumers harmed as end users
  • Your works are not covered by U.S. copyright law or were not among the books digitized by Bibliotik

If you are an author who believes your work was included in Books3, you can contact the Joseph Saveri Law Firm, which represents the plaintiff class, through their website at saverilawfirm.com.

What Plaintiffs Are Seeking

This case is still in active litigation. No settlement has been proposed, negotiated, or approved. No claim form exists.

Plaintiffs are seeking statutory damages, actual damages, restitution of profits, and other remedies under the Copyright Act. They are also seeking an injunction to bar Nvidia from further use of the infringing material and to remove all instances of the works from its models.

Under the U.S. Copyright Act, statutory damages can reach $150,000 per work for willful infringement. With approximately 196,640 books at issue, potential statutory damages exposure — if the class prevails at trial and the court finds willful infringement — would be enormous. In practice, courts weigh many factors in setting final damages awards, and large copyright cases frequently settle well before trial.
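To put that exposure in rough numbers, the arithmetic can be sketched as below. This is an illustration only, not a prediction of any award. It assumes, unrealistically, that every one of the roughly 196,640 books counts as a separately eligible registered work, so the totals are theoretical ceilings. The $150,000 willful cap comes from the paragraph above; the $750 minimum and $30,000 ordinary maximum are the other per-work tiers in 17 U.S.C. § 504(c) and do not appear elsewhere in this article. The function and constant names are hypothetical.

```python
# Illustrative arithmetic only; real awards depend on how many works are
# registered and in the class, and on the court's discretion.

BOOKS_AT_ISSUE = 196_640  # approximate Books3 count cited in this article

# Per-work statutory damages tiers under 17 U.S.C. § 504(c)
STATUTORY_MIN = 750                # floor for ordinary infringement
STATUTORY_MAX_ORDINARY = 30_000    # ceiling for ordinary infringement
STATUTORY_MAX_WILLFUL = 150_000    # ceiling if infringement is willful


def exposure(works: int, per_work: int) -> int:
    """Total statutory damages if every work received the same award."""
    return works * per_work


def scenarios(works: int = BOOKS_AT_ISSUE) -> dict[str, int]:
    """Theoretical totals at each statutory tier (assumes all works qualify)."""
    return {
        "statutory minimum": exposure(works, STATUTORY_MIN),
        "ordinary maximum": exposure(works, STATUTORY_MAX_ORDINARY),
        "willful maximum": exposure(works, STATUTORY_MAX_WILLFUL),
    }


if __name__ == "__main__":
    for label, total in scenarios().items():
        print(f"{label:<18} ${total:,}")
```

Under those assumptions, the totals run from about $147 million at the statutory minimum to about $29.5 billion at the willful maximum. The actual class is limited to works with registered copyrights, so the real figure would be some fraction of these ceilings.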

Nvidia’s stated defense remains fair use: the argument that training AI models on copyrighted material is a transformative use that does not constitute infringement. The fair use defense was not addressed in the motion to dismiss ruling, meaning that fight remains ahead.

Nvidia NeMo Copyright Lawsuit Timeline

Milestone | Date
Books3 dataset removed from Hugging Face “due to reported copyright infringement” | October 2023
Lawsuit filed | March 8, 2024
Amended complaint filed, adding plaintiffs and additional shadow libraries | Late 2025
Nvidia files motion to dismiss | January 31, 2026
Hearing on motion to dismiss | April 2, 2026
Judge orders Nvidia to produce discovery on shadow library datasets | Late April 2026
Judge denies most of Nvidia’s motion to dismiss | May 2026
Close of expert discovery (per scheduling order) | June 26, 2026
Class certification motion | TBD
Trial / Settlement | TBD; no trial date set

Frequently Asked Questions

Is there a class action lawsuit against Nvidia over its AI training data?

Yes. On March 8, 2024, the Joseph Saveri Law Firm filed a class action lawsuit on behalf of authors who own registered copyrights in books included in the Books3 dataset, which Nvidia allegedly used to train NeMo Megatron. The case is Nazemian et al. v. NVIDIA Corporation, Case No. 4:24-cv-01454-JST, in the U.S. District Court for the Northern District of California.

What is the Books3 dataset and why does it matter?

Books3 is a subcollection of nearly 200,000 pirated books sourced from the shadow library Bibliotik. It was incorporated into a larger dataset called “The Pile,” which Nvidia allegedly used to train its NeMo Megatron language models. The dataset was removed from Hugging Face in October 2023 after reported copyright infringement. Plaintiffs argue that by training on The Pile, Nvidia necessarily trained on Books3.

What did the judge actually decide in May 2026?

Judge Jon Tigar denied most of Nvidia’s motion to dismiss, allowing claims for direct and contributory copyright infringement to proceed. The court found that, as alleged, the specific scripts Nvidia distributed to help customers download The Pile had no purpose other than enabling copyright infringement. The ruling appears to be the first in AI training litigation to apply the Supreme Court’s new Cox v. Sony standard.

What is Nvidia’s defense? 

Nvidia’s primary defense is fair use — the argument that using copyrighted works to train AI models is transformative and does not constitute infringement. That argument was not resolved by the motion to dismiss ruling and will be fought during discovery and, potentially, at summary judgment or trial. Nvidia also argued its NeMo Megatron Framework had legitimate non-infringing uses. The judge separated the framework as a whole from the specific scripts, rejecting the broader argument.

Can I file a claim right now?

No. There is no settlement, no claim form, and no money available. If you are an author whose work may be in the Books3 dataset, contact the Joseph Saveri Law Firm at saverilawfirm.com for information about joining the class.

How does this case relate to other AI copyright lawsuits?

In June 2025, in a copyright case against Anthropic, the U.S. District Court for the Northern District of California ruled that using copyrighted works for AI training was fair use, but that downloading more than 7 million pirated ebooks from sites such as Library Genesis was inherently and irredeemably infringing and could not be excused as fair use. In September 2025, Anthropic agreed to pay at least $1.5 billion to settle that case. The Nvidia case follows a similar fact pattern but involves different AI models and different shadow libraries. How courts ruled in parallel cases such as that one may shape the outcome of Nvidia’s fair use defense.

Is Nvidia dragging its feet on discovery? 

In late April 2026, the presiding judge ordered Nvidia to produce discovery information regarding the shadow library datasets used to train the NeMo models within a month, finding that Nvidia’s slow pace had been unwarranted. The judge imposed a hard deadline for full production.

Sources & References

  • Court Complaint and Docket: Nazemian et al. v. NVIDIA Corporation, Case No. 4:24-cv-01454-JST, U.S. District Court for the Northern District of California — PACER.gov
  • TorrentFreak: “NVIDIA’s Shadow Library Scripts ‘Have No Other Purpose’ Than Infringement, Judge Rules,” May 2026
  • Courthouse News Service: “Nvidia Can’t Shake Authors’ Claims It Trained AI on Pirated Books,” May 2026
  • Joseph Saveri Law Firm: NVIDIA Large Language Model Litigation

Prepared by the AllAboutLawyer.com Editorial Team and reviewed for factual accuracy against court filings, TorrentFreak, Courthouse News Service, and the Joseph Saveri Law Firm’s case documentation on May 8, 2026. Last Updated: May 8, 2026

Disclaimer: This article is for informational purposes only and does not constitute legal advice. Nvidia denies all allegations of wrongdoing and maintains its actions constitute fair use under copyright law. All allegations described above remain unproven at trial. Legal claims and outcomes depend on specific facts and applicable law. For advice regarding a particular situation, consult a qualified intellectual property attorney.

About the Author

Sarah Klein, JD, is a licensed attorney and legal content strategist with over 12 years of experience across civil, criminal, family, and regulatory law. At All About Lawyer, she covers a wide range of legal topics — from high-profile lawsuits and courtroom stories to state traffic laws and everyday legal questions — all with a focus on accuracy, clarity, and public understanding.
Her writing blends real legal insight with plain-English explanations, helping readers stay informed and legally aware.