Quick Answer
AI copyright law in 2026 is being shaped by dozens of pending lawsuits worldwide. Training on copyrighted works without a licence can be permitted under narrow exceptions (US fair use, the EU text and data mining exceptions, UK Section 29A), but outputs substantially similar to protected training works can still infringe.
- Training is NOT automatically infringement — and it is NOT automatically fair use
- Generative outputs can infringe if substantially similar to protected works
- The US Copyright Office holds that purely AI-generated works lack the human authorship required for copyright protection
What Is the AI Copyright Landscape?
Copyright questions in AI span three stages:
- Input (training data collection and use)
- Model (can a trained model itself infringe?)
- Output (is a generated work a derivative?)
Key authorities are the US Copyright Office (Reports on Copyright and AI: Part 1 on digital replicas, July 2024; Part 2 on copyrightability, January 2025; Part 3 on generative AI training, pre-publication version May 2025), the UK IPO, the EU Copyright Directive Articles 3 and 4 (TDM exceptions), and Japan's Article 30-4 of the Copyright Act.
Key Details / Requirements
Major Pending Lawsuits (Selected)
| Case | Plaintiffs | Defendants | Filed | Core Issue |
| --- | --- | --- | --- | --- |
| New York Times v. OpenAI & Microsoft | NYT | OpenAI, Microsoft | Dec 2023 | Training and verbatim memorisation |
| Andersen v. Stability AI | Artists | Stability AI | 2023 | Training on artworks |
| Getty Images v. Stability AI (US + UK) | Getty | Stability AI | 2023 | Training on Getty library |
| Authors Guild v. OpenAI | Authors | OpenAI | 2023 | Novels in training data |
| Concord Music v. Anthropic | Publishers | Anthropic | 2023 | Song lyrics |
| Bartz v. Anthropic | Authors | Anthropic | 2024 | Books in training (settled September 2025 for USD 1.5B) |
Global TDM and Fair-Use Regimes
| Jurisdiction | Rule | Opt-Out Allowed? |
| --- | --- | --- |
| USA | Fair use (17 USC 107) | N/A |
| EU | Copyright Directive Art. 3 (research) and Art. 4 (commercial) | Yes for Art. 4 via machine-readable opt-out |
| UK | Sec 29A CDPA (non-commercial TDM only) | N/A |
| Japan | Art. 30-4 Copyright Act (non-enjoyment exception) | No |
| Singapore | Computational Data Analysis (Sec 244 Copyright Act 2021) | No |
Real-World Examples / Case Studies
Bartz v. Anthropic (2025): the first major AI training settlement, a USD 1.5 billion class-action settlement over books used in training. Judge Alsup had ruled earlier, in June 2025, that training itself was transformative fair use when done on lawfully acquired copies.
New York Times v. OpenAI (ongoing): the federal complaint alleges that GPT-4 reproduces Times articles verbatim and that OpenAI's products compete with the Times' own business.
Getty Images v. Stability AI (UK): the High Court trial concluded in 2025, and the November 2025 judgment gave Getty only a partial win on limited trademark grounds relating to watermarks. Its secondary copyright infringement claim failed.
Zarya of the Dawn (US Copyright Office, 2023): the registration for Kris Kashtanova's comic protects the text and the arrangement, but the Midjourney-generated images were denied registration.
What This Means for AI Teams
In 2026, AI teams must:
- License training data whenever practical (Getty, Shutterstock, Reuters have all signed licensing deals)
- Implement training-data provenance records (these support the public training-content summary required by EU AI Act Art. 53(1)(d))
- Respect robots.txt signals and TDM opt-outs (EU Copyright Directive Art. 4(3); EU AI Act Art. 53(1)(c))
- Add output filters for memorisation and near-duplicate generation
- Indemnify customers against third-party copyright claims (as Adobe, Microsoft, Google, OpenAI now do for enterprise customers)
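The provenance item in the list above could be backed by one small record per ingested document. The following is a minimal sketch, not a prescribed format: the `ProvenanceRecord` schema, its field names, and the example URL are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """One entry per ingested document (hypothetical schema)."""
    source_url: str
    sha256: str            # hash of the exact bytes that were ingested
    licence: str           # e.g. "licensed", "public-domain", "tdm-exception"
    opt_out_checked: bool  # whether robots.txt / TDM signals were consulted
    retrieved_at: str      # ISO 8601 timestamp in UTC


def make_record(source_url: str, content: bytes, licence: str,
                opt_out_checked: bool) -> ProvenanceRecord:
    """Build a record; the hash ties it to the exact bytes that were used."""
    return ProvenanceRecord(
        source_url=source_url,
        sha256=hashlib.sha256(content).hexdigest(),
        licence=licence,
        opt_out_checked=opt_out_checked,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )


# Illustrative values only; example.com is a placeholder source.
record = make_record("https://example.com/essay.html", b"Example text.",
                     licence="licensed", opt_out_checked=True)

# Serialise as one JSON line, suitable for an append-only audit log.
print(json.dumps(asdict(record), sort_keys=True))
```

Hashing the ingested bytes rather than the URL means the record still identifies what was trained on if the page later changes or disappears.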
Compliance Checklist
- Publish a training-data sources document
- Honour machine-readable opt-outs (robots.txt, TDM Reservation Protocol, C2PA)
- License copyrighted datasets where feasible
- Build memorisation tests into evaluation pipelines
- Offer customer IP indemnification where commercially appropriate
- For deployers: record prompts and outputs to demonstrate non-infringement
- Track ongoing cases and US Copyright Office guidance
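A memorisation test from the checklist above can start as a simple word n-gram overlap check between a model output and known reference texts. This is a rough sketch under stated assumptions: the function names, the 5-word window, and the 0.5 threshold are illustrative choices, not established standards, and production pipelines typically need fuzzier matching.

```python
def ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also occur in the reference."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out & ngrams(reference, n)) / len(out)


def flag_memorisation(output: str, references: list, n: int = 5,
                      threshold: float = 0.5) -> bool:
    """True if the output overlaps heavily with any reference text."""
    return any(overlap_ratio(output, ref, n) >= threshold for ref in references)


reference = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog"
novel = "a slow green turtle walks under the busy bridge today"

print(overlap_ratio(copied, reference))       # 1.0
print(overlap_ratio(novel, reference))        # 0.0
print(flag_memorisation(copied, [reference])) # True
```

The same check can run at inference time as an output filter, at the cost of keeping an index of protected reference n-grams available to the serving stack.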
FAQs
Q: Is AI training automatically fair use?
No — each case is fact-specific. Several US courts have found training on lawfully acquired data transformative, but not all.
Q: Can AI-generated images be copyrighted?
Only if a human author contributes sufficient creative expression. Pure AI output is not protectable under US Copyright Office policy.
Q: What is the TDM opt-out?
EU Copyright Directive Article 4 allows commercial TDM unless rightholders reserve their rights in a machine-readable form.
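One machine-readable form is an HTTP response header. The sketch below assumes the `tdm-reservation` header described in the W3C community group's TDM Reservation Protocol, where `1` signals that rights are reserved and `0` that they are not; the function name is hypothetical, and the header semantics should be verified against the current specification before relying on them.

```python
from typing import Mapping, Optional


def tdm_reservation(headers: Mapping[str, str]) -> Optional[bool]:
    """Interpret a response's TDM reservation signal.

    Returns True if rights are reserved, False if explicitly not reserved,
    and None if the header is absent or has an unrecognised value.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalised.get("tdm-reservation")
    if value == "1":
        return True
    if value == "0":
        return False
    return None


print(tdm_reservation({"TDM-Reservation": "1"}))     # True
print(tdm_reservation({"tdm-reservation": "0"}))     # False
print(tdm_reservation({"Content-Type": "text/html"}))  # None
```

Returning `None` for a missing header keeps "no signal found" distinct from "explicitly not reserved", so a crawler can fall back to other signals such as robots.txt.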
Q: Does robots.txt count as a TDM opt-out?
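Often, but the point is not settled. robots.txt is widely treated as a machine-readable reservation, and tools such as Cloudflare's AI Crawl Control and the W3C community group's TDM Reservation Protocol build on such signals, but no single signal has been formally adopted as the legal standard.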
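Whatever its legal weight, checking robots.txt before fetching is straightforward with Python's standard library. A minimal sketch: the robots.txt content and URLs are made-up examples, `may_crawl` is a hypothetical helper, and `GPTBot` is used only as a sample crawler user agent.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network call
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/article"))          # False
print(may_crawl(ROBOTS_TXT, "ExampleBrowser", "https://example.com/article"))  # True
```

In a real crawler the file would be fetched from the site's `/robots.txt` path and the result recorded alongside the document, so the check can be evidenced later.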
Q: Are lyrics protected differently?
Yes — musical compositions have separate copyright from recordings; lyrics are literary works.
Q: What penalties apply?
US: statutory damages of up to USD 150,000 per work for wilful infringement. EU: remedies vary by member state.
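Because US statutory damages are assessed per work, theoretical exposure scales linearly with the number of works at issue. The sketch below is arithmetic only: the USD 750 minimum and USD 30,000 ordinary maximum come from 17 USC 504(c) rather than from this article, the function name is hypothetical, and actual awards are set by a court and are often far below the ceiling.

```python
STATUTORY_MIN = 750      # USD per work, ordinary minimum under 17 USC 504(c)
STATUTORY_MAX = 30_000   # USD per work, ordinary maximum
WILFUL_MAX = 150_000     # USD per work, maximum for wilful infringement


def exposure_range(works: int, wilful: bool = False) -> tuple:
    """Return (low, high) theoretical statutory damages in USD."""
    upper = WILFUL_MAX if wilful else STATUTORY_MAX
    return works * STATUTORY_MIN, works * upper


print(exposure_range(1_000))               # (750000, 30000000)
print(exposure_range(1_000, wilful=True))  # (750000, 150000000)
```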
Q: Can I train on public-domain data only?
Yes — but quality and coverage are usually insufficient for state-of-the-art models.
Conclusion
AI copyright remains one of the most unsettled areas of AI law. Teams that document provenance, license data, and indemnify customers are best placed to weather the litigation.
Audit your training data with Misar AI's copyright provenance toolkit.