Nine Ways to Break Copyright Law and Why Our LLM Won’t: A Fair Use Aligned Generation Framework

Aakash Sen Sharma; Debdeep Sanyal; Priyansh Srivastava; Sundar Athreya H; Shirish Karande; Mohan Kankanhalli; Murari Mandal

2025 EMNLP EMNLP 2025

Nine Ways to Break Copyright Law and Why Our LLM Won’t: A Fair Use Aligned Generation Framework

Abstract

AbstractLarge language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop LAW-LM (Legally Aware Language Model), a legally-grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant and practically useful alternatives rather than resorting to blunt refusal. Recognizing the shortcomings of traditional evaluation metrics, we propose new measures: Weighted Penalty Utility and Compliance Aware Harmonic Mean (CAH) to balance infringement risk against response utility. Extensive quantitative experiments coupled with expert evaluations confirm that LAW-LM substantially reduces problematic outputs compared to state-of-the-art approaches, while preserving real-world usability.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Aakash Sen Sharma , Debdeep Sanyal , Priyansh Srivastava , Sundar Athreya H , Shirish Karande , Mohan Kankanhalli , Murari Mandal

Topics

Artificial Intelligence > Core AI > Responsible AI Natural Language Processing > Resources & Methods > Large Language Models

Keywords

direct preference optimization model fine-tuning fair use large language model copyright law

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025