MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of Indian Legal Case Judgments

Debtanu Datta; Shubham Soni; Rajdeep Mukherjee; Saptarshi Ghosh

EMNLP 2023

Abstract

Automatic summarization of legal case judgments is a practically important problem that has attracted substantial research efforts in many countries. In the context of the Indian judiciary, there is an additional complexity – Indian legal case judgments are mostly written in complex English, but a significant portion of India’s population lacks command of the English language. Hence, it is crucial to summarize the legal documents in Indian languages to ensure equitable access to justice. While prior research primarily focuses on summarizing legal case judgments in their source languages, this study presents a pioneering effort toward cross-lingual summarization of English legal documents into Hindi, the most frequently spoken Indian language. We construct the first high-quality legal corpus comprising 3,122 case judgments from prominent Indian courts in English, along with their summaries in both English and Hindi, drafted by legal practitioners. We benchmark the performance of several diverse summarization approaches on our corpus and demonstrate the need for further research in cross-lingual summarization in the legal domain.

Topics

Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Generation > Summarization
Natural Language Processing > Resources & Methods > Multilingual NLP
Deep Learning > Learning Types > Generative Models
Natural Language Processing > Applications > Natural Language Generation

Keywords

multilingual NLP, natural language generation, text summarization, multilingual summarization, cross-lingual summarization, legal summarization, legal case judgment
