LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

Santosh T.y.s.s; Cornelius Weiss; Matthias Grabmair

EMNLP 2024

Abstract

In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress. However, existing Legal NLP benchmarks focus only on predictive tasks, overlooking generative tasks. This work curates LexSumm, a benchmark designed for evaluating legal summarization tasks in English. It comprises eight English legal summarization datasets from diverse jurisdictions, such as the US, UK, EU and India. Additionally, we release LexT5, a legal-oriented sequence-to-sequence model, addressing the limitation of the existing BERT-style encoder-only models in the legal domain. We assess its capabilities through zero-shot probing on LegalLAMA and fine-tuning on LexSumm. Our analysis reveals abstraction and faithfulness errors even in summaries generated by zero-shot LLMs, indicating opportunities for further improvements. The LexSumm benchmark and LexT5 model are available at https://github.com/TUMLegalTech/LexSumm-LexT5.

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Machine Learning > Learning Types > Zero-Shot Learning
Machine Learning > Application Areas > Data Augmentation
Deep Learning > Architectures > Transformers
Natural Language Processing > Generation > Summarization
Natural Language Processing > Applications > Summarization
Deep Learning > Models > Transformers
Artificial Intelligence > Core AI > Natural Language Processing

Keywords

zero-shot learning; text generation; text summarization; sequence-to-sequence model; legal natural language processing; legal summarization; zero-shot probing
