LawBench: Benchmarking Legal Knowledge of Large Language Models

Zhiwei Fei; Xiaoyu Shen; Dawei Zhu; Fengzhe Zhou; Zhuo Han; Alan Huang; Songyang Zhang; Kai Chen; Zhixin Yin; Zongwen Shen; Jidong Ge; Vincent Ng

EMNLP 2024

Abstract

We present LawBench, the first evaluation benchmark for assessing the ability of Large Language Models (LLMs) to perform Chinese legal tasks. It consists of 20 tasks. LawBench is carefully designed to assess LLMs' legal capabilities precisely at three cognitive levels, which correspond to the widely accepted Bloom's taxonomy. Using LawBench, we present a comprehensive evaluation of 21 popular LLMs and the first comparative analysis of the empirical results, revealing their relative strengths and weaknesses. All data, model predictions and evaluation code are available at https://github.com/open-compass/LawBench.
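To make the three-level organization concrete, the following minimal Python sketch groups per-task scores by cognitive level and averages them. It is illustrative only. The "<level>-<index>" task-ID format, the function name `level_averages`, the unweighted mean and the scores are all hypothetical assumptions. It is not the scoring code in the LawBench repository.

```python
from collections import defaultdict
from statistics import mean


def level_averages(task_scores: dict[str, float]) -> dict[int, float]:
    """Group per-task scores by cognitive level; return the unweighted mean per level.

    Hypothetical assumption: each task ID has the form "<level>-<index>",
    e.g. "2-5" for the fifth task at cognitive level 2.
    """
    by_level: dict[int, list[float]] = defaultdict(list)
    for task_id, score in task_scores.items():
        # The level is taken from the prefix before the first hyphen.
        level = int(task_id.split("-", 1)[0])
        by_level[level].append(score)
    # Return the levels in ascending order, each with the mean of its task scores.
    return {level: mean(scores) for level, scores in sorted(by_level.items())}


# Hypothetical scores for one model on a few tasks (not real LawBench results).
scores = {"1-1": 20.0, "1-2": 40.0, "2-1": 50.0, "2-2": 30.0, "3-1": 60.0}
print(level_averages(scores))  # → {1: 30.0, 2: 40.0, 3: 60.0}
```

Reporting a separate average for each level, rather than a single overall number, lets a comparison show whether a model is weak at recalling legal knowledge or at applying it.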

Topics

Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Natural Language Inference
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation
Deep Learning > Optimization & Theory > Evaluation

Keywords

benchmark evaluation, evaluation benchmark, legal reasoning, task evaluation, legal text classification, cognitive level, Bloom taxonomy, cognitive task, Chinese legal large language model, legal knowledge, cognitive taxonomy