EMNLP 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
Abstract
We present LawBench, the first evaluation benchmark composed of 20 tasks designed to assess the ability of Large Language Models (LLMs) to perform Chinese legal tasks. LawBench is carefully constructed to enable precise assessment of LLMs’ legal capabilities at three cognitive levels that correspond to the widely accepted Bloom’s cognitive taxonomy. Using LawBench, we present a comprehensive evaluation of 21 popular LLMs and the first comparative analysis of the empirical results, revealing their relative strengths and weaknesses. All data, model predictions, and evaluation code are available at https://github.com/open-compass/LawBench.
Interdisciplinary Bridge: Artificial Intelligence, Deep Learning, Machine Learning, Natural Language Processing
Keyword Pioneer: cognitive level
Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Alan Huang, Songyang Zhang, Kai Chen, Zhixin Yin, Zongwen Shen, Jidong Ge, Vincent Ng
Topics
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Natural Language Inference
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation
Deep Learning > Optimization & Theory > Evaluation