Papers
Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
Zexi Li, Xiangzhu Wang, William F. Shen et al.
How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation Under the One-Time-Pad-Based Framework
Zi Liang, Liantong Yu, Zhang Shiyu et al.
On the Alignment of Large Language Models with Global Human Opinion
Yang Liu, Masahiro Kaneko, Chenhui Chu
DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models
Yaowen Liu, Shenjia Jing, Yufei Wei et al.
LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models
Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck et al.
Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal
Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah et al.
Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models
Wei Qian, Chenxu Zhao, Yangyi Li et al.
Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Ye Wang et al.
Polarity-Aware Probing for Quantifying Latent Alignment in Language Models
Sabrina Sadiekh, Elena Ericheva, Chirag Agarwal
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi, Guoli Wang, Tu Ouyang et al.
Beyond Verdicts: Evaluating Language Model Moral Competence
Aaron J Snoswell, Daniel Kilov, Seth Lazar
Safe Multi-agent Reinforcement Learning with Natural Language Constraints
Ziyan Wang, Meng Fang, Tristan Tomilin et al.
Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
Yijun Yang, Lichao Wang, Jianping Zhang et al.
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang, Jia Li, Liyi Cai et al.
CultureRL: Internalizing Cultural Principles in Large Language Models via Norm-Driven Reinforcement Learning
Weixiang Zhao, Haozhen Li, Yanyan Zhao et al.
Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models
Tianyi Zhou, Johanne Medina, Sanjay Chawla
Fine-Grained Interpretation of Political Opinions in Large Language Models
Jingyu Hu, Mengyue Yang, Mengnan Du et al.
Satellite-Text-Prompted Large Language Model for Photovoltaic Power Forecasting
Pengfei Jia, Jianghong Ma, Baoquan Zhang et al.
A Human-Centric Pipeline for Aligning Large Language Models with Chinese Medical Ethics
Haoan Jin, Han Ying, Jiacheng Ji et al.
Language Models and Logic Programs for Trustworthy Tax Reasoning
William Jurayj, Nils Holzenberger, Benjamin Van Durme
CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation
Chenchen Kuai, Chenhao Wu, Yang Zhou et al.
How Do Data Owners Say No? A Case Study of Data Consent Mechanisms in Web-Scraped Vision-Language AI Training Datasets
Chung Peng Lee, Rachel Hong, Harry H. Jiang et al.
Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents
Hao Li, Haotian Chen, Ruoyuan Gong et al.
MHB: Medical Hallucination Benchmark for Large Language Models in Complex Clinical Tasks
Jianrong Lu, Junwei Liu, Xingyun Zheng et al.
Rescind: Countering Image Misconduct in Biomedical Publications with Vision-Language and State-Space Modeling
Soumyaroop Nandi, Prem Natarajan