ACL 2025
PL-Guard: Benchmarking Language Model Safety for Polish
Abstract
We present a benchmark dataset for evaluating language model safety in Polish, addressing the underrepresentation of medium-resource languages in existing safety assessments. Our dataset includes both original and adversarially perturbed examples. We fine-tune and evaluate multiple models (LlamaGuard-3-8B, a HerBERT-based classifier, and PLLuM) and find that the HerBERT-based model outperforms the others, especially under adversarial conditions.
Topics
Artificial Intelligence > Core AI > AI Safety
Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Text Classification
Interdisciplinary > Linguistics > Computational Linguistics
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Resources & Methods > Language Modeling
Artificial Intelligence > Core AI > Safety