Papers
6,952 papers found
A Practical Method for Generating String Counterfactuals
Matan Avitan, Ryan Cotterell, Yoav Goldberg et al.
A Preliminary Study on NLP-Based Personalized Support for Type 1 Diabetes Management
Sandra Mitrović, Federico Fontana, Andrea Zignoli et al.
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou, Yang Zhang, Jacob Andreas et al.
Arabic Dataset for LLM Safeguard Evaluation
Yasser Ashraf, Yuxia Wang, Bin Gu et al.
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Peiqin Lin, Andre Martins, Hinrich Schuetze
Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya
Are Larger Language Models Better at Disambiguation?
Ziyuan Cao, William Schuler
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim et al.
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning
Yilun Zhao, Guo Gan, Chengye Wang et al.
Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?
Neelabh Sinha, Vinija Jain, Aman Chadha
Are We Done with MMLU?
Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong et al.
Argumentation in political empowerment on Instagram
Aenne Knierim, Ulrich Heid
ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification
Yaswanth M, Vaibhav Singh, Ayush Maheshwari et al.
Artificial Relationships in Fiction: A Dataset for Advancing NLP in Literary Domains
Despina Christou, Grigorios Tsoumakas
ARWI: Arabic Write and Improve
Kirill Chirkunov, Bashar Alhafni, Chatrine Qwaider et al.
As easy as PIE: understanding when pruning causes language models to disagree
Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo et al.
A Sentence-Level Visualization of Attention in Large Language Models
Seongbum Seo, Sangbong Yoo, Hyelim Lee et al.
Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation
Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon et al.
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani et al.
AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation
Vaishnavi Pulavarthi, Deeksha Nandal, Soham Dan et al.
Assessing Crowdsourced Annotations with LLMs: Linguistic Certainty as a Proxy for Trustworthiness
Tianyi Li, Divya Sree, Tatiana Ringenberg
Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing
Hadi Askari, Anshuman Chhabra, Muhao Chen et al.
Assessing the Reliability and Validity of GPT-4 in Annotating Emotion Appraisal Ratings
Deniss Ruder, Andero Uusberg, Kairit Sirts
Assessing the State of the Art in Scene Segmentation
Albin Zehe, Elisabeth Fischer, Andreas Hotho