SELENE: Selective and Evidence-Weighted LLM Debating for Efficient and Reliable Reasoning

Akshay Verma; Swapnil Gupta; Deepak Gupta; Prateek Sircar; Siddharth Pillai

EACL 2026

Abstract

Multi-Agent Debate (MAD) frameworks improve factual reliability in large language models (LLMs) by allowing agents to critique and refine one another’s reasoning. Yet existing MAD systems are computationally expensive and prone to degradation under prolonged debates due to redundant exchanges and unstable judging. We propose a lightweight, industry-deployable alternative that unifies Selective Debate Initiation (SDI) with Evidence-Weighted Self-Consistency (EWSC) for adaptive, debate-on-demand reasoning. SDI dynamically predicts when debate is necessary by detecting confidence-likelihood misalignment and semantic disagreement, skipping well-aligned queries to conserve computation. EWSC replaces a single-judge verdict with a variance-aware, evidence-weighted aggregation across paraphrased evaluations, yielding more stable factual judgments. Combined, SDI and EWSC reduce token consumption by nearly 50% while improving both accuracy and calibration. Evaluated on BoolQ, CosmosQA, and an internal QnA benchmark, our framework achieves higher factual robustness and efficiency, demonstrating that scalable, epistemically reliable multi-agent reasoning is practical for real-world LLM deployments.
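
The abstract does not give the formulas behind SDI or EWSC, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes each agent reports a stated confidence and a mean token likelihood in [0, 1], uses exact-match comparison of normalized answers as a crude stand-in for semantic disagreement, and assumes each paraphrased judge prompt returns a (score, evidence_weight) pair. The function names, the gap threshold, and the variance penalty are all hypothetical.

```python
from statistics import pvariance


def should_debate(stated_confidence, token_likelihood, answers, gap_threshold=0.2):
    """Hypothetical SDI-style gate: debate only when signals disagree.

    Assumption: misalignment is the absolute gap between the stated
    confidence and the mean token likelihood; disagreement is approximated
    by comparing normalized answer strings (a real system would likely
    use a semantic similarity measure instead).
    """
    misaligned = abs(stated_confidence - token_likelihood) > gap_threshold
    disagree = len({a.strip().lower() for a in answers}) > 1
    return misaligned or disagree


def ewsc_verdict(evaluations, variance_penalty=1.0):
    """Hypothetical EWSC-style aggregation over paraphrased evaluations.

    evaluations maps each candidate answer to a list of
    (score, evidence_weight) pairs, one per paraphrased judge prompt.
    Each candidate gets its evidence-weighted mean score minus a penalty
    proportional to the variance of its scores across paraphrases.
    """
    best_answer, best_value = None, float("-inf")
    for answer, judged in evaluations.items():
        total_weight = sum(w for _, w in judged)
        if total_weight <= 0:
            raise ValueError("evidence weights must sum to a positive value")
        weighted_mean = sum(s * w for s, w in judged) / total_weight
        instability = pvariance([s for s, _ in judged])
        value = weighted_mean - variance_penalty * instability
        if value > best_value:
            best_answer, best_value = answer, value
    return best_answer


if __name__ == "__main__":
    # Aligned confidence and matching answers: the debate is skipped.
    print(should_debate(0.9, 0.85, ["yes", "Yes"]))
    # A candidate with unstable judge scores loses to a steadier one.
    judged = {
        "yes": [(0.9, 1.0), (0.1, 1.0), (0.9, 1.0)],
        "no": [(0.6, 1.0), (0.6, 1.0), (0.6, 1.0)],
    }
    print(ewsc_verdict(judged))
```

In this sketch, setting variance_penalty to 0 reduces the aggregation to a plain evidence-weighted mean, which makes the effect of the variance term easy to isolate.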

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Artificial Intelligence > Core AI > Planning
Machine Learning > Optimization & Theory > Optimization

Keywords

multi-agent debate, factual reliability, selective debate
