2026 AAAI AAAI 2026

An Invariant Latent Space Perspective on Language Model Inversion

Abstract

Abstract Language model inversion (LMI), i.e., recovering hidden prompts from outputs, emerges as a concrete threat to user privacy and system security. We recast LMI as reusing the LLM's own latent space and propose the Invariant Latent Space Hypothesis (ILSH): (1) diverse outputs from the same source prompt should preserve consistent semantics (source invariance), and (2) input<->output cyclic mappings should be self-consistent within a shared latent space (cyclic invariance). Accordingly, we present Inv2A, which treats the LLM as an invariant decoder and learns only a lightweight inverse encoder that maps outputs to a denoised pseudo-representation. When multiple outputs are available, they are sparsely concatenated at the representation layer to increase information density. Training proceeds in two stages: contrastive alignment (source invariance) and supervised reinforcement (cyclic invariance). An optional training-free neighborhood search can refine local performance. Across 9 datasets covering user and system prompt scenarios, Inv2A outperforms baselines by an average of 4.77% BLEU score while reducing dependence on large inverse corpora. Our analysis further shows that prevalent defenses provide limited protection, underscoring the need for stronger strategies.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio