Large Language Models

6405 directly classified papers

Papers

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025

PerLA: Perceptive 3D Language Assistant CVPR 2025

CoMMIT: Coordinated Multimodal Instruction Tuning EMNLP 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression CVPR 2025

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ICCV 2025

HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction CVPR 2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs ICCV 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding CVPR 2025

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding CVPR 2025

ICP: Immediate Compensation Pruning for Mid-to-high Sparsity CVPR 2025

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception CVPR 2025

ERFSL: An Efficient Reward Function Searcher via Large Language Models for Custom-Environment Multi-Objective Reinforcement Learning (Student Abstract) AAAI 2025

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark CVPR 2025

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis ICCV 2025

Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition ICCV 2025

CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation CVPR 2025

MiDSummer: Multi-Guidance Diffusion for Controllable Zero-Shot Immersive Gaussian Splatting Scene Generation ICCV 2025

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research CVPR 2025

Causality-guided Prompt Learning for Vision-language Models via Visual Granulation ICCV 2025

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer CVPR 2025

Breaking the Encoder Barrier for Seamless Video-Language Understanding ICCV 2025

DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery CVPR 2025

TTD-SQL: Tree-Guided Token Decoding for Efficient and Schema-Aware SQL Generation EMNLP 2025

Vision-Language Model IP Protection via Prompt-based Learning CVPR 2025