Large Language Models

2678 directly classified papers

Papers

AutoPresent: Designing Structured Visuals from Scratch CVPR 2025

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models CVPR 2025

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing CVPR 2025

Empowering LLMs to Understand and Generate Complex Vector Graphics CVPR 2025

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models CVPR 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training CVPR 2025

Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding NAACL 2025

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly CVPR 2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation CVPR 2025

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction CVPR 2025

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model CVPR 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach CVPR 2025

RITT: A Retrieval-Assisted Framework with Image and Text Table Representations for Table Question Answering ACL 2025

PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models CVPR 2025

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation CVPR 2025

The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers CVPR 2025

Biodiversity ambition analysis with Large Language Models ACL 2025

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding CVPR 2025

Curse of bilinguality: Evaluating monolingual and bilingual language models on Chinese linguistic benchmarks ACL 2025

Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs CVPR 2025

Yo'Chameleon: Personalized Vision and Language Generation CVPR 2025

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding CVPR 2025

TriLLaMa at CQs-Gen 2025: A Two-Stage LLM-Based System for Critical Question Generation ACL 2025

DrVideo: Document Retrieval Based Long Video Understanding CVPR 2025

LLM-Based Explicit Models of Opponents for Multi-Agent Games NAACL 2025