Large Language Models

2678 directly classified papers

Papers

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding CVPR 2025

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant CVPR 2025

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models CVPR 2025

Assessing French Readability for Adults with Low Literacy: A Global and Local Perspective EMNLP 2025

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations CVPR 2025

Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles EMNLP 2025

Adaptively profiling models with task elicitation EMNLP 2025

Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information ICCV 2025

DASH: Detection and Assessment of Systematic Hallucinations of VLMs ICCV 2025

GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression CVPR 2025

Conical Visual Concentration for Efficient Large Vision-Language Models CVPR 2025

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding CVPR 2025

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation CVPR 2025

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis ICCV 2025

MLLM-as-a-Judge for Image Safety without Human Labeling CVPR 2025

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization CVPR 2025

Streaming VideoLLMs for Real-Time Procedural Video Understanding ICCV 2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models CVPR 2025

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models ICCV 2025

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models ICCV 2025

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh ICCV 2025

FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance ICCV 2025

ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models ICCV 2025