Generating Spatial Knowledge Graphs from Automotive Diagrams for Question Answering

Steve Bakos; Chen Xing; Heidar Davoudi; Aijun An; Ron DiCarlantonio

EMNLP 2025

Abstract

Answering “Where is the X button?” with “It’s next to the Y button” is unhelpful if the user knows the location of neither. Useful answers require obvious landmarks as reference points. We address this by generating a spatial knowledge graph (SKG) from a vehicle dashboard diagram; the SKG captures the spatial relationships between each dashboard component and its nearby landmarks, and we use it to help answer questions. We evaluate three distinct pipelines for generating the SKG with Large Vision-Language Models (LVLMs): Per-Attribute, Per-Component, and a Single-Shot baseline. On a new 65-vehicle dataset, we demonstrate that the decomposed Per-Component pipeline is the most effective strategy for generating a high-quality SKG; when evaluated with a novel Significance score, the graph produced by this method identifies landmarks with 71.3% agreement with human annotators. This work enables downstream QA systems to provide more intuitive, landmark-based answers.
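To make the idea of a landmark-based answer concrete, the following is a minimal sketch of how an SKG might be stored and queried. It is not the authors' implementation: the class name `SpatialKnowledgeGraph`, the methods `add_relation` and `locate`, the relation phrasing, and the example components are all hypothetical, and the sketch does not attempt to model the pipelines or the Significance score described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpatialKnowledgeGraph:
    """Hypothetical SKG: each component maps to (relation, landmark) edges."""

    landmarks: set[str] = field(default_factory=set)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_relation(self, component: str, relation: str, landmark: str) -> None:
        # Record the landmark and the directed edge component -> landmark.
        self.landmarks.add(landmark)
        self.edges.setdefault(component, []).append((relation, landmark))

    def locate(self, component: str) -> str:
        # Answer "Where is X?" using the first recorded landmark relation.
        relations = self.edges.get(component)
        if not relations:
            return f"No landmark is recorded for the {component}."
        relation, landmark = relations[0]
        return f"The {component} is {relation} the {landmark}."


# Illustrative data only; not taken from the paper's dataset.
skg = SpatialKnowledgeGraph()
skg.add_relation("hazard light button", "to the right of", "steering wheel")
print(skg.locate("hazard light button"))
```

The point of the sketch is that the answer is anchored to a landmark the user can already find (here, the steering wheel) rather than to another obscure control.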

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Knowledge Editing
Knowledge & Reasoning > Representation > Knowledge Graphs
Artificial Intelligence > Core AI > Large Language Models
Computer Vision > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Computer Vision
Artificial Intelligence > Core AI > Knowledge Graph
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Generation > Visual Question Answering

Keywords

visual question answering, question answering, graph generation, landmark detection, knowledge graph, vision-language model, spatial reasoning, diagram understanding, spatial knowledge graph, automotive diagram, knowledge graph generation
