Egocentric Vision

436 directly classified papers

Papers

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World CVPR 2024

Expressive Gaussian Human Avatars from Monocular RGB Video NeurIPS 2024

IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting WACV 2024

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model NeurIPS 2024

3D Human Pose Perception from Egocentric Stereo Videos CVPR 2024

Instance Tracking in 3D Scenes from Egocentric Videos CVPR 2024

CosmicMan: A Text-to-Image Foundation Model for Humans CVPR 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos CVPR 2024

Egocentric Action Recognition by Capturing Hand-Object Contact and Object State WACV 2024

Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars NeurIPS 2024

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives CVPR 2024

LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging CVPR 2024

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data NeurIPS 2024

Retrieval-Augmented Egocentric Video Captioning CVPR 2024

Towards a Dynamic Vision Sensor-Based Insect Camera Trap WACV 2024

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation CVPR 2024

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations CVPR 2024

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception CVPR 2024

Authentic Hand Avatar from a Phone Scan via Universal Hand Model CVPR 2024

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation CVPR 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos CVPR 2024

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding EMNLP 2024

Error Detection in Egocentric Procedural Task Videos CVPR 2024

MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production ACL 2024