Word-Conditioned 3D American Sign Language Motion Generation

Lu Dong; Xiao Wang; Ifeoma Nwogu

EMNLP 2024

Abstract

Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model, trained on a curated dataset of 3D motion meshes from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children learning sign language but not yet able to read, and the ability to generalize to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model in the task of sign word generation. Moreover, human evaluation experiments show that wSignGen can generate high-quality, grammatically correct ASL signs effectively conveyed through 3D avatars.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Deep Learning > Architectures > Transformers
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Video Generation
Deep Learning > Learning Types > Generative Models
Computer Vision > Generation > 3D Generation
Speech & Audio > Synthesis > Speech Synthesis

Keywords

motion generation, diffusion model, 3D generation, sign language, sign language generation, 3D motion generation, avatar animation, American Sign Language, sign language synthesis, transformer-based diffusion, word-conditioned generation
