Multi-Perspective LSTM for Joint Visual Representation Learning

Alireza Sepas-Moghaddam; Fernando Pereira; Paulo Lobato Correia; Ali Etemad

CVPR 2021

Abstract

We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in terms of recognition accuracy and complexity. We make our code publicly available at: https://github.com/arsm/MPLSTM

Topics

Machine Learning > Core Methods > Representation Learning; Deep Learning > Architectures > Neural Networks; Computer Vision > Analysis > Face Recognition; Computer Vision > Core AI > Computer Vision; Deep Learning > Learning Types > Representation Learning; Deep Learning > Learning Types > Multi-Modal Learning; Deep Learning > Architectures > Recurrent Neural Networks

Keywords

representation learning; face recognition; visual representation; joint learning; multi-view learning; visual representation learning; long short-term memory; recurrent neural network; multi-perspective learning; lip reading
