Learning Distributional Token Representations from Visual Features

Samuel Broscheit

ACL 2018

Abstract

In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight into the challenges of encoding a text representation from low-level features, e.g., from characters or pixels. We focus on Chinese, which, as a logographic language, has properties that make a representation via visual features challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features can achieve results competitive with lookup embeddings. However, we also show different strengths and weaknesses in the models' performance in a part-of-speech tagging task and a semantic similarity task. In summary, we show that it is possible to achieve a text representation only from pixels. We hope that this is a useful stepping stone for future studies that exclusively rely on visual input, or aim at exploiting visual features of written language.
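
The contrast between the two kinds of token representation can be illustrated with a minimal sketch. It is not the model from the paper: the abstract does not specify the architecture, so the linear projection, the embedding size `DIM`, the bitmap size `GLYPH`, the toy vocabulary, and the random bitmaps standing in for rendered glyphs are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["我", "你", "好"]   # toy character vocabulary (illustrative only)
DIM = 8                      # embedding size (assumed)
GLYPH = 16                   # bitmaps are GLYPH x GLYPH pixels (assumed)

# (a) Lookup embedding: one independent, trainable row per token id.
lookup_table = rng.normal(size=(len(VOCAB), DIM))

def lookup_embed(token_id):
    """Return the table row for this token id."""
    return lookup_table[token_id]

# (b) Visual embedding: a shared function applied to the token's pixels.
# Random binary bitmaps stand in for rendered glyphs; a real setup would
# rasterize each character with a font (not done here).
glyphs = rng.integers(0, 2, size=(len(VOCAB), GLYPH, GLYPH)).astype(np.float64)
W = rng.normal(size=(GLYPH * GLYPH, DIM)) / GLYPH  # shared projection (hypothetical)

def visual_embed(token_id):
    """Flatten the bitmap and project it with weights shared by all tokens."""
    pixels = glyphs[token_id].reshape(-1)
    return np.tanh(pixels @ W)

e_lookup = lookup_embed(1)
e_visual = visual_embed(1)
print(e_lookup.shape, e_visual.shape)
```

In this sketch the lookup table holds a separate parameter vector for every token, whereas the visual variant reuses the same weights `W` for all tokens, so two tokens with identical pixels necessarily receive identical vectors.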

Authors

Samuel Broscheit

Topics

Machine Learning > Core Methods > Representation Learning

Natural Language Processing > Understanding > Part-of-Speech Tagging

Natural Language Processing > Applications > Machine Translation

Natural Language Processing > Resources & Methods > Text Representation

Natural Language Processing > Generation > Machine Translation

Computer Vision > Core AI > Computer Vision

Deep Learning > Learning Types > Representation Learning

Deep Learning > Learning Types > Multi-Modal Learning

Keywords

neural machine translation, part-of-speech tagging, semantic similarity, visual feature, character recognition, token representation, lookup embedding, Chinese language

Related papers

Economic Event Detection in Company-Specific News Text 2018

Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment 2018

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer 2018

Affordances in Grounded Language Learning 2018