The Impact of Word Representations on Sequential Neural MWE Identification

Nicolas Zampieri; Carlos Ramisch; Geraldine Damnati

ACL 2019

Abstract

Recent initiatives such as the PARSEME shared task allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
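The contrast between word-based and character-based representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding models used, so the n-gram range, the toy vocabulary, and the helper names (`char_ngrams`, `word_key`, `ngram_overlap`) are all assumptions made for illustration. The sketch shows why an unseen inflected surface form collapses to an unknown token under a word-level lookup, while still sharing subword units with its lemma under a character n-gram scheme.

```python
from typing import Set


def char_ngrams(token: str, n_min: int = 3, n_max: int = 5) -> Set[str]:
    """Return the character n-grams of a token padded with boundary markers.

    The 3-to-5 range is an assumption chosen for illustration.
    """
    padded = f"<{token}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def word_key(token: str, vocab: Set[str]) -> str:
    """Word-based lookup key: any out-of-vocabulary token maps to '<unk>'."""
    return token if token in vocab else "<unk>"


def ngram_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the character n-gram sets of two tokens."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Hypothetical toy vocabulary containing only the lemma.
vocab = {"prendre"}
lemma = "prendre"
surface = "prennent"  # inflected form, absent from the toy vocabulary

# Word-based view: the surface form is indistinguishable from any other OOV word.
print(word_key(surface, vocab))

# Character-based view: the surface form still shares subword units with its lemma.
print(sorted(char_ngrams(surface) & char_ngrams(lemma)))
print(ngram_overlap(surface, lemma))
```

In a character-based embedding model, the shared n-grams would contribute shared parameters to both vectors, which is one plausible reason surface forms can sometimes stand in for lemmas.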

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Neural Networks
Natural Language Processing > Applications > Named Entity Recognition
Deep Learning > Techniques > Representation Learning

Keywords

sequence labeling; word representation; character embedding; multiword expression; morphological complexity; multi-word expression; neural sequence model; neural network; character-based embedding; word-based embedding
