Coding Textual Inputs Boosts the Accuracy of Neural Networks

Abdul Rafae Khan; Jia Xu; Weiwei Sun

EMNLP 2020

Abstract

Natural Language Processing (NLP) tasks are usually performed word by word on textual inputs. We can use arbitrary symbols to represent the linguistic meaning of a word and use these symbols as inputs. As “alternatives” to a text representation, we introduce Soundex, MetaPhone, NYSIIS, and logograms to NLP, and develop fixed-output-length coding and its extension using Huffman coding. Each of these codings combines different character/digit sequences and constructs a new vocabulary based on codewords. We find that integrating these codewords with text gives neural-network-based NLP systems more reliable inputs than text alone, through redundancy. Experiments demonstrate that our approach outperforms state-of-the-art models on machine translation, language modeling, and part-of-speech tagging. The source code is available at https://github.com/abdulrafae/coding_nmt.
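
To make the idea of codeword vocabularies concrete, here is a minimal sketch, not the authors' implementation (the linked repository holds that). `soundex` follows the standard American Soundex rules; `huffman_codes` builds a generic binary Huffman code from word frequencies, so frequent words receive shorter codewords; `interleave` pairs each word with its codeword as one hypothetical way to combine text and codes. The function names, the binary code alphabet, and the interleaving scheme are assumptions for illustration only; the paper's fixed-output-length coding and its actual way of combining codewords with text may differ.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Callable

# Standard American Soundex digit classes; vowels, y, h and w get no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()                    # first letter is kept as a letter
    prev = _SOUNDEX_DIGITS.get(letters[0], "")   # its digit still blocks a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                             # h/w do not separate equal digits
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit                        # collapse runs of the same digit
        prev = digit                             # a vowel resets prev to ""
    return (code + "000")[:4]                    # pad or truncate to 4 characters


def huffman_codes(tokens: list[str]) -> dict[str, str]:
    """Generic binary Huffman code over token frequencies (illustrative only)."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:                           # a lone symbol still needs a codeword
        return {next(iter(freq)): "0"}
    tie = count()                                # tie-breaker: heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)        # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def interleave(sentence: str, coder: Callable[[str], str]) -> list[str]:
    """Hypothetical combination scheme: each word is followed by its codeword."""
    out: list[str] = []
    for word in sentence.split():
        out += [word, coder(word)]
    return out


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    print(interleave("Robert met Rupert", soundex))
    print(huffman_codes("the the the cat sat".split()))
```

In this sketch, "Robert" and "Rupert" map to the same Soundex code R163, so words with similar pronunciation share a codeword even when their spellings differ, while the Huffman variant gives the most frequent word, "the", a one-bit codeword.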

Keywords

machine translation, language modeling, text representation, part-of-speech tagging, text encoding, neural network, text coding, phonetic coding
