COLING 2022
MaxMatch-Dropout: Subword Regularization for WordPiece
Abstract
We present a subword regularization method for WordPiece, which uses a maximum matching algorithm for tokenization. The proposed method, MaxMatch-Dropout, randomly drops vocabulary words during the search performed by the maximum matching algorithm. This enables finetuning with subword regularization for popular pretrained language models such as BERT-base. Experimental results demonstrate that MaxMatch-Dropout improves performance on text classification and machine translation tasks, as other subword regularization methods do. Moreover, we provide a comparative analysis of three subword regularization methods: subword regularization with SentencePiece (Unigram), BPE-Dropout, and MaxMatch-Dropout.
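The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `maxmatch_dropout`, the parameter `p`, the `##` continuation prefix, the `[UNK]` fallback, and the rule that single-character pieces are never dropped are all assumptions made for illustration. The sketch runs greedy longest-match-first tokenization of one word and skips each multi-character candidate match with probability `p`, so the same word can be segmented differently on different calls.

```python
import random


def maxmatch_dropout(word, vocab, p=0.1, rng=None, unk="[UNK]"):
    """Greedy longest-match tokenization with random dropping of matches.

    Hypothetical sketch: `p` is the probability of skipping a candidate
    match. Single-character pieces are never skipped (an assumption), so
    a segmentation exists whenever every character is in `vocab`.
    """
    rng = rng or random
    tokens = []
    start = 0
    while start < len(word):
        matched_end = None
        # Try candidate pieces from longest to shortest.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # WordPiece-style continuation marker
            if piece not in vocab:
                continue
            # Randomly reject this match and fall back to a shorter one.
            if end - start > 1 and rng.random() < p:
                continue
            tokens.append(piece)
            matched_end = end
            break
        if matched_end is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk]
        start = matched_end
    return tokens
```

With `p=0` the function reduces to plain maximum matching, and with `p=1` (under the single-character assumption) it falls back to character-level pieces. Values in between yield a mix of segmentations, which is what a model would see across epochs if the sampler were applied during finetuning.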
Authors
Topics
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Text Representation
Computer Science > Foundations > Algorithms
Machine Learning > Learning Types > Supervised Learning
Natural Language Processing > Resources & Methods > Language Modeling