2019
NAACL
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Abstract
We investigate subword information for Chinese word segmentation by integrating subword embeddings, trained using byte-pair encoding, into a Lattice LSTM (LaLSTM) network over a character sequence. Experiments on standard benchmarks show that subword information brings significant gains over strong character-based segmentation models. To our knowledge, this is the first research on the effectiveness of subwords in neural word segmentation.
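As a rough illustration of how subwords could enter a lattice over characters, the sketch below enumerates every multi-character span of a sentence that matches an entry in a subword vocabulary. Each match would correspond to one extra lattice path alongside the character sequence. The function name `match_subword_spans`, the `max_len` cutoff, and the toy vocabulary are assumptions made for this example; they are not taken from the paper, and a real vocabulary would come from byte-pair encoding merges learned on a corpus.

```python
from typing import List, Set, Tuple


def match_subword_spans(
    chars: str, subword_vocab: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, subword) for each vocabulary match; end is exclusive.

    Hypothetical helper: only spans of 2..max_len characters are checked,
    since single characters are already covered by the character sequence.
    """
    spans = []
    for start in range(len(chars)):
        last_end = min(start + max_len, len(chars))
        for end in range(start + 2, last_end + 1):
            piece = chars[start:end]
            if piece in subword_vocab:
                spans.append((start, end, piece))
    return spans


# Toy vocabulary, made up for illustration (not learned BPE output).
vocab = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
spans = match_subword_spans("南京市长江大桥", vocab)
for start, end, piece in spans:
    print(start, end, piece)
```

Overlapping matches such as "南京市" and "市长" are kept on purpose: in a lattice model the network, rather than the matching step, is expected to weigh competing paths.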