Bayesian Modeling of Lexical Resources for Low-Resource Settings

Nicholas Andrews; Mark Dredze; Benjamin Van Durme; Jason Eisner

ACL 2017

Abstract

Lexical resources such as dictionaries and gazetteers are often used as auxiliary data for tasks such as part-of-speech induction and named-entity recognition. However, discriminative training with lexical features requires annotated data to reliably estimate the lexical feature weights, and it may overfit the lexical features at the expense of features that generalize better. In this paper, we investigate a more robust approach: we stipulate that the lexicon is the result of an assumed generative process. Practically, this means that we may treat the lexical resources as observations under the proposed generative model. The lexical resources thus provide training data for the generative model without requiring separate data to estimate lexical feature weights. We evaluate the proposed approach in two settings: part-of-speech induction and low-resource named-entity recognition.
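
The abstract's central idea, treating lexicon entries as observations under a generative model, can be illustrated with a toy Dirichlet-categorical emission model. This is a minimal sketch under our own assumptions, not the model from the paper: the function name `posterior_emissions`, the symmetric Dirichlet prior `alpha`, and the toy vocabulary and gazetteer are all hypothetical. Each (word, tag) lexicon entry is counted as one observed draw from that tag's word distribution, so the lexicon shapes p(word | tag) without any separately estimated feature weights.

```python
from collections import Counter


def posterior_emissions(lexicon, vocab, tags, alpha=1.0):
    """Posterior-mean p(word | tag) under a symmetric Dirichlet(alpha) prior.

    Hypothetical illustration: every (word, tag) lexicon entry is treated as
    one observed emission of `word` from the categorical distribution of `tag`.
    """
    counts = {t: Counter() for t in tags}
    for word, tag in lexicon:
        counts[tag][word] += 1  # lexicon entry counted as an observation

    V = len(vocab)
    probs = {}
    for t in tags:
        total = sum(counts[t].values())
        # Dirichlet-categorical posterior mean: (n_w + alpha) / (N + alpha * V)
        probs[t] = {w: (counts[t][w] + alpha) / (total + alpha * V) for w in vocab}
    return probs


# Toy gazetteer (hypothetical data): no annotated sentences are needed.
vocab = ["paris", "london", "alice", "bob", "the"]
tags = ["LOC", "PER"]
lexicon = [("paris", "LOC"), ("london", "LOC"), ("alice", "PER")]
probs = posterior_emissions(lexicon, vocab, tags)
```

Here "paris" receives posterior probability 2/7 under LOC but only 1/6 under PER, purely because of the gazetteer. In a fuller model (again an assumption on our part), these lexicon counts would be combined with expected counts from unlabeled text, for example inside an EM or Gibbs-sampling loop.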

Keywords

Bayesian inference, named-entity recognition, part-of-speech induction, Bayesian modeling, low-resource language, generative model, low-resource setting, lexical resource