2020
EMNLP
Imputing typological values via phylogenetic inference
Abstract
This paper describes a workflow for imputing missing values in a typological database, a subset of the World Atlas of Language Structures (WALS). Using a worldwide phylogeny derived from lexical data, the model assumes that a phylogenetic continuous-time Markov chain governs the evolution of typological values. Missing values are imputed by maximum likelihood estimation under this model. As a back-off model for languages whose phylogenetic position is unknown, a k-nearest-neighbor classification based on geographic distance is performed.
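The geographic back-off step can be illustrated with a minimal sketch. It is not the paper's implementation: the haversine distance, the majority vote, the choice of k = 3, and the function names `haversine_km` and `knn_impute` are assumptions made here for illustration, and the coordinates and feature values are invented toy data.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def knn_impute(target, known, k=3):
    """Return the majority value among the k geographically nearest languages.

    `target` is a (lat, lon) pair; `known` is a list of ((lat, lon), value)
    pairs for languages whose feature value is attested.
    """
    nearest = sorted(known, key=lambda item: haversine_km(target, item[0]))[:k]
    votes = Counter(value for _, value in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: locations and word-order values are invented.
known = [
    ((52.0, 5.0), "SVO"),
    ((51.0, 10.0), "SVO"),
    ((48.0, 2.0), "SVO"),
    ((35.0, 135.0), "SOV"),
    ((37.0, 127.0), "SOV"),
]

print(knn_impute((50.0, 8.0), known, k=3))  # prints "SVO"
```

In a real setting, k would be tuned on held-out data, and the vote could be weighted by distance. Neither choice is specified in the abstract.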
Topics
Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Statistical Learning
Mathematics & Optimization > Optimization > Stochastic Methods
Interdisciplinary > Linguistics
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Mathematics & Optimization > Probability > Stochastic Processes