2012
NIPS
NeurIPS 2012
Scalable imputation of genetic data with a discrete fragmentation-coagulation process
Abstract
We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by a two-step transition: the clusters of one partition first split (fragment), and the resulting pieces then merge (coagulate) to form the next partition. Our model can be thought of as a discrete-time analogue of continuous-time fragmentation-coagulation processes [Teh et al., 2011]. It preserves the important properties of projectivity, exchangeability and reversibility while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining the same accuracy as [Teh et al., 2011].
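The split-then-merge transition between consecutive partitions can be illustrated with a small simulation. This is a minimal sketch, not the paper's model: it assumes that both the fragmentation and the coagulation steps are Chinese restaurant processes (CRPs). The concentration parameters `alpha_init`, `alpha_frag` and `beta_coag` and the helper names are hypothetical stand-ins. The sketch does not attempt to reproduce the paper's parameterization, its reversibility, or its use for imputation.

```python
import random
from typing import List

Partition = List[List[int]]


def crp_partition(items: List[int], alpha: float, rng: random.Random) -> Partition:
    """Partition `items` with a Chinese restaurant process (concentration `alpha`)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    tables: Partition = []
    for n, item in enumerate(items):
        # The n-th item opens a new table with probability alpha / (n + alpha)
        # and joins an existing table with probability |table| / (n + alpha).
        r = rng.random() * (n + alpha)
        if r < alpha:
            tables.append([item])
            continue
        r -= alpha
        for table in tables:
            if r < len(table):
                table.append(item)
                break
            r -= len(table)
        else:
            tables[-1].append(item)  # guard against floating-point round-off
    return tables


def fragment_coagulate(partition: Partition, alpha_frag: float, beta_coag: float,
                       rng: random.Random) -> Partition:
    """One hypothetical transition: split every cluster, then merge the fragments."""
    # Fragmentation: each cluster is split independently by its own CRP.
    fragments: Partition = []
    for cluster in partition:
        fragments.extend(crp_partition(cluster, alpha_frag, rng))
    # Coagulation: the fragments themselves are grouped by a second CRP.
    groups = crp_partition(list(range(len(fragments))), beta_coag, rng)
    return [sorted(x for idx in group for x in fragments[idx]) for group in groups]


def simulate_chain(n_sequences: int, n_locations: int, alpha_init: float,
                   alpha_frag: float, beta_coag: float, seed: int = 0) -> List[Partition]:
    """Sample a Markov chain of partitions of sequences 0..n_sequences-1."""
    rng = random.Random(seed)
    partitions = [crp_partition(list(range(n_sequences)), alpha_init, rng)]
    for _ in range(n_locations - 1):
        partitions.append(fragment_coagulate(partitions[-1], alpha_frag, beta_coag, rng))
    return partitions


if __name__ == "__main__":
    for location, partition in enumerate(simulate_chain(8, 4, 1.0, 0.5, 1.0)):
        print(location, partition)
```

Each element of the returned chain is a partition of the same set of sequences. Every transition only splits existing clusters and then merges the pieces, which mirrors the fragmentation-then-coagulation structure described above. Because the CRP is exchangeable, the order in which items are seated does not affect the distribution of the resulting partition.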