Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Lloyd Elliott; Yee W. Teh

NIPS (NeurIPS) 2012

Abstract

We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by their clusters first splitting and then merging. Our model can be thought of as a discrete-time analogue of continuous-time fragmentation-coagulation processes [Teh et al., 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining the same accuracy as in [Teh et al., 2011].
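
The split-then-merge dynamics described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a plain one-parameter Chinese restaurant process (CRP) as a stand-in for both the fragmentation and the coagulation operator, and the names `crp_partition`, `fragment`, `coagulate`, `simulate`, `alpha_frag` and `alpha_coag` are hypothetical. The operators and parameters actually used in the paper are not given in this abstract.

```python
import random


def crp_partition(items, alpha, rng):
    """Partition `items` with a one-parameter CRP (illustrative stand-in)."""
    clusters = []
    for item in items:
        # Existing cluster k is chosen with weight len(cluster);
        # a new cluster is opened with weight alpha.
        weights = [len(c) for c in clusters] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(clusters):
            clusters.append([item])
        else:
            clusters[k].append(item)
    return clusters


def fragment(partition, alpha, rng):
    """Split every cluster independently into sub-clusters."""
    return [frag for cluster in partition
            for frag in crp_partition(cluster, alpha, rng)]


def coagulate(partition, alpha, rng):
    """Merge clusters: partition the cluster indices, then take unions."""
    groups = crp_partition(list(range(len(partition))), alpha, rng)
    return [sorted(x for i in group for x in partition[i]) for group in groups]


def simulate(n_sequences, n_sites, alpha_frag, alpha_coag, seed=0):
    """Markov chain of partitions: one partition of the sequences per site."""
    rng = random.Random(seed)
    # Assumption: the initial partition is drawn from a CRP with alpha = 1.
    partition = crp_partition(list(range(n_sequences)), 1.0, rng)
    chain = [partition]
    for _ in range(n_sites - 1):
        # Clusters first split, then merge, to give the next site's partition.
        split = fragment(partition, alpha_frag, rng)
        partition = coagulate(split, alpha_coag, rng)
        chain.append(partition)
    return chain


if __name__ == "__main__":
    for site, partition in enumerate(simulate(6, 4, 0.5, 0.5)):
        print(site, partition)
```

Each element of the returned chain is a partition of the sequence indices at one genomic location; in an imputation setting, sequences sharing a cluster at a site would be treated as sharing the same underlying allele there. That reading is a simplification for illustration only.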

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning

Machine Learning > Optimization & Theory > Stochastic Processes

Healthcare & Medicine > Research > Bioinformatics

Machine Learning > Bayesian & Probabilistic > Bayesian Learning

Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling

Machine Learning > Core Methods > Probabilistic Modeling

Keywords

bayesian nonparametrics, genotype imputation, genetic data, fragmentation coagulation, markov model, genetic sequence data, bayesian nonparametric, fragmentation-coagulation process, genetic imputation, genetic data imputation
