Bayesian allocation model: marginal likelihood-based model selection for count tensors


Yıldırım, Sinan and Kurutmaz, M. Burak and Barsbey, Melih and Şimşekli, Umut and Cemgil, A. Taylan (2021) Bayesian allocation model: marginal likelihood-based model selection for count tensors. IEEE Journal of Selected Topics in Signal Processing, 15 (3). pp. 560-573. ISSN 1932-4553 (Print) 1941-0484 (Online)


Official URL: http://dx.doi.org/10.1109/JSTSP.2020.3045297


In this article, we introduce a dynamic generative model, the Bayesian allocation model (BAM), for modeling count data. BAM covers various probabilistic nonnegative tensor factorization (NTF) and topic models under one general framework. In BAM, allocations are made using a Bayesian network whose conditional probability tables can be integrated out analytically. We show that, when allocations are viewed as sequential, the resulting marginal process is a special type of Pólya urn process, an integer-valued self-reinforcing process that we name the Pólya-Bayes process. Exploiting the Pólya urn construction, we develop a novel sequential Monte Carlo (SMC) algorithm for marginal likelihood estimation in BAM, leading to a unified scoring method for discrete-variable Bayesian networks with hidden nodes, including various NTF and topic models. The SMC estimator of the marginal likelihood has the remarkable property of being unbiased, in contrast to variational algorithms, which are generally biased. We also demonstrate how our SMC-based likelihood estimation can be integrated within a Markov chain Monte Carlo algorithm for principled and correct (in the sense of respecting the true posterior distribution) Bayesian model selection and hyperparameter estimation for BAM. We provide several numerical examples, on both artificial and real datasets, that demonstrate the performance of the algorithms in various data regimes.
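
As a rough illustration of the urn idea mentioned in the abstract, the sketch below treats the simplest possible case: a single categorical variable with a Dirichlet prior, not the full BAM. It checks numerically that the sequential Pólya urn probability of an ordered sequence of allocations equals the closed-form marginal likelihood obtained by integrating out the probability table. The function names, the prior `alpha`, and the example sequence are hypothetical choices made for this illustration and do not come from the article.

```python
import math


def polya_urn_log_prob(sequence, alpha):
    """Log-probability of an ordered sequence of category draws under a
    Polya urn with initial (pseudo-)counts alpha.

    Each draw picks category k with probability
    (alpha[k] + counts[k]) / (sum(alpha) + n), where n is the number of
    draws made so far, and then reinforces that category by one.
    """
    counts = [0] * len(alpha)
    total_alpha = sum(alpha)
    log_p = 0.0
    for n, k in enumerate(sequence):
        log_p += math.log((alpha[k] + counts[k]) / (total_alpha + n))
        counts[k] += 1  # self-reinforcement step
    return log_p


def dirichlet_categorical_log_marginal(counts, alpha):
    """Closed-form log marginal likelihood of an ordered sequence with the
    given category counts, after integrating out a Dirichlet(alpha) prior."""
    total_alpha = sum(alpha)
    total_count = sum(counts)
    out = math.lgamma(total_alpha) - math.lgamma(total_alpha + total_count)
    for a, c in zip(alpha, counts):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out


# Hypothetical example: three categories, five sequential allocations.
alpha = [0.5, 1.0, 2.0]
sequence = [0, 1, 0, 2, 0]
counts = [sequence.count(k) for k in range(len(alpha))]

sequential = polya_urn_log_prob(sequence, alpha)
closed_form = dirichlet_categorical_log_marginal(counts, alpha)
print(sequential, closed_form)  # the two values agree up to rounding error
```

Because the urn probability depends on the sequence only through the final counts, any reordering of the same allocations gives the same value. This exchangeability is what makes a sequential construction of the marginal likelihood possible in this toy setting.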

Item Type: Article
Uncontrolled Keywords: Data models; Tensors; Bayes methods; Resource management; Estimation; Probabilistic logic; Numerical models; Bayesian model selection; Bayesian network; Markov chain Monte Carlo; model scoring; nonnegative tensor factorization; sequential Monte Carlo; topic models; Pólya urns
Subjects: Q Science > QA Mathematics > QA273-280 Probabilities. Mathematical statistics
Q Science > QA Mathematics > QA075 Electronic computers. Computer science
ID Code: 41492
Deposited By: Sinan Yıldırım
Deposited On: 06 May 2021 13:31
Last Modified: 06 May 2021 13:31
