Declarative Probabilistic Programming with Datalog

Barany, Vince; ten Cate, Balder; Kimelfeld, Benny; Olteanu, Dan; Vagena, Zografoula

doi:10.4230/LIPIcs.ICDT.2016.7

Abstract

Probabilistic programming languages are used for developing statistical models, and they typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate an extension of Datalog for specifying statistical models, and establish a declarative probabilistic-programming paradigm over databases. Our proposed extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.

Serge Abiteboul, Daniel Deutch, and Victor Vianu. Deduction with contradictions in Datalog. In ICDT, pages 143-154, 2014.
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
Alfred V. Aho, Catriel Beeri, and Jeffrey D. Ullman. The theory of joins in relational databases. ACM Trans. on Datab. Syst., 4(3):297-314, 1979.
Lyublena Antova, Christoph Koch, and Dan Olteanu. From complete to incomplete information and back. In SIGMOD, pages 713-724, 2007. URL: http://dx.doi.org/10.1145/1247480.1247559.
Lyublena Antova, Christoph Koch, and Dan Olteanu. Query language support for incomplete information in the MayBMS system. In VLDB, pages 1422-1425, 2007.
Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. Design and implementation of the LogicBlox system. In PDOS, pages 1371-1382. ACM, 2015.
Robert B. Ash and Catherine Doleans-Dade. Probability &Measure Theory. Harcourt Academic Press, 2000.
Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic reasoning with answer sets. Theory Pract. Log. Program., 9(1):57-144, 2009.
Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart. Probabilistic XML via Markov chains. PVLDB, 3(1):770-781, 2010.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. J. of Machine Learning Research, 3:993-1022, 2003.
Matthias Bröcheler, Lilyana Mihalkova, and Lise Getoor. Probabilistic similarity logic. In UAI, pages 73-82, 2010.
Zhuhua Cai, Zografoula Vagena, Luis Leopoldo Perez, Subramanian Arumugam, Peter J. Haas, and Christopher M. Jermaine. Simulation of database-valued Markov chains using SimSQL. In SIGMOD, pages 637-648, 2013.
Sara Cohen and Benny Kimelfeld. Querying parse trees of stochastic context-free grammars. In ICDT, pages 62-75. ACM, 2010.
Daniel Deutch, Christoph Koch, and Tova Milo. On probabilistic fixpoint and Markov chain query languages. In PODS, pages 215-226, 2010. URL: http://dx.doi.org/10.1145/1807085.1807114.
Pedro Domingos and Daniel Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on AI and Machine Learning. Morgan & Claypool Publishers, 2009.
Thomas Eiter, Georg Gottlob, and Heikki Mannila. Disjunctive Datalog. ACM Trans. Database Syst., 22(3):364-418, 1997.
Kousha Etessami and Mihalis Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM, 56(1), 2009. URL: http://dx.doi.org/10.1145/1462153.1462154.
Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. In ICDT, pages 207-224, 2003.
Norbert Fuhr. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. JASIS, 51(2):95-110, 2000.
Noah D. Goodman. The principles and practice of probabilistic programming. In POPL, pages 399-402, 2013.
Georg Gottlob, Thomas Lukasiewicz, MariaVanina Martinez, and Gerardo Simari. Query answering under probabilistic uncertainty in Datalog+/- ontologies. Annals of Math.&AI, 69(1):37-72, 2013. URL: http://dx.doi.org/10.1007/s10472-013-9342-1.
Terry Halpin and Spencer Rugaber. LogiQL: A Query Language for Smart Databases. CRC Press, 2014.
Shan Shan Huang, Todd Jeffrey Green, and Boon Thau Loo. Datalog and emerging applications: An interactive tutorial. In SIGMOD, pages 1213-1216, 2011. URL: http://dx.doi.org/10.1145/1989323.1989456.
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher M. Jermaine, and Peter J. Haas. MCDB: a Monte Carlo approach to managing uncertain data. In SIGMOD, pages 687-700, 2008.
Benny Kimelfeld and Pierre Senellart. Probabilistic XML: models and complexity. In Advances in Probabilistic Databases for Uncertain Information Management, volume 304 of Studies in Fuzziness and Soft Computing, pages 39-66. Springer, 2013.
Angelika Kimmig, Bart Demoen, Luc De Raedt, Vitor Santos Costa, and Ricardo Rocha. On the implementation of the probabilistic logic programming language ProbLog. Theory and Practice of Logic Programming, 11:235-262, 2011. URL: http://dx.doi.org/10.1017/S1471068410000566.
Lyric Labs. Chimple. URL: http://chimple.probprog.org/.
Boon Thau Loo, Tyson Condie, Minos N. Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. Declarative networking. Commun. ACM, 52(11):87-95, 2009. URL: http://dx.doi.org/10.1145/1592761.1592785.
David Maier, Alberto O. Mendelzon, and Yehoshua Sagiv. Testing implications of data dependencies. ACM Trans. on Datab. Syst., 4(4):455-469, 1979.
Vikash K. Mansinghka, Daniel Selsam, and Yura N. Perov. Venture: a higher-order probabilistic programming platform with programmable inference. CoRR, abs/1404.0099, 2014. URL: http://arxiv.org/abs/1404.0099.
Andrew Kachites McCallum. Multi-label text classification with a mixture model trained by EM. In Assoc. for the Advancement of Artificial Intelligence workshop on text learning, 1999.
B. Milch and et al. BLOG: Probabilistic models with unknown objects. In IJCAI, pages 1352-1359, 2005.
Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom M. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, pages 103-134, 2000.
Feng Niu, Christopher Ré, AnHai Doan, and Jude W. Shavlik. Tuffy: Scaling up statistical inference in Markov Logic Networks using an RDBMS. PVLDB, 4(6):373-384, 2011.
Feng Niu, Ce Zhang, Christopher Re, and Jude W. Shavlik. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. In Int. Workshop on Searching and Integrating New Web Data Sources, volume 884 of CEUR Workshop Proceedings, pages 25-28, 2012.
Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. R2: an efficient MCMC sampler for probabilistic programs. In AAAI, pages 2476-2482, 2014.
Brooks Paige and Frank Wood. A compilation target for probabilistic programming languages. In ICML, volume 32, pages 1935-1943, 2014.
Anand Patil, David Huard, and Christopher J. Fonnesbeck. PyMC: Bayesian Stochastic Modelling in Python. J. of Statistical Software, 35(4):1-81, 2010.
Judea Pearl. Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann, 1989.
Avi Pfeffer. Figaro: An object-oriented probabilistic programming language. Technical report, Charles River Analytics, 2009.
David Poole. The independent choice logic and beyond. In Probabilistic Inductive Logic Programming - Theory and Applications, pages 222-243, 2008.
Repository on probabilistic programming languages, 2014. URL: http://www.probabilistic-programming.org.
H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. Harvard University Press, Harvard, 1961.
Taisuke Sato and Yoshitaka Kameya. PRISM: A language for symbolic-statistical modeling. In IJCAI, pages 1330-1339, 1997.
Adrian Smith, Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo methods in practice. Springer Science &Business Media, 2013.
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.
Balder ten Cate, Benny Kimelfeld, and Dan Olteanu. PPDL: probabilistic programming with Datalog. In AMW, volume 1378 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
Jennifer Widom. Trio: a system for data, uncertainty, and lineage. In Charu Aggarwal, editor, Managing and Mining Uncertain Data, chapter 5. Springer-Verlag, 2008.

Declarative Probabilistic Programming with Datalog

Authors Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, Zografoula Vagena

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message