Declarative Probabilistic Programming with Datalog

Authors Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, Zografoula Vagena



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2016.7.pdf
  • Filesize: 0.57 MB
  • 19 pages

Document Identifiers

Author Details

Vince Barany
Balder ten Cate
Benny Kimelfeld
Dan Olteanu
Zografoula Vagena

Cite AsGet BibTex

Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ICDT.2016.7

Abstract

Probabilistic programming languages are used for developing statistical models, and they typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate an extension of Datalog for specifying statistical models, and establish a declarative probabilistic-programming paradigm over databases. Our proposed extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
Keywords
  • Chase
  • Datalog
  • probability measure space
  • probabilistic programming

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Daniel Deutch, and Victor Vianu. Deduction with contradictions in Datalog. In ICDT, pages 143-154, 2014. Google Scholar
  2. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. Google Scholar
  3. Alfred V. Aho, Catriel Beeri, and Jeffrey D. Ullman. The theory of joins in relational databases. ACM Trans. on Datab. Syst., 4(3):297-314, 1979. Google Scholar
  4. Lyublena Antova, Christoph Koch, and Dan Olteanu. From complete to incomplete information and back. In SIGMOD, pages 713-724, 2007. URL: http://dx.doi.org/10.1145/1247480.1247559.
  5. Lyublena Antova, Christoph Koch, and Dan Olteanu. Query language support for incomplete information in the MayBMS system. In VLDB, pages 1422-1425, 2007. Google Scholar
  6. Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. Design and implementation of the LogicBlox system. In PDOS, pages 1371-1382. ACM, 2015. Google Scholar
  7. Robert B. Ash and Catherine Doleans-Dade. Probability &Measure Theory. Harcourt Academic Press, 2000. Google Scholar
  8. Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic reasoning with answer sets. Theory Pract. Log. Program., 9(1):57-144, 2009. Google Scholar
  9. Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart. Probabilistic XML via Markov chains. PVLDB, 3(1):770-781, 2010. Google Scholar
  10. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. J. of Machine Learning Research, 3:993-1022, 2003. Google Scholar
  11. Matthias Bröcheler, Lilyana Mihalkova, and Lise Getoor. Probabilistic similarity logic. In UAI, pages 73-82, 2010. Google Scholar
  12. Zhuhua Cai, Zografoula Vagena, Luis Leopoldo Perez, Subramanian Arumugam, Peter J. Haas, and Christopher M. Jermaine. Simulation of database-valued Markov chains using SimSQL. In SIGMOD, pages 637-648, 2013. Google Scholar
  13. Sara Cohen and Benny Kimelfeld. Querying parse trees of stochastic context-free grammars. In ICDT, pages 62-75. ACM, 2010. Google Scholar
  14. Daniel Deutch, Christoph Koch, and Tova Milo. On probabilistic fixpoint and Markov chain query languages. In PODS, pages 215-226, 2010. URL: http://dx.doi.org/10.1145/1807085.1807114.
  15. Pedro Domingos and Daniel Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on AI and Machine Learning. Morgan & Claypool Publishers, 2009. Google Scholar
  16. Thomas Eiter, Georg Gottlob, and Heikki Mannila. Disjunctive Datalog. ACM Trans. Database Syst., 22(3):364-418, 1997. Google Scholar
  17. Kousha Etessami and Mihalis Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM, 56(1), 2009. URL: http://dx.doi.org/10.1145/1462153.1462154.
  18. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. In ICDT, pages 207-224, 2003. Google Scholar
  19. Norbert Fuhr. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. JASIS, 51(2):95-110, 2000. Google Scholar
  20. Noah D. Goodman. The principles and practice of probabilistic programming. In POPL, pages 399-402, 2013. Google Scholar
  21. Georg Gottlob, Thomas Lukasiewicz, MariaVanina Martinez, and Gerardo Simari. Query answering under probabilistic uncertainty in Datalog+/- ontologies. Annals of Math.&AI, 69(1):37-72, 2013. URL: http://dx.doi.org/10.1007/s10472-013-9342-1.
  22. Terry Halpin and Spencer Rugaber. LogiQL: A Query Language for Smart Databases. CRC Press, 2014. Google Scholar
  23. Shan Shan Huang, Todd Jeffrey Green, and Boon Thau Loo. Datalog and emerging applications: An interactive tutorial. In SIGMOD, pages 1213-1216, 2011. URL: http://dx.doi.org/10.1145/1989323.1989456.
  24. Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher M. Jermaine, and Peter J. Haas. MCDB: a Monte Carlo approach to managing uncertain data. In SIGMOD, pages 687-700, 2008. Google Scholar
  25. Benny Kimelfeld and Pierre Senellart. Probabilistic XML: models and complexity. In Advances in Probabilistic Databases for Uncertain Information Management, volume 304 of Studies in Fuzziness and Soft Computing, pages 39-66. Springer, 2013. Google Scholar
  26. Angelika Kimmig, Bart Demoen, Luc De Raedt, Vitor Santos Costa, and Ricardo Rocha. On the implementation of the probabilistic logic programming language ProbLog. Theory and Practice of Logic Programming, 11:235-262, 2011. URL: http://dx.doi.org/10.1017/S1471068410000566.
  27. Lyric Labs. Chimple. URL: http://chimple.probprog.org/.
  28. Boon Thau Loo, Tyson Condie, Minos N. Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. Declarative networking. Commun. ACM, 52(11):87-95, 2009. URL: http://dx.doi.org/10.1145/1592761.1592785.
  29. David Maier, Alberto O. Mendelzon, and Yehoshua Sagiv. Testing implications of data dependencies. ACM Trans. on Datab. Syst., 4(4):455-469, 1979. Google Scholar
  30. Vikash K. Mansinghka, Daniel Selsam, and Yura N. Perov. Venture: a higher-order probabilistic programming platform with programmable inference. CoRR, abs/1404.0099, 2014. URL: http://arxiv.org/abs/1404.0099.
  31. Andrew Kachites McCallum. Multi-label text classification with a mixture model trained by EM. In Assoc. for the Advancement of Artificial Intelligence workshop on text learning, 1999. Google Scholar
  32. B. Milch and et al. BLOG: Probabilistic models with unknown objects. In IJCAI, pages 1352-1359, 2005. Google Scholar
  33. Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom M. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, pages 103-134, 2000. Google Scholar
  34. Feng Niu, Christopher Ré, AnHai Doan, and Jude W. Shavlik. Tuffy: Scaling up statistical inference in Markov Logic Networks using an RDBMS. PVLDB, 4(6):373-384, 2011. Google Scholar
  35. Feng Niu, Ce Zhang, Christopher Re, and Jude W. Shavlik. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. In Int. Workshop on Searching and Integrating New Web Data Sources, volume 884 of CEUR Workshop Proceedings, pages 25-28, 2012. Google Scholar
  36. Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. R2: an efficient MCMC sampler for probabilistic programs. In AAAI, pages 2476-2482, 2014. Google Scholar
  37. Brooks Paige and Frank Wood. A compilation target for probabilistic programming languages. In ICML, volume 32, pages 1935-1943, 2014. Google Scholar
  38. Anand Patil, David Huard, and Christopher J. Fonnesbeck. PyMC: Bayesian Stochastic Modelling in Python. J. of Statistical Software, 35(4):1-81, 2010. Google Scholar
  39. Judea Pearl. Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann, 1989. Google Scholar
  40. Avi Pfeffer. Figaro: An object-oriented probabilistic programming language. Technical report, Charles River Analytics, 2009. Google Scholar
  41. David Poole. The independent choice logic and beyond. In Probabilistic Inductive Logic Programming - Theory and Applications, pages 222-243, 2008. Google Scholar
  42. Repository on probabilistic programming languages, 2014. URL: http://www.probabilistic-programming.org.
  43. H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. Harvard University Press, Harvard, 1961. Google Scholar
  44. Taisuke Sato and Yoshitaka Kameya. PRISM: A language for symbolic-statistical modeling. In IJCAI, pages 1330-1339, 1997. Google Scholar
  45. Adrian Smith, Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo methods in practice. Springer Science &Business Media, 2013. Google Scholar
  46. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. Google Scholar
  47. Balder ten Cate, Benny Kimelfeld, and Dan Olteanu. PPDL: probabilistic programming with Datalog. In AMW, volume 1378 of CEUR Workshop Proceedings. CEUR-WS.org, 2015. Google Scholar
  48. Jennifer Widom. Trio: a system for data, uncertainty, and lineage. In Charu Aggarwal, editor, Managing and Mining Uncertain Data, chapter 5. Springer-Verlag, 2008. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail