Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk)

Authors: Silviu Maniu, Pierre Senellart




File

LIPIcs.ICDT.2025.33.pdf
  • Filesize: 0.54 MB
  • 6 pages

Author Details

Silviu Maniu
  • Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
  • CNRS@CREATE LTD, Singapore
Pierre Senellart
  • DI ENS, ENS, CNRS, PSL University, Inria, Paris, France
  • Institut Universitaire de France, Paris, France
  • CNRS@CREATE LTD, Singapore
  • IPAL, CNRS, Singapore

Acknowledgements

ProvSQL is a collective effort; we acknowledge the contributions of Belkis Djeffal, Louis Jachiet, Pratik Karmakar, Baptiste Lafosse, Aryak Sen, and Albert Ariel Widiaatmaja.

Cite As

Silviu Maniu and Pierre Senellart. Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 33:1-33:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.33

Abstract

There has been a rich literature in database theory on how to model and manage the provenance of data (for instance using the semiring framework) and its uncertainty (in particular via probabilistic databases). In this article, we explain how these results have been used as the basis for practical implementations, notably in the ProvSQL system, and how these implementations need to be adapted for the efficient management of provenance and probability for real-world data.
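To make the semiring framework mentioned above concrete, here is a minimal Python sketch. It is not ProvSQL's implementation or API; every name in it (`Semiring`, `join_project`, `evaluate`, `probability`, the toy relations `R` and `S`) is hypothetical and chosen only for illustration. Each base tuple is annotated with its own variable, a join multiplies annotations, and a projection adds up the alternative derivations of an output tuple. The resulting provenance expression can then be interpreted in different semirings (bag multiplicities, set semantics, minimal cost). Probability is handled separately: assuming a tuple-independent database, it is the probability that the Boolean provenance is true, computed here by brute-force enumeration of possible worlds.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

# A provenance expression is a tiny circuit encoded as nested tuples:
#   ("var", name) | ("plus", children) | ("times", children)
Expr = Tuple[Any, ...]


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


COUNTING = Semiring(0, 1, lambda x, y: x + y, lambda x, y: x * y)          # bag multiplicities
BOOLEAN = Semiring(False, True, lambda x, y: x or y, lambda x, y: x and y)  # set semantics
TROPICAL = Semiring(math.inf, 0.0, min, lambda x, y: x + y)                # min-cost derivation


def join_project(R: Dict[Tuple[int, int], Expr], S: Dict[Tuple[int, int], Expr]):
    """Hypothetical query Q(a, c) :- R(a, b), S(b, c). A join multiplies annotations;
    projection adds up the alternative derivations of the same output tuple."""
    derivations: Dict[Tuple[int, int], List[Expr]] = {}
    for (a, b), r in R.items():
        for (b2, c), s in S.items():
            if b == b2:
                derivations.setdefault((a, c), []).append(("times", (r, s)))
    return {t: ("plus", tuple(ds)) for t, ds in derivations.items()}


def evaluate(e: Expr, sr: Semiring, valuation: Dict[str, Any]) -> Any:
    """Interpret a provenance expression in semiring `sr`, given a value per variable."""
    if e[0] == "var":
        return valuation[e[1]]
    op, acc = (sr.plus, sr.zero) if e[0] == "plus" else (sr.times, sr.one)
    for child in e[1]:
        acc = op(acc, evaluate(child, sr, valuation))
    return acc


def variables(e: Expr) -> set:
    """Collect the tuple variables occurring in an expression."""
    if e[0] == "var":
        return {e[1]}
    return set().union(*(variables(c) for c in e[1]))


def probability(e: Expr, probs: Dict[str, float]) -> float:
    """P(e is true) when each variable is an independent tuple present with probs[v].
    Brute-force enumeration of possible worlds: exponential in the number of variables."""
    names = sorted(variables(e))
    total = 0.0
    for world in product([False, True], repeat=len(names)):
        valuation = dict(zip(names, world))
        if evaluate(e, BOOLEAN, valuation):
            weight = 1.0
            for v, present in valuation.items():
                weight *= probs[v] if present else 1.0 - probs[v]
            total += weight
    return total


# Toy database: every base tuple is annotated with its own variable.
R = {(1, 2): ("var", "r1"), (1, 3): ("var", "r2")}
S = {(2, 5): ("var", "s1"), (3, 5): ("var", "s2")}
Q = join_project(R, S)
prov = Q[(1, 5)]  # r1*s1 + r2*s2

if __name__ == "__main__":
    print(evaluate(prov, COUNTING, {"r1": 2, "s1": 3, "r2": 1, "s2": 1}))  # 7
    print(evaluate(prov, BOOLEAN, {"r1": True, "s1": False, "r2": True, "s2": True}))  # True
    print(evaluate(prov, TROPICAL, {"r1": 1, "s1": 4, "r2": 2, "s2": 2}))  # 4.0
    print(round(probability(prov, {"r1": 0.9, "s1": 0.8, "r2": 0.5, "s2": 0.6}), 3))  # 0.804
```

The point of computing `prov` once is that, for positive relational queries, interpreting the same expression in any commutative semiring gives the same result as evaluating the query directly over annotations from that semiring. Probability does not work this way: computing it from the Boolean provenance is #P-hard in general, so the exponential enumeration above only illustrates the definition; a practical system has to rely on smarter techniques, such as knowledge compilation or exploiting low-treewidth structure.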

Subject Classification

ACM Subject Classification
  • Theory of computation → Data provenance
  • Theory of computation → Incomplete, inconsistent, and uncertain databases
  • Information systems → Database management system engines
Keywords
  • provenance
  • probabilistic data
  • ProvSQL
