Information Inequality Problem over Set Functions

Author: Miika Hannula



File

LIPIcs.ICDT.2024.19.pdf
  • Filesize: 0.77 MB
  • 20 pages

Author Details

Miika Hannula
  • University of Helsinki, Finland

Cite As

Miika Hannula. Information Inequality Problem over Set Functions. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.19

Abstract

Information inequalities appear in many database applications such as query output size bounds, query containment, and implication between data dependencies. Recently, Abo Khamis et al. [Mahmoud Abo Khamis et al., 2020] proposed to study the algorithmic aspects of information inequalities, including the information inequality problem: decide whether a linear inequality over entropies of random variables is valid. While the decidability of this problem is a major open question, applications often involve only inequalities that adhere to specific syntactic forms linked to useful semantic invariance properties. This paper studies the information inequality problem in different syntactic and semantic scenarios that arise from database applications. Focusing on the boundary between tractability and intractability, we show that the information inequality problem is coNP-complete if restricted to normal polymatroids, and solvable in polynomial time if relaxed to monotone functions. We also examine syntactic restrictions related to query output size bounds, and provide an alternative proof, through monotone functions, for the polynomial-time computability of the entropic bound over simple sets of degree constraints.
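To make the monotone-function case concrete, the following Python sketch checks validity of a linear inequality over monotone set functions. It rests on a characterization derived here, not taken from the paper, under these assumptions: "monotone" means nondecreasing under set inclusion with h(∅) = 0, and the inequality is given as coefficients c_X on unconditional terms h(X). Every such h is a nonnegative combination of 0/1 indicator functions of upward-closed set families (write h(X) as the integral over t ≥ 0 of [h(X) > t]), so Σ c_X h(X) ≥ 0 is valid if and only if every subfamily of the terms that is upward closed within the terms has a nonnegative coefficient sum. The names `valid_over_monotone` and `is_up_closed` are hypothetical, and the exhaustive search below is exponential; it is an illustration, not the paper's algorithm.

```python
from itertools import combinations


def is_up_closed(family, terms):
    """True if every term that is a superset of a member of `family` is in `family`."""
    return all(y in family for x in family for y in terms if x <= y)


def valid_over_monotone(coeffs):
    """Decide whether sum_X coeffs[X] * h(X) >= 0 for every monotone set
    function h with h(empty) = 0. Keys of `coeffs` are frozensets of variables.

    Assumed characterization: valid iff every subfamily of the terms that is
    upward closed within the terms has a nonnegative coefficient sum.
    Exhaustive search, exponential in the number of terms (illustration only).
    Returns (True, None) or (False, witness_family).
    """
    # h(empty set) = 0, so a coefficient on the empty set never matters.
    terms = [x for x in coeffs if x]
    for r in range(1, len(terms) + 1):
        for subset in combinations(terms, r):
            family = set(subset)
            if is_up_closed(family, terms) and sum(coeffs[x] for x in family) < 0:
                # h = 1 on every superset of a member of `family`, 0 elsewhere,
                # is monotone and makes the left-hand side negative.
                return False, family
    return True, None


if __name__ == "__main__":
    A, B = frozenset("A"), frozenset("B")
    AB = A | B
    print(valid_over_monotone({AB: 1, A: -1}))        # monotonicity: h(AB) - h(A) >= 0
    print(valid_over_monotone({A: 1, B: 1, AB: -1}))  # submodularity for disjoint A, B
```

The first inequality is reported valid. The second, h(A) + h(B) - h(AB) ≥ 0, holds for every polymatroid but fails over monotone functions; the witness family {AB} corresponds to the monotone function that is 1 on AB and 0 on ∅, A, and B. Finding a minimum-weight upward-closed subfamily is a minimum-weight closure problem, which reduces to a single minimum cut, so the exhaustive loop can be replaced by a polynomial-time computation; the abstract does not say whether the paper argues this way.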

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • entropy
  • information theory
  • worst-case output size
  • computational complexity

References

  1. Marcelo Arenas and Leonid Libkin. An information-theoretic approach to normal forms for relational and XML data. J. ACM, 52(2):246-283, 2005. URL: https://doi.org/10.1145/1059513.1059519.
  2. Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. SIAM J. Comput., 42(4):1737-1767, 2013. URL: https://doi.org/10.1137/110859440.
  3. Catriel Beeri, Ronald Fagin, and John H. Howard. A complete axiomatization for functional and multivalued dependencies in database relations. In SIGMOD Conference, pages 47-61. ACM, 1977. URL: https://doi.org/10.1145/509404.509414.
  4. Randall Dougherty, Chris Freiling, and Kenneth Zeger. Non-Shannon information inequalities in four random variables. CoRR, abs/1104.3602, 2011. URL: https://doi.org/10.48550/arXiv.1104.3602.
  5. E. Mark Gold. Complexity of automaton identification from given data. Inf. Control., 37(3):302-320, 1978. URL: https://doi.org/10.1016/S0019-9958(78)90562-4.
  6. Georg Gottlob, Stephanie Tien Lee, Gregory Valiant, and Paul Valiant. Size and treewidth bounds for conjunctive queries. J. ACM, 59(3):16:1-16:35, 2012. URL: https://doi.org/10.1145/2220357.2220363.
  7. Martin Grohe and Dániel Marx. Constraint solving via fractional edge covers. ACM Trans. Algorithms, 11(1):4:1-4:20, 2014. URL: https://doi.org/10.1145/2636918.
  8. Emirhan Gürpinar and Andrei E. Romashchenko. How to use undiscovered information inequalities: Direct applications of the copy lemma. In ISIT, pages 1377-1381. IEEE, 2019. URL: https://doi.org/10.1109/ISIT.2019.8849309.
  9. Miika Hannula. Information inequality problem over set functions. CoRR, abs/2309.11818, 2023. URL: https://doi.org/10.48550/arXiv.2309.11818.
  10. Christian Herrmann. On the undecidability of implications between embedded multivalued database dependencies. Information and Computation, 122(2):221-235, 1995. URL: https://doi.org/10.1006/inco.1995.1148.
  11. Sungjin Im, Benjamin Moseley, Hung Q. Ngo, Kirk Pruhs, and Alireza Samadian. Optimizing polymatroid functions. CoRR, abs/2211.08381, 2022. URL: https://doi.org/10.48550/arXiv.2211.08381.
  12. Tarik Kaced and Andrei E. Romashchenko. Conditional information inequalities for entropic and almost entropic points. IEEE Trans. Inf. Theory, 59(11):7149-7167, 2013. URL: https://doi.org/10.1109/TIT.2013.2274614.
  13. Batya Kenig and Dan Suciu. Integrity constraints revisited: From exact to approximate implication. Log. Methods Comput. Sci., 18(1), 2022. URL: https://doi.org/10.46298/lmcs-18(1:5)2022.
  14. Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu. Decision problems in information theory. In ICALP, volume 168 of LIPIcs, pages 106:1-106:20, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.106.
  15. Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu. Bag query containment and information theory. ACM Trans. Database Syst., 46(3):12:1-12:39, 2021. URL: https://doi.org/10.1145/3472391.
  16. Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. Computing join queries with functional dependencies. In PODS, pages 327-342. ACM, 2016. URL: https://doi.org/10.1145/2902251.2902289.
  17. Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? In PODS, pages 429-444. ACM, 2017. URL: https://doi.org/10.1145/3034786.3056105.
  18. Lukas Kühne and Geva Yashfe. On entropic and almost multilinear representability of matroids. CoRR, abs/2206.03465, 2022. URL: https://doi.org/10.48550/arXiv.2206.03465.
  19. Tony T. Lee. An information-theoretic analysis of relational databases - part I: data dependencies and information metric. IEEE Trans. Software Eng., 13(10):1049-1061, 1987. URL: https://doi.org/10.1109/TSE.1987.232847.
  20. Tony T. Lee. An information-theoretic analysis of relational databases - part II: information structures of database schemas. IEEE Trans. Software Eng., 13(10):1061-1072, 1987. URL: https://doi.org/10.1109/TSE.1987.232848.
  21. Cheuk Ting Li. Undecidability of network coding, conditional information inequalities, and conditional independence implication. IEEE Trans. Inf. Theory, 69(6):3493-3510, 2023. URL: https://doi.org/10.1109/TIT.2023.3247570.
  22. Wing Ning Li. Two-segmented channel routing is strong NP-complete. Discret. Appl. Math., 78(1-3):291-298, 1997. URL: https://doi.org/10.1016/S0166-218X(97)00020-6.
  23. Hung Q. Ngo. Worst-case optimal join algorithms: Techniques, results, and open problems. In PODS, pages 111-124. ACM, 2018. URL: https://doi.org/10.1145/3196959.3196990.
  24. Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms. J. ACM, 65(3):16:1-16:40, 2018. URL: https://doi.org/10.1145/3180143.
  25. Nicholas Pippenger. What are the laws of information theory. In Special Problems on Communication and Computation Conference, pages 3-5, 1986.
  26. Dan Suciu. Applications of information inequalities to database theory problems. In LICS, pages 1-30, 2023. URL: https://doi.org/10.1109/LICS56636.2023.10175769.
  27. Raymond W. Yeung. Information Theory and Network Coding. Springer Publishing Company, Incorporated, 1 edition, 2008.
  28. Zhen Zhang and Raymond W. Yeung. A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory, 43(6):1982-1986, 1997. URL: https://doi.org/10.1109/18.641561.