m-tables: Representing Missing Data

Authors Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, Val Tannen



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2017.21.pdf
  • Filesize: 0.57 MB
  • 20 pages

Document Identifiers

Author Details

Bruhathi Sundarmurthy
Paraschos Koutris
Willis Lang
Jeffrey Naughton
Val Tannen

Cite AsGet BibTex

Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, and Val Tannen. m-tables: Representing Missing Data. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICDT.2017.21

Abstract

Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these could vary from missing attribute values, missing a known number of tuples, or even missing an unknown number of tuples. In this work, we propose a new representation system called m-tables, that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
Keywords
  • missing values
  • incomplete data
  • c tables
  • representation systems

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu, editors. Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1995. Google Scholar
  2. Serge Abiteboul, Paris Kanellakis, and Gosta Grahne. On the representation and querying of sets of possible worlds. In Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, SIGMOD'87, pages 34-48, New York, NY, USA, 1987. ACM. URL: http://dx.doi.org/10.1145/38713.38724.
  3. Antoine Amarilli and Michael Benedikt. Finite open-world query answering with number restrictions (extended version). CoRR, abs/1505.04216, 2015. URL: http://arxiv.org/abs/1505.04216.
  4. Yael Amsterdamer, Daniel Deutch, and Val Tannen. Provenance for aggregate queries. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'11, pages 153-164, New York, NY, USA, 2011. ACM. URL: http://dx.doi.org/10.1145/1989284.1989302.
  5. James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in databases: Why, how, and where. Found. Trends databases, 1(4):379-474, April 2009. URL: http://dx.doi.org/10.1561/1900000006.
  6. Marco Console, Paolo Guagliardo, and Leonid Libkin. Approximations and refinements of certain answers via many-valued logics. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR 2016, Cape Town, South Africa, April 25-29, 2016., pages 349-358, 2016. URL: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12813.
  7. AnHai Doan, Alon Halevy, and Zachary Ives. Principles of Data Integration. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition, 2012. Google Scholar
  8. Wenfei Fan and Floris Geerts. Capturing missing tuples and missing values. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 169-178, 2010. URL: http://dx.doi.org/10.1145/1807085.1807109.
  9. G. H. Gessert. Four valued logic for relational database systems. SIGMOD Rec., 19(1):29-35, March 1990. URL: http://dx.doi.org/10.1145/382274.382401.
  10. Georg Gottlob and Roberto Zicari. Closed world databases opened through null values. In Proceedings of the 14th International Conference on Very Large Data Bases, VLDB'88, pages 50-61, San Francisco, CA, USA, 1988. Morgan Kaufmann Publishers Inc. URL: http://dl.acm.org/citation.cfm?id=645915.671794.
  11. G. Grahne. Information integration and incomplete information. In IEEE Data Eng. Bull., pages 46-52, 2002. Google Scholar
  12. Todd J. Green, Grigoris Karvounarakis, Zachary G. Ives, and Val Tannen. Update exchange with mappings and provenance. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB'07, pages 675-686. VLDB Endowment, 2007. URL: http://dl.acm.org/citation.cfm?id=1325851.1325929.
  13. Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'07, pages 31-40, New York, NY, USA, 2007. ACM. URL: http://dx.doi.org/10.1145/1265530.1265535.
  14. Todd J. Green and Val Tannen. Models for incomplete and probabilistic information. In Proceedings of the 2006 International Conference on Current Trends in Database Technology, ICDT'06, pages 278-296, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11896548_24.
  15. Paolo Guagliardo and Leonid Libkin. Making sql queries correct on incomplete databases: A feasibility study. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS'16, pages 211-223, New York, NY, USA, 2016. ACM. URL: http://dx.doi.org/10.1145/2902251.2902297.
  16. Tomasz Imielinski and Witold Lipski Jr. Incomplete information in relational databases. J. ACM, 31(4):761-791, 1984. URL: http://dx.doi.org/10.1145/1634.1886.
  17. Tomasz Imielinski and Witold Lipski, Jr. On representing incomplete information in a relational data base. In Proceedings of the Seventh International Conference on Very Large Data Bases - Volume 7, VLDB'81, pages 388-397. VLDB Endowment, 1981. URL: http://dl.acm.org/citation.cfm?id=1286831.1286869.
  18. Grigoris Karvounarakis, Zachary G. Ives, and Val Tannen. Querying data provenance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD'10, pages 951-962, New York, NY, USA, 2010. ACM. URL: http://dx.doi.org/10.1145/1807167.1807269.
  19. Willis Lang, Rimma V. Nehme, Eric Robinson, and Jeffrey F. Naughton. Partial results in database systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD'14, pages 1275-1286, New York, NY, USA, 2014. ACM. URL: http://dx.doi.org/10.1145/2588555.2612176.
  20. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'02, pages 233-246, New York, NY, USA, 2002. ACM. URL: http://dx.doi.org/10.1145/543613.543644.
  21. Alon Y. Levy. Obtaining complete answers from incomplete databases. In Proceedings of the 22th International Conference on Very Large Data Bases, VLDB'96, pages 402-412, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc. URL: http://dl.acm.org/citation.cfm?id=645922.673332.
  22. Leonid Libkin. Incomplete data: What went wrong, and how to fix it. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, pages 1-13, New York, NY, USA, 2014. ACM. URL: http://dx.doi.org/10.1145/2594538.2594561.
  23. Leonid Libkin. Sql’s three-valued logic and certain answers. ACM Trans. Database Syst., 41(1):1:1-1:28, March 2016. URL: http://dx.doi.org/10.1145/2877206.
  24. Anish Das Sarma, Omar Benjelloun, Alon Halevy, and Jennifer Widom. Working models for uncertain data. In Proceedings of the 22Nd International Conference on Data Engineering, ICDE'06, pages 7-, Washington, DC, USA, 2006. IEEE Computer Society. URL: http://dx.doi.org/10.1109/ICDE.2006.174.
  25. Ron van der Meyden. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, pages 307-356. Kluwer Academic Publishers, Norwell, MA, USA, 1998. URL: http://dx.doi.org/10.1007/978-1-4615-5643-5_10.
  26. Kwok-bun Yue. A more general model for handling missing information in relational databases using a 3-valued logic. SIGMOD Rec., 20(3):43-49, September 1991. URL: http://dx.doi.org/10.1145/126482.126487.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail