Satisfiability for SCULPT-Schemas for CSV-Like Data

Authors Johannes Doleschal, Wim Martens, Frank Neven, Adam Witkowski



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2018.14.pdf
  • Filesize: 0.56 MB
  • 19 pages

Document Identifiers

Author Details

Johannes Doleschal
Wim Martens
Frank Neven
Adam Witkowski

Cite As Get BibTex

Johannes Doleschal, Wim Martens, Frank Neven, and Adam Witkowski. Satisfiability for SCULPT-Schemas for CSV-Like Data. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ICDT.2018.14

Abstract

SCULPT is a simple schema language inspired by the recent working effort towards a recommendation by the World Wide Web Consortium (W3C) for tabular data and metadata on the Web. In its core, a SCULPT schema consists of a set of rules where left-hand sides select sets of regions in the tabular data and the right-hand sides describe the contents of these regions. A document (divided in cells by row- and column-delimiters) then satisfies a schema if it satisfies every rule. In this paper, we study the satisfiability problem for SCULPT schemas. As SCULPT describes grid-like structures, satisfiability obviously becomes undecidable rather quickly even for very restricted schemas. We define a schema language called L-SCULPT (Lego SCULPT) that restricts the walking power of SCULPT by selecting rectangular shaped areas and only considers tables for which selected regions do not intersect. Depending on the axes used by L-SCULPT, we show that satisfiability is PTIME-complete or undecidable. One of the tractable fragments is practically useful as it extends the structural core of the current W3C proposal for schemas over tabular data. We therefore see how the navigational power of the W3C proposal can be extended while still retaining tractable satisfiability tests.

Subject Classification

Keywords
  • CSV
  • schema languages
  • semi-structured data

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. Research directions for principles of data management (abridged). SIGMOD Record, 45(4):5-17, 2016. Google Scholar
  2. Marcelo Arenas, Francisco Maturana, Cristian Riveros, and Domagoj Vrgoč. A framework for annotating csv-like data. Proceedings of the VLDB Endownment (PVLDB), 9, 2016. Google Scholar
  3. Johannes Doleschal, Nico Höllerich, Martens, and Frank Neven. CHISEL: Sculpting tabular and non-tabular data on the Web. In World Wide Web Conference (WWW), 2018. To appear. Google Scholar
  4. Michael J. Fischer and Richard E. Ladner. Propositional dynamic logic of regular programs. Journal of Computer and System Sciences, 18(2):194-211, 1979. Google Scholar
  5. Dora Giammarresi and Antonio Restivo. Two-dimensional languages. In Handbook of Formal Languages: Volume 3 Beyond Words, chapter 4. Springer, 1997. Google Scholar
  6. Ian Glaister and Jeffrey Shallit. A lower bound technique for the size of nondeterministic finite automata. Information Processing Letters, 59(2):75-77, 1996. Google Scholar
  7. Stefan Göller, Markus Lohrey, and Carsten Lutz. PDL with intersection and converse: satisfiability and infinite-state model checking. The Journal of Symbolic Logic, 74(1):279-314, 2009. Google Scholar
  8. Angelo Gregori. Unit-length embedding of binary trees on a square grid. Information Processing Letters, 31:167-173, 1989. Google Scholar
  9. Ralf Heckmann, Ralf Klasing, Burkhard Monien, and Walter Unger. Optimal embedding of complete binary trees into lines and grid. Journal of Parallel and Distributed Computing, 49(1):40-56, 1998. Google Scholar
  10. Leonid Libkin, Wim Martens, and Domagoj Vrgoč. Querying graphs with data. Journal of the ACM, 63(2):14:1-14:53, 2016. Google Scholar
  11. Wim Martens, Frank Neven, and Stijn Vansummeren. SCULPT: A schema language for tabular data on the web. In World Wide Web Conference (WWW), pages 702-712, 2015. Google Scholar
  12. Rufus Pollock, Jeni Tennison, Gregg Kellogg, and Ivan Herman. Metadata vocabulary for tabular data. Technical report, World Wide Web Consortium (W3C), December 2015. URL: https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/.
  13. Jeremy Tandy, Davide Ceolin, and Eric Stephan. CSV on the Web: Use cases and requirements. Technical report, World Wide Web Consortium (W3C), Februrary 2016. URL: http://www.w3.org/TR/2016/NOTE-csvw-ucr-20160225/.
  14. Jeni Tennison. 2014: The year of CSV. http://theodi.org/blog/2014-the-year-of-csv. Visited on Sept. 18, 2017.
  15. Jeni Tennison and Gregg Kellogg. Model for tabular data and metadata on the web. Technical report, World Wide Web Consortium (W3C), July 2014. URL: https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/.
  16. Peter van Emde Boas. The convenience of tilings. In Complexity, Logic and Recursion Theory, volume 187 of Lecture Notes in Pure and Applied Mathematics, pages 331-363. Marcel Dekker Inc., 1997. Google Scholar
  17. World Wide Web Consortium (W3C). CSV on the web working group charter. https://www.w3.org/2013/05/lcsv-charter.html. Visited on Sept. 18, 2017.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail