Complexity and Expressiveness of ShEx for RDF

Authors Slawek Staworko, Iovka Boneva, Jose E. Labra Gayo, Samuel Hym, Eric G. Prud'hommeaux, Harold Solbrig



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2015.195.pdf
  • Filesize: 0.58 MB
  • 17 pages

Document Identifiers

Author Details

Slawek Staworko
Iovka Boneva
Jose E. Labra Gayo
Samuel Hym
Eric G. Prud'hommeaux
Harold Solbrig

Cite As Get BibTex

Slawek Staworko, Iovka Boneva, Jose E. Labra Gayo, Samuel Hym, Eric G. Prud'hommeaux, and Harold Solbrig. Complexity and Expressiveness of ShEx for RDF. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 195-211, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.ICDT.2015.195

Abstract

We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. A ShEx assigns types to the nodes of an RDF graph and allows to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi- and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and study the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, single-type validation is generally intractable and multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.

Subject Classification

Keywords
  • RDF
  • Schema
  • Graph topology
  • Validation
  • Complexity
  • Expressiveness

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. M. Arenas, C. Gutierrez, and J. Pérez. Foundations of RDF databases. In Reasoning Web, Int'l Summer School on Semantic Technologies for Information Systems, pages 158-204, 2009. Invited Tutorial. Google Scholar
  2. M. Arenas, J. Pérez, Reutter J., C. Riveros, and J. Sequeda. Data exchange in the relational and RDF worlds. Invited talk at the Int'l Workshop on Semantic Web Information Management (SWIM), June 2011. Google Scholar
  3. G. J. Bex, F. Neven, T. Schwentick, and S. Vansummeren. Inference of concise regular expressions and DTDs. ACM Transactions on Database Systems, 35(2), 2010. Google Scholar
  4. G. J. Bex, F. Neven, and S. Vansummeren. Inferring XML schema definitions from XML data. In Int'l Conf. on Very Large Data Bases (VLDB), pages 998-1009, 2007. Google Scholar
  5. J. Bolleman, S. Gehant, and N. Redaschi. Catching inconsistencies with the semantic web: A biocuration case study. In Int'l Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS), 2012. Google Scholar
  6. I. Boneva, R. Ciucanu, and S. Staworko. Schemas for unordered XML on a DIME. Theory of Computing Systems, 2014. To appear. Available at http://arxiv.org/abs/1311.7307. Google Scholar
  7. I. Boneva, J. Emilio Labra Gayo, S. Hym, E. G. Prud'hommeau, H. Solbrig, and S. Staworko. Validating RDF with shape expressions, April 2014. Available at http://arxiv.org/abs/1404.1270. Google Scholar
  8. D. Brickley and R. V. Guha. RDF Schema 1.1. http://www.w3.org/TR/rdf-schema, February 2004.
  9. R. Ciucanu and S. Staworko. Learning schemas for unordered XML. In Int'l Symp. on Database Programming Languages (DBPL), 2013. Google Scholar
  10. D. Colazzo, G. Ghelli, L. Pardini, and C. Sartiani. Linear inclusion for XML regular expression types. In Int'l Conf. on Information and Knowledge Management (CIKM), pages 137-146, 2009. Google Scholar
  11. D. Colazzo, G. Ghelli, and C. Sartiani. Efficient inclusion for a class of XML types with interleaving and counting. Information Systems, 34(7):643-656, 2009. Google Scholar
  12. B. Courcelle. The monadic second-order logic of graphs. I. Recognizable sets of finite graphs. Information and Computation, 85(1):12-75, 1990. Google Scholar
  13. M. Dean and M. Schreiber. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref, February 2004.
  14. H.-D. Ebbinghaus and J. Flum. Finite model theory. Springer, 1995. Google Scholar
  15. J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and A. Arias. Binary RDF representation for publication and exchange (HDT). J. Web Semantics, 19:22-41, 2013. Google Scholar
  16. G. Ghelli, D. Colazzo, and C. Sartiani. Linear time membership in a class of regular expressions with interleaving and counting. In Int'l Conf. on Information and Knowledge Management (CIKM), pages 389-398, 2008. Google Scholar
  17. S. Ginsburg and Spanier E. H. Semigroups, presburger formulas, and languages. Pacific Journal of Mathematics, 16(2):285-296, December 1966. Google Scholar
  18. B. Glimm and O. Chimezie. SPARQL 1.1 Entailment Regimes. http://www.w3.org/TR/sparql11-entailment/, 2012.
  19. A. V. Goldberg, E. Tardos, and R. E. Tarjan. Network flow algorithms. In Algorithms and Complexity, Volume 9, Paths, Flows, and VLSI-Layout, 1990. Google Scholar
  20. B. Groz, S. Maneth, and S. Staworko. Deterministic regular expressions in linear time. In ACM Symp. on Principles of Database Systems (PODS), May 2012. Google Scholar
  21. J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 2nd edition, 2001. Google Scholar
  22. E. Kopczynski and A. To. Parikh images of grammars: Complexity and applications. In LICS, pages 80-89, 2010. Google Scholar
  23. D. Kozen. Lower bounds for natural proof systems. In IEEE Symp. on Foundations of Computer Science (FOCS), pages 254-266, 1977. Google Scholar
  24. J. E. Labra Gayo, E. Prud'hommeaux, H. Solbrig, and J. M. Álvarez Rodríguez. Validating and describing linked data portals using RDF Shape Expressions. In Workshop on Linked Data Quality, September 2015. Google Scholar
  25. M. Montazerian, P. T. Wood, and S. R. Mousavi. XPath query satisfiability is in PTIME for real-world DTDs. In Int'l XML Database Symp. (Xsym), pages 17-30, 2007. Google Scholar
  26. M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn., 5(4):660-704, 2005. Google Scholar
  27. D. C. O. Oppen. A 2^2^2^pn upper bound on the complexity of presburger arithmetic. Journal of Computer and System Sciences, 16(3):323-332, 1978. Google Scholar
  28. C. H. Papadimitriou. On the complexity of integer programming. Journal of the ACM, 28(4):765-768, October 1981. Google Scholar
  29. R. J. Parikh. On context-free languages. Journal of the ACM, 13(4):570-581, 1966. Google Scholar
  30. E. Prud'hommeaux, J. E. Labra Gayo, and H. Solbrig. Shape Expressions: An RDF validation and transformation language. In Int'l Conf. on Semantic Systems, Sep. 2015. Google Scholar
  31. J. L. Reutter and T. Tan. A formalism for graph databases and its model of computation. In AMW, volume 749 of CEUR Workshop Proceedings. CEUR-WS.org, 2011. Google Scholar
  32. A. Ryman, A. Le Hors, and S. Speicher. Oslc resource shape: A language for defining constraints on linked data. In Proc. of the WWW2013 Workshop on Linked Data on the Web (LDOW). CEUR-WS.org, 2013. Google Scholar
  33. H. Seidl, T. Schwentick, and A. Muscholl. Counting in trees. Logic and Automata, pages 575-612, 2008. Google Scholar
  34. J. Sequeda, H. Tirmizi, S, Ó. Corcho, and D. P. Miranker. Survey of directly mapping SQL databases to the Semantic Web. Knowledge Engineering Review, 26(4):445-486, 2011. Google Scholar
  35. E. Sirin. Data validation with OWL integrity constraints. In Int'l Conf. on Web Reasoning and Rule Systems (RR), pages 18-22, 2010. Google Scholar
  36. S. Staworko and P. Wieczorek. Learning twig and path queries. In Int'l Conf. on Database Theory (ICDT), March 2012. Google Scholar
  37. J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity constraints in OWL. In Int'l Conf. on Artificial Intelligence (AAAI), 2010. Google Scholar
  38. J. W. Thatcher and Wright J. B. Generalized finite automata with an application to a decision problem of second-order logic. Mathematical System Theory, 2:57-82, 1968. Google Scholar
  39. TPC. TPC benchmarks, URL: http://www.tpc.org/.
  40. W3C. RDF validation workshop report: Practical assurances for quality RDF data. ěrb|http://www.w3.org/2012/12/rdf-val/report|, September 2013. Google Scholar
  41. W3C. Shape expressions schemas, 2013. ěrb|http://www.w3.org/2013/ShEx/Primer|. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail