Inference of Shape Graphs for Graph Databases

Authors: Benoît Groz, Aurélien Lemay, Sławek Staworko, Piotr Wieczorek





Author Details

Benoît Groz
  • University Paris Sud, France
Aurélien Lemay
  • University of Lille, France
Sławek Staworko
  • University of Lille, France
Piotr Wieczorek
  • University of Wrocław, Poland

Cite As

Benoît Groz, Aurélien Lemay, Sławek Staworko, and Piotr Wieczorek. Inference of Shape Graphs for Graph Databases. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 14:1-14:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ICDT.2022.14

Abstract

We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema, within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms, based on natural approaches, that infer schemas we argue to be of practical importance.
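
The soundness requirement can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: a graph and a shape graph are both plain sets of labeled edges, validation is a simple simulation check that ignores any cardinality constraints, and the hypothetical `infer_by_out_labels` merges nodes that share the same set of outgoing labels. It is meant only to show what "the inferred schema validates the input graph" means, not to reproduce the authors' algorithms.

```python
from collections import defaultdict

# Assumption: graphs and shape graphs are sets of (source, label, target) edges.

def maximal_typing(graph_edges, shape_edges):
    """Largest assignment of shape-graph types to nodes that is a simulation."""
    nodes = {n for s, _, t in graph_edges for n in (s, t)}
    types = {n for s, _, t in shape_edges for n in (s, t)}
    out = defaultdict(list)
    for s, a, t in graph_edges:
        out[s].append((a, t))
    shape_out = defaultdict(list)
    for s, a, t in shape_edges:
        shape_out[s].append((a, t))
    typing = {n: set(types) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            for ty in list(typing[n]):
                # Every outgoing edge of n must be matched by an edge of ty
                # with the same label, leading to a type of the target node.
                ok = all(
                    any(b == a and u in typing[m] for b, u in shape_out[ty])
                    for a, m in out[n]
                )
                if not ok:
                    typing[n].discard(ty)
                    changed = True
    return typing

def validates(graph_edges, shape_edges):
    """Simplified validation: every node keeps at least one type."""
    typing = maximal_typing(graph_edges, shape_edges)
    return all(typing[n] for n in typing)

def infer_by_out_labels(graph_edges):
    """Hypothetical inference: one type per distinct set of outgoing labels."""
    labels = defaultdict(set)
    for s, a, t in graph_edges:
        labels[s].add(a)
        labels[t]  # touching the key registers target nodes without out-edges
    type_of = {n: tuple(sorted(ls)) for n, ls in labels.items()}
    return {(type_of[s], a, type_of[t]) for s, a, t in graph_edges}

# Illustrative data, invented for this sketch.
graph = {
    ("alice", "name", "lit1"),
    ("alice", "knows", "bob"),
    ("bob", "name", "lit2"),
    ("bob", "knows", "alice"),
}
schema = infer_by_out_labels(graph)
sound_on_example = validates(graph, schema)
```

Under these assumptions the sketch is sound by construction, since each node can be typed by its own set of outgoing labels. Completeness is a separate and harder property that this sketch does not address.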

Subject Classification

ACM Subject Classification
  • Information systems → Graph-based database models
Keywords
  • RDF
  • Schema
  • Inference
  • Learning
  • Fitting
  • Minimality
  • Containment

