A Type System for Interactive JSON Schema Inference (Extended Abstract)

Authors: Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani

Author Details

Mohamed-Amine Baazizi
  • Sorbonne Université, CNRS, LIP6 UMR 7606, Paris, France
Dario Colazzo
  • Université Paris-Dauphine, PSL, LAMSADE, France
Giorgio Ghelli
  • Dipartimento di Informatica, Università di Pisa, Italy
Carlo Sartiani
  • DIMIE, Università della Basilicata - Potenza, Italy

Cite As

Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani. A Type System for Interactive JSON Schema Inference (Extended Abstract). In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 101:1-101:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


In this paper we present the first JSON type system that allows a schema to be inferred at different levels of precision and succinctness for different parts of the dataset, under user control. The data analyst can obtain detailed schemas for the parts of the data of greater interest and a more succinct schema for the other parts. This choice can be changed as many times as needed, so that the schema can be explored gradually by moving the focus to different parts of the collection. Doing so does not require reprocessing the data: it only requires type rewriting operations on the most precise schema.
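The idea of inferring the most precise schema once and then deriving coarser views purely by rewriting types can be illustrated with a small sketch. This is a toy model and not the type system of the paper, whose rules are not given in this abstract. The tuple-based type representation, the function names `infer` and `coarsen`, and the single "fuse all record types" rewrite are assumptions made only for illustration.

```python
import json

def infer(value):
    """Return the most precise toy type of one JSON value (hypothetical encoding)."""
    if isinstance(value, dict):
        # A record type lists its fields in key order so equal shapes compare equal.
        return ("record", tuple((k, infer(v)) for k, v in sorted(value.items())))
    if isinstance(value, list):
        # An array type is the set of distinct element types.
        return ("array", frozenset(infer(v) for v in value))
    if value is None:
        return ("null",)
    if isinstance(value, bool):
        return ("bool",)
    if isinstance(value, (int, float)):
        return ("num",)
    return ("str",)

def coarsen(types):
    """Rewrite a union of types: fuse all top-level record types into one record
    whose fields carry a union of types and an 'optional' flag.
    Works on types only; the data is never read again. Nested records are
    left untouched in this sketch."""
    records = [t for t in types if t[0] == "record"]
    others = {t for t in types if t[0] != "record"}
    if not records:
        return others
    fields = {}
    for _, rec_fields in records:
        for key, ftype in rec_fields:
            fields.setdefault(key, []).append(ftype)
    fused = tuple(
        # (field name, union of field types, optional?)
        (key, frozenset(ftypes), len(ftypes) < len(records))
        for key, ftypes in sorted(fields.items())
    )
    return others | {("record~", fused)}

# Hypothetical three-document collection.
docs = ['{"id": 1, "name": "a"}', '{"id": 2}', '{"id": "x", "name": "b"}']
precise = {infer(json.loads(d)) for d in docs}   # one pass over the data
coarse = coarsen(precise)                        # pure type rewriting
```

Here `precise` keeps three distinct record types, while `coarse` holds a single fused record in which `id` is mandatory with type num or str and `name` is optional. In this toy setting, switching between the two views only touches the inferred types, which mirrors the interaction style the abstract describes.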

Subject Classification

ACM Subject Classification
  • Theory of computation → Type theory
  • Information systems → Semi-structured data
Keywords
  • JSON
  • type systems
  • interactive inference


