Communication Cost of Joins over Federated Data

Authors Tamara Cucumides, Juan Reutter



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2024.5.pdf
  • Filesize: 0.74 MB
  • 19 pages

Document Identifiers

Author Details

Tamara Cucumides
  • University of Antwerp, Belgium
Juan Reutter
  • PUC Chile & IMFD Chile, Santiago, Chile

Cite AsGet BibTex

Tamara Cucumides and Juan Reutter. Communication Cost of Joins over Federated Data. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.5

Abstract

We study the problem of querying different data sources, which we assume out of our control and that are made available by standard web communication protocols. In this scenario, the time spent communicating data often dominates the time spent processing local queries in each server. Thus, our focus is on algorithms that minimize the communication between the query processing server and the federated servers containing data. However, any federated query can always be answered with linear communication, simply by requesting all the data to the federated sources. Further, one can show that certain queries do require this amount of communication. But sending all the data is definitely not a relevant algorithm from a practical point of view. This worst-case analysis is, therefore, not useful for our needs. There is a growing body of work in terms of designing strategies that minimize communication in query federation, but these strategies are commonly based in heuristics, and we currently miss a formal analysis providing guidelines for the design of such strategies. We focus on the communication complexity of federated joins when the problem is parameterized by a measure commonly referred to as the certificate of the instance: a framework that has been used before in the context of set intersection and local query processing. We show how to process any conjunctive query in time given by the certificate of instances. Our algorithm is an adaptation of Minesweeper, one of the algorithms devised for local query processing, into our federating setting. When certificates are of the size of the instance, this amount to sending the entire database, but our strategy provides drastic reductions in the communication needed for queries and instances with small certificates. We also show matching communication lower bounds for cases where the certificate is smaller than the size of active domain of the instances.

Subject Classification

ACM Subject Classification
  • Theory of computation → Database query processing and optimization (theory)
  • Information systems → Relational database query languages
Keywords
  • databases
  • database queries
  • query federation
  • communication complexity
  • adaptive algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Diego Arroyuelo, Aidan Hogan, Gonzalo Navarro, Juan L Reutter, Javiel Rojas-Ledesma, and Adrián Soto. Worst-case optimal graph joins in almost no space. In Proceedings of the 2021 International Conference on Management of Data, pages 102-114, 2021. URL: https://doi.org/10.1145/3448016.3457256.
  2. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. Journal of the ACM (JACM), 64(6):1-58, 2017. URL: https://doi.org/10.1145/3125644.
  3. Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, David P Woodruff, and Grigory Yaroslavtsev. Beyond set disjointness: the communication complexity of finding the intersection. In Proceedings of the 2014 ACM symposium on Principles of distributed computing, pages 106-113, 2014. URL: https://doi.org/10.1145/2611462.2611501.
  4. Carlos Buil-Aranda, Marcelo Arenas, Oscar Corcho, and Axel Polleres. Federating queries in sparql 1.1: Syntax, semantics and evaluation. Journal of Web Semantics, 18(1):1-17, 2013. URL: https://doi.org/10.1016/j.websem.2012.10.001.
  5. Carlos Buil-Aranda, Axel Polleres, and Jürgen Umbrich. Strategies for executing federated queries in sparql1. 1. In International Semantic Web Conference, pages 390-405. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-11915-1_25.
  6. Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. Ontop: Answering sparql queries over relational databases. Semantic Web, 8(3):471-487, 2017. URL: https://doi.org/10.3233/SW-160217.
  7. Erik D Demaine, Alejandro López-Ortiz, and J Ian Munro. Adaptive set intersections, unions, and differences. In Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 743-752, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338634.
  8. Aidan Hogan. Web of data. In The Web of Data, pages 15-57. Springer, 2020. URL: https://doi.org/10.1007/978-3-030-51580-5.
  9. Aidan Hogan, Cristian Riveros, Carlos Rojas, and Adrián Soto. A worst-case optimal join algorithm for sparql. In International Semantic Web Conference, pages 258-275. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-30793-6_15.
  10. Mahmoud Abo Khamis, Hung Q Ngo, Christopher Ré, and Atri Rudra. Joins via geometric resolutions: Worst case and beyond. ACM Transactions on Database Systems (TODS), 41(4):1-45, 2016. URL: https://doi.org/10.1145/2967101.
  11. Eyal Kushilevitz. Communication complexity. In Advances in Computers, volume 44, pages 331-360. Elsevier, 1997. URL: https://doi.org/10.1016/S0065-2458(08)60342-3.
  12. Brian McBride. The resource description framework (rdf) and its vocabulary description language rdfs. In Handbook on ontologies, pages 51-65. Springer, 2004. Google Scholar
  13. Gabriela Montoya, Hala Skaf-Molli, and Katja Hose. The odyssey approach for optimizing federated sparql queries. In International semantic web conference, pages 471-489. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-68288-4_28.
  14. Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, and Maria-Esther Vidal. Federated sparql queries processing with replicated fragments. In International Semantic Web Conference, pages 36-51. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-25007-6_3.
  15. Matthieu Mosser, Fernando Pieressa, Juan Reutter, Adrián Soto, and Domagoj Vrgoč. Querying apis with SPARQL: language and worst-case optimal algorithms. In European Semantic Web Conference, pages 639-654. Springer, 2018. URL: https://doi.org/10.1007/978-3-319-93417-4_41.
  16. Hung Q Ngo, Dung T Nguyen, Christopher Re, and Atri Rudra. Beyond worst-case analysis for joins with minesweeper. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 234-245, 2014. URL: https://doi.org/10.1145/2594538.2594547.
  17. Dung Nguyen, Molham Aref, Martin Bravenboer, George Kollias, Hung Q Ngo, Christopher Ré, and Atri Rudra. Join processing for graph patterns: An old dog with new tricks. In Proceedings of the GRADES'15, pages 1-8. Association for Computing Machinery, 2015. URL: https://doi.org/10.1145/2764947.2764948.
  18. Muhammad Saleem, Alexander Potocki, Tommaso Soru, Olaf Hartig, and Axel-Cyrille Ngonga Ngomo. Costfed: Cost-based query optimization for sparql endpoint federation. Procedia Computer Science, 137:163-174, 2018. URL: https://doi.org/10.1016/j.procs.2018.09.016.
  19. Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. Fedx: Optimization techniques for federated query processing on linked data. In International semantic web conference, pages 601-616. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-25073-6_38.
  20. Dirk Van Gucht, Ryan Williams, David P Woodruff, and Qin Zhang. The communication complexity of distributed set-joins with applications to matrix multiplication. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 199-212, 2015. URL: https://doi.org/10.1145/2745754.2745779.
  21. Todd L Veldhuizen. Leapfrog triejoin: a worst-case optimal join algorithm. arXiv preprint arXiv:1210.0481, 2012. URL: https://doi.org/10.48550/arXiv.1210.0481.
  22. Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Sam Coppens, Joaquim Gabarró Vallés, and Rik Van de Walle. Functional descriptions as the bridge between hypermedia apis and the semantic web. In Proceedings of the third international workshop on restful design, pages 33-40, 2012. URL: https://doi.org/10.1145/2307819.2307828.
  23. Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, and Pieter Colpaert. Triple pattern fragments: a low-cost knowledge graph interface for the web. Journal of Web Semantics, 37:184-206, 2016. URL: https://doi.org/10.1016/j.websem.2016.03.003.
  24. Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82-94, 1981. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail