When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.DISC.2019.33
URN: urn:nbn:de:0030-drops-113400
URL: https://drops.dagstuhl.de/opus/volltexte/2019/11340/


Su, Hsin-Hao ; Vu, Hoa T.

Distributed Data Summarization in Well-Connected Networks



Abstract

We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes, each of which may hold a value initially, we focus on computing $\sum_{i=1}^{N} g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data. In the CONGEST model, a simple adaptation from streaming lower bounds shows that it requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$. This also has applications in computing the top $k$ most frequent elements. We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $(1 \pm \epsilon)$-approximates the $p$-th frequency moment $F_p = \sum_{i=1}^{N} f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p \geq 2$, when the number of distinct elements $F_0$ is at most $O(n^{1/(k-1)})$. This result can be translated back to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in the number of rounds.
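
To make the target quantity concrete, the following is a minimal centralized sketch of the statistic $\sum_{i} g(f_i)$ for three choices of $g$. It is not the distributed algorithm of the paper; it only illustrates what is being computed. The helper name `summarize`, the sample data, and the convention $g(0) = 0$ (so that summing over the values actually present is enough) are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def summarize(values, g):
    """Return the sum of g(f_i) over all values i that occur at least once.

    Assumes g(0) = 0, so values with zero occurrences contribute nothing.
    """
    freqs = Counter(values)  # f_i = number of occurrences of value i
    return sum(g(f) for f in freqs.values())


# Hypothetical input: the values held by six nodes.
values = [3, 1, 3, 2, 3, 1]
m = len(values)

# Number of distinct elements F_0: g(f) = 1 for every f > 0.
distinct = summarize(values, lambda f: 1)

# Second frequency moment F_2: g(f) = f^2.
f2 = summarize(values, lambda f: f ** 2)

# Empirical entropy: g(f) = (f / m) * log2(m / f).
entropy = summarize(values, lambda f: (f / m) * log2(m / f))

print(distinct, f2, round(entropy, 4))
```

Each statistic differs only in the function passed as `g`, which is why a single algorithm for $\sum_{i} g(f_i)$ covers all of them.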

BibTeX - Entry

@InProceedings{su_et_al:LIPIcs:2019:11340,
  author =	{Hsin-Hao Su and Hoa T. Vu},
  title =	{{Distributed Data Summarization in Well-Connected Networks}},
  booktitle =	{33rd International Symposium on Distributed Computing (DISC 2019)},
  pages =	{33:1--33:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-126-9},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{146},
  editor =	{Jukka Suomela},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/11340},
  URN =		{urn:nbn:de:0030-drops-113400},
  doi =		{10.4230/LIPIcs.DISC.2019.33},
  annote =	{Keywords: Distributed Algorithms, Network Algorithms, Data Summarization}
}

Keywords: Distributed Algorithms, Network Algorithms, Data Summarization
Seminar: 33rd International Symposium on Distributed Computing (DISC 2019)
Issue Date: 2019
Date of publication: 11.10.2019

