On The I/O Complexity of Dynamic Distinct Counting

Authors Xiaocheng Hu, Yufei Tao, Yi Yang, Shengyu Zhang, Shuigeng Zhou



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2015.265.pdf
  • Filesize: 0.53 MB
  • 12 pages

Document Identifiers

Author Details

Xiaocheng Hu
Yufei Tao
Yi Yang
Shengyu Zhang
Shuigeng Zhou

Cite As Get BibTex

Xiaocheng Hu, Yufei Tao, Yi Yang, Shengyu Zhang, and Shuigeng Zhou. On The I/O Complexity of Dynamic Distinct Counting. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 265-276, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.ICDT.2015.265

Abstract

In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains $S$ in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.

In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B}), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps from 0 dramatically to almost linearity as soon as the insertion cost drops slightly below Omega(1).

Subject Classification

Keywords
  • distinct counting
  • lower bound
  • external memory

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM (CACM), 31(9):1116-1127, 1988. Google Scholar
  2. Lars Arge, Mikael Knudsen, and Kirsten Larsen. A general lower bound on the I/O-complexity of comparison-based algorithms. In Algorithms and Data Structures Workshop (WADS), pages 83-94, 1993. Google Scholar
  3. Lars Arge and Peter Bro Miltersen. On showing lower bounds for external-memory computational geometry problems. DIMACS Series in Discrete Mathematics, pages 139-159, 1999. Google Scholar
  4. Gerth Stølting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546-554, 2003. Google Scholar
  5. Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. Sparse and lopsided set disjointness via information theory. In APPROX-RANDOM, pages 517-528, 2012. Google Scholar
  6. Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Patrascu. De dictionariis dynamicis pauco spatio utentibus (lat. on dynamic dictionaries using little space). In Latin American Symposium on Theoretical Informatics (LATIN), pages 349-361, 2006. Google Scholar
  7. Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 345-354, 1989. Google Scholar
  8. Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, and Vasilis Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1):35-55, 2002. Google Scholar
  9. John Iacono and Mihai Patrascu. Using hashing to solve the dictionary problem. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 570-582, 2012. Google Scholar
  10. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 41-52, 2010. Google Scholar
  11. Mihai Patrascu and Erik D. Demaine. Lower bounds for dynamic connectivity. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 546-553, 2004. Google Scholar
  12. Elad Verbin and Qin Zhang. The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 447-456, 2010. Google Scholar
  13. Zhewei Wei, Ke Yi, and Qin Zhang. Dynamic external hashing: the limit of buffering. In Proceedings of Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 253-259, 2009. Google Scholar
  14. Ke Yi. Dynamic indexability and the optimality of B-trees. Journal of the ACM (JACM), 59(4), 2012. Google Scholar
  15. Ke Yi and Qin Zhang. On the cell probe complexity of dynamic membership. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 123-133, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail