On The I/O Complexity of Dynamic Distinct Counting
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains $S$ in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.
In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B}), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps from 0 dramatically to almost linearity as soon as the insertion cost drops slightly below Omega(1).
distinct counting
lower bound
external memory
265-276
Regular Paper
Xiaocheng
Hu
Xiaocheng Hu
Yufei
Tao
Yufei Tao
Yi
Yang
Yi Yang
Shengyu
Zhang
Shengyu Zhang
Shuigeng
Zhou
Shuigeng Zhou
10.4230/LIPIcs.ICDT.2015.265
Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM (CACM), 31(9):1116-1127, 1988.
Lars Arge, Mikael Knudsen, and Kirsten Larsen. A general lower bound on the I/O-complexity of comparison-based algorithms. In Algorithms and Data Structures Workshop (WADS), pages 83-94, 1993.
Lars Arge and Peter Bro Miltersen. On showing lower bounds for external-memory computational geometry problems. DIMACS Series in Discrete Mathematics, pages 139-159, 1999.
Gerth StÃ¸lting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546-554, 2003.
Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. Sparse and lopsided set disjointness via information theory. In APPROX-RANDOM, pages 517-528, 2012.
Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Patrascu. De dictionariis dynamicis pauco spatio utentibus (lat. on dynamic dictionaries using little space). In Latin American Symposium on Theoretical Informatics (LATIN), pages 349-361, 2006.
Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 345-354, 1989.
Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, and Vasilis Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1):35-55, 2002.
John Iacono and Mihai Patrascu. Using hashing to solve the dictionary problem. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 570-582, 2012.
Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 41-52, 2010.
Mihai Patrascu and Erik D. Demaine. Lower bounds for dynamic connectivity. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 546-553, 2004.
Elad Verbin and Qin Zhang. The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 447-456, 2010.
Zhewei Wei, Ke Yi, and Qin Zhang. Dynamic external hashing: the limit of buffering. In Proceedings of Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 253-259, 2009.
Ke Yi. Dynamic indexability and the optimality of B-trees. Journal of the ACM (JACM), 59(4), 2012.
Ke Yi and Qin Zhang. On the cell probe complexity of dynamic membership. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 123-133, 2010.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode