Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

Authors: Martin Dietzfelbinger, Stefan Walzer



Author Details

Martin Dietzfelbinger
  • Technische Universität Ilmenau, Germany
Stefan Walzer
  • Technische Universität Ilmenau, Germany

Acknowledgements

We are very grateful to Seth Pettie, who triggered this research by asking an insightful question regarding "one block" while discussing the two-block solution from [Martin Dietzfelbinger and Stefan Walzer, 2019]. (This discussion took place at the Dagstuhl Seminar 19051 "Data Structures for the Cloud and External Memory Data".) Thanks are also due to the reviewers, whose comments helped to improve the presentation.

Cite As

Martin Dietzfelbinger and Stefan Walzer. Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 39:1-39:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.39

Abstract

In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over F_2 = {0,1} with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions n(1-epsilon) x n is generated as follows: In each row, identify a block of length L = O((log n)/epsilon) at a random position. The entries outside the block are 0, the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured in terms of a simple Markov model from queuing theory. Bounds for expected construction time and high success probability follow from results in this area. They readily extend to larger finite fields in place of F_2. By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function f: S -> {0,1} for some set S of m = (1-epsilon)n keys. It requires m/(1-epsilon) bits of space, construction takes O(m/epsilon^2) expected time on a word RAM, while queries take O(1/epsilon) time and access only one contiguous segment of O((log m)/epsilon) bits in the representation (O(1/epsilon) consecutive words on a word RAM). The method is readily implemented and highly practical, and it is competitive with state-of-the-art methods. In a more theoretical variant, which works only for unrealistically large S, we can even achieve construction time O(m/epsilon) and query time O(1), accessing O(1) contiguous memory words for a query. 
By well-established methods the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
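The construction described in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`row_for`, `build`, `build_with_retry`, `query`), the use of `hashlib.blake2b` to derive each key's block, the restriction to L ≤ 64, and the retry-on-failure loop are all hypothetical choices made here. The elimination step is a simple incremental variant (each sorted row is reduced against the pivots found so far), which yields an echelon form but is not necessarily the exact procedure analysed in the paper.

```python
import hashlib

def row_for(key, n, L, seed):
    """Hash a key to one row: an L-bit random block at a random position."""
    digest = hashlib.blake2b(repr((seed, key)).encode(), digest_size=16).digest()
    v = int.from_bytes(digest, "big")
    start = (v & (2**64 - 1)) % (n - L + 1)   # block covers columns start..start+L-1
    pattern = (v >> 64) & ((1 << L) - 1)      # bit j is the entry in column start+j
    return start, pattern

def build(f, n, L, seed=0):
    """Find z in {0,1}^n with <row(x), z> = f(x) over F_2 for every key x."""
    assert 1 <= L <= 64 and L <= n
    rows = [row_for(x, n, L, seed) + (bit,) for x, bit in f.items()]
    rows.sort()                               # sort by block position (band structure)
    pivot_bits = [0] * n                      # pivot_bits[c]: row with leading 1 in column c,
    pivot_rhs = [0] * n                       # stored shifted so that bit 0 is column c
    for start, bits, rhs in rows:
        col = start
        while bits:
            shift = (bits & -bits).bit_length() - 1
            col += shift                      # move to the leftmost 1 of the row
            bits >>= shift
            if pivot_bits[col] == 0:          # free column: this row becomes its pivot
                pivot_bits[col], pivot_rhs[col] = bits, rhs
                break
            bits ^= pivot_bits[col]           # otherwise eliminate and continue
            rhs ^= pivot_rhs[col]
        else:
            if rhs:                           # row reduced to 0 = 1: system unsolvable
                raise ValueError("linearly dependent rows with inconsistent values")
    z = [0] * n                               # back substitution; free variables stay 0
    for col in reversed(range(n)):
        bits = pivot_bits[col]
        if bits:
            acc, rest, j = pivot_rhs[col], bits >> 1, 1
            while rest:
                if rest & 1:
                    acc ^= z[col + j]
                rest >>= 1
                j += 1
            z[col] = acc
    return z

def build_with_retry(f, n, L, max_tries=20):
    """Try successive seeds (a hypothetical policy) until the system is solvable."""
    for seed in range(max_tries):
        try:
            return seed, build(f, n, L, seed)
        except ValueError:
            continue
    raise RuntimeError("no solvable system found")

def query(key, z, L, seed):
    """Evaluate f(key) as the parity of the key's block ANDed with one segment of z."""
    start, pattern = row_for(key, len(z), L, seed)
    bit, j = 0, 0
    while pattern:
        if pattern & 1:
            bit ^= z[start + j]
        pattern >>= 1
        j += 1
    return bit

if __name__ == "__main__":
    f = {f"key{i}": bin(i).count("1") % 2 for i in range(200)}
    seed, z = build_with_retry(f, n=400, L=32)
    print(all(query(k, z, 32, seed) == b for k, b in f.items()))
```

The sketch stores one bit per column in a Python list for readability; a practical implementation would pack `z` into machine words so that a query reads only the contiguous segment covered by the key's block, as the abstract describes.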

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
Keywords
  • Random Band Matrix
  • Gauss Elimination
  • Retrieval
  • Hashing
  • Succinct Data Structure
  • Randomised Data Structure
  • Robin Hood Hashing
  • Bloom Filter

References

  1. Austin Appleby. MurmurHash3, 2012. URL: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.
  2. Gregory V. Bard. Algebraic Cryptanalysis, chapter The Method of Four Russians, pages 133-158. Springer US, Boston, MA, 2009. URL: https://doi.org/10.1007/978-0-387-88757-9_9.
  3. Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna. Cache-oblivious peeling of random hypergraphs. In Proc. DCC'14, pages 352-361, 2014. URL: https://doi.org/10.1109/DCC.2014.48.
  4. Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proc. 23rd WWW'14, pages 227-228. ACM, 2014. URL: https://doi.org/10.1145/2567948.2577304.
  5. Fabiano C. Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani. A practical minimal perfect hashing method. In Proc. 4th WEA, pages 488-500, 2005. URL: https://doi.org/10.1007/11427186_42.
  6. Fabiano C. Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and space-efficient minimal perfect hash functions. In Proc. 10th WADS, pages 139-150, 2007. URL: https://doi.org/10.1007/978-3-540-73951-7_13.
  7. Fabiano C. Botelho, Rasmus Pagh, and Nivio Ziviani. Practical perfect hashing in nearly optimal space. Inf. Syst., 38(1):108-131, 2013. URL: https://doi.org/10.1016/j.is.2012.06.002.
  8. Andrei Z. Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 2003. URL: https://doi.org/10.1080/15427951.2004.10129096.
  9. Pedro Celis, Per-Åke Larson, and J. Ian Munro. Robin Hood hashing. In Proc. 26th FOCS, pages 281-288, 1985. URL: https://doi.org/10.1109/SFCS.1985.48.
  10. Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier filter: An efficient data structure for static support lookup tables. In Proc. 15th SODA, pages 30-39, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982797.
  11. Colin Cooper. On the rank of random matrices. Random Struct. Algor., 16(2):209-232, 2000. URL: https://doi.org/10.1002/(SICI)1098-2418(200003)16:2<209::AID-RSA6>3.0.CO;2-1.
  12. Robert B. Cooper. Introduction to Queueing Theory. Elsevier/North-Holland, 2nd edition, 1981. URL: http://www.cse.fau.edu/~bob/publications/IntroToQueueingTheory_Cooper.pdf.
  13. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.
  14. Luc Devroye, Pat Morin, and Alfredo Viola. On worst-case Robin Hood hashing. SIAM J. Comput., 33(4):923-936, 2004. URL: https://doi.org/10.1137/S0097539702403372.
  15. Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership (extended abstract). In Proc. 35th ICALP (1), pages 385-396, 2008. URL: https://doi.org/10.1007/978-3-540-70575-8_32.
  16. Martin Dietzfelbinger and Michael Rink. Applications of a splitting trick. In Proc. 36th ICALP (1), pages 354-365, 2009. URL: https://doi.org/10.1007/978-3-642-02927-1_30.
  17. Martin Dietzfelbinger and Stefan Walzer. Constant-time retrieval with O(log m) extra bits. In Proc. 36th STACS, pages 24:1-24:16, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.24.
  18. Martin Dietzfelbinger and Stefan Walzer. Dense peelable random uniform hypergraphs. In Proc. 27th ESA, pages 38:1-38:16, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.38.
  19. Martin Dietzfelbinger and Christoph Weidling. Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci., 380(1-2):47-68, 2007. URL: https://doi.org/10.1016/j.tcs.2007.02.054.
  20. Wayne Eberly. On efficient band matrix arithmetic. In Proc. 33rd FOCS, pages 457-463, 1992. URL: https://doi.org/10.1109/SFCS.1992.267806.
  21. Regina Egorova, Bert Zwart, and Onno Boxma. Sojourn time tails in the M/D/1 processor sharing queue. Probab. Eng. Inf. Sci., 20:429-446, 2006. URL: https://doi.org/10.1017/S0269964806060268.
  22. Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast scalable construction of (minimal perfect hash) functions. In Proc. 15th SEA, pages 339-352, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_23.
  23. Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
  24. George Havas, Bohdan S. Majewski, Nicholas C. Wormald, and Zbigniew J. Czech. Graphs, hypergraphs and hashing. In Proc. 19th WG, pages 153-165, 1993. URL: https://doi.org/10.1007/3-540-57899-4_49.
  25. Svante Janson. Individual displacements for linear probing hashing with different insertion policies. ACM Trans. Algorithms, 1(2):177-213, 2005. URL: https://doi.org/10.1145/1103963.1103964.
  26. Svante Janson. Individual displacements in hashing with coalesced chains. Comb. Probab. Comput., 17(6):799-814, 2008. URL: https://doi.org/10.1017/S0963548308009395.
  27. Svante Janson and Alfredo Viola. A unified approach to linear probing hashing with buckets. Algorithmica, 75(4):724-781, 2016. URL: https://doi.org/10.1007/s00453-015-0111-x.
  28. David G. Kendall. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain. Ann. Math. Statist., 24(3):338-354, September 1953. URL: https://doi.org/10.1214/aoms/1177728975.
  29. Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, and Daniel A. Spielman. Efficient erasure correcting codes. IEEE Transactions on Information Theory, 47(2):569-584, 2001. URL: https://doi.org/10.1109/18.910575.
  30. Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, Daniel A. Spielman, and Volker Stemann. Practical loss-resilient codes. In Proc. 29th STOC, pages 150-159, 1997. URL: https://doi.org/10.1145/258533.258573.
  31. Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
  32. Michael Molloy. Cores in random hypergraphs and boolean formulas. Random Struct. Algorithms, 27(1):124-135, 2005. URL: https://doi.org/10.1002/rsa.20061.
  33. Victor Y. Pan, Isdor Sobze, and Antoine Atinkpahoun. On parallel computations with banded matrices. Inf. Comput., 120(2):237-250, 1995. URL: https://doi.org/10.1006/inco.1995.1111.
  34. Boris Pittel and Gregory B. Sorkin. The satisfiability threshold for k-XORSAT. Comb. Probab. Comput., 25(2):236-268, 2016. URL: https://doi.org/10.1017/S0963548315000097.
  35. Ely Porat. An optimal Bloom filter replacement based on matrix solving. In Proc. 4th CSR, pages 263-273, 2009. URL: https://doi.org/10.1007/978-3-642-03351-3_25.
  36. Alfredo Viola. Exact distribution of individual displacements in linear probing hashing. ACM Trans. Algorithms, 1(2):214-242, 2005. URL: https://doi.org/10.1145/1103963.1103965.
  37. Douglas H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Trans. Inf. Theory, 32(1):54-62, 1986. URL: https://doi.org/10.1109/TIT.1986.1057137.