Data Structure Lower Bounds for Document Indexing Problems

Authors Peyman Afshani, Jesper Sindahl Nielsen



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2016.93.pdf
  • Filesize: 0.53 MB
  • 15 pages

Document Identifiers

Author Details

Peyman Afshani
Jesper Sindahl Nielsen

Cite As Get BibTex

Peyman Afshani and Jesper Sindahl Nielsen. Data Structure Lower Bounds for Document Indexing Problems. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ICALP.2016.93

Abstract

We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve and in fact for all the problems considered, there is a very good and reasonable match between our lower bounds and the known upper bounds, at least for some choice of input parameters.

The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, or forbidden-pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at the query time.

Subject Classification

Keywords
  • Data Structure Lower Bounds
  • Pointer Machine
  • Set Intersection
  • Pattern Matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proceedings of Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 434-443, 2014. Google Scholar
  2. Peyman Afshani. Improved pointer machine and I/O lower bounds for simplex range reporting and related problems. In Symposium on Computational Geometry (SoCG), pages 339-346, 2012. Google Scholar
  3. Peyman Afshani, Lars Arge, and Kasper Dalgaard Larsen. Orthogonal range reporting: query lower bounds, optimal structures in 3-d, and higher-dimensional improvements. In Symposium on Computational Geometry (SoCG), pages 240-246, 2010. Google Scholar
  4. Peyman Afshani, Lars Arge, and Kasper Green Larsen. Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In Symposium on Computational Geometry (SoCG), pages 323-332, 2012. URL: http://dx.doi.org/10.1145/2261250.2261299.
  5. Peyman Afshani and Jesper Sindahl Nielsen. Data structure lower bounds for document indexing problems. CoRR, abs/1604.06264, 2016. URL: http://arxiv.org/abs/1604.06264.
  6. Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Online Dictionary Matching with One Gap. CoRR, abs/1503.07563, 2015. URL: http://arxiv.org/abs/1503.07563.
  7. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary Matching with One Gap. In Annual Symposium on Combinatorial Pattern Matching (CPM), pages 11-20, 2014. Google Scholar
  8. Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and Søren Vind. String indexing for patterns with wildcards. In Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), pages 283-294, 2012. Google Scholar
  9. Philip Bille, Anna Pagh, and Rasmus Pagh. Fast evaluation of union-intersection expressions. In Algorithms and Computation, volume 4835 of Lecture Notes in Computer Science, pages 739-750. Springer Berlin Heidelberg, 2007. Google Scholar
  10. Sudip Biswas, Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Ranked Document Retrieval with Forbidden Pattern. In Annual Symposium on Combinatorial Pattern Matching (CPM), pages 77-88, 2015. Google Scholar
  11. Timothy M. Chan. Optimal partition trees. In Symposium on Computational Geometry (SoCG), pages 1-10. ACM, 2010. Google Scholar
  12. Timothy M. Chan, Kasper Green Larsen, and Mihai Pǎtraşcu. Orthogonal range searching on the RAM, revisited. In Symposium on Computational Geometry (SoCG), pages 1-10, 2011. Google Scholar
  13. Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combinatorics. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 31-40, 2015. Google Scholar
  14. Bernard Chazelle. Lower Bounds for Orthogonal Range Searching: I. The Reporting Case. J. ACM, 37(2):200-212, 1990. Google Scholar
  15. Bernard Chazelle. Lower Bounds for Orthogonal Range Searching II. The Arithmetic Model. J. ACM, 37(3):439-463, 1990. Google Scholar
  16. Bernard Chazelle and Burton Rosenberg. Simplex Range Reporting on a Pointer Machine. Comput. Geom., 5:237-247, 1995. Google Scholar
  17. Bernard Chazelle, Micha Sharir, and Emo Welzl. Quasi-optimal upper bounds for simplex range searching and new zone theorems. Algorithmica, 8:407-429, December 1992. Google Scholar
  18. Hagai Cohen and Ely Porat. Fast set intersection and two-patterns matching. Theor. Comput. Sci., 411(40-42):3795-3800, 2010. Google Scholar
  19. Hagai Cohen and Ely Porat. On the hardness of distance oracle for sparse graph. http://arxiv.org/abs/1006.1117, 2010. Google Scholar
  20. Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 91-100, 2004. Google Scholar
  21. Paul F. Dietz, Kurt Mehlhorn, Rajeev Raman, and Christian Uhrig. Lower bounds for set intersection queries. Algorithmica, 14(2):154-168, 1995. Google Scholar
  22. Paolo Ferragina, Nick Koudas, S. Muthukrishnan, and Divesh Srivastava. Two-dimensional substring indexing. Journal of Computer and System Sciences, 66(4):763-774, 2003. Special Issue on PODS 2001. Google Scholar
  23. Johannes Fischer, Travis Gagie, Tsvi Kopelowitz, Moshe Lewenstein, Veli Mäkinen, Leena Salmela, and Niko Välimäki. Forbidden patterns. In LATIN 2012: Theoretical Informatics - 10th Latin American Symposium, Arequipa, Peru, April 16-20, 2012. Proceedings, pages 327-337, 2012. Google Scholar
  24. Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. String Retrieval for Multi-pattern Queries. In String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11-13, 2010. Proceedings, pages 55-66, 2010. Google Scholar
  25. Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Document Listing for Queries with Excluded Pattern. In Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Helsinki, Finland, July 3-5, 2012. Proceedings, pages 185-195, 2012. Google Scholar
  26. Casper Kejlberg-Rasmussen, Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Word-packing algorithms for dynamic connectivity and dynamic sets. http://arxiv.org/abs/1407.6755, 2014. Google Scholar
  27. Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjecture. http://arxiv.org/abs/1407.6756, 2016. Google Scholar
  28. Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjecture. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1272-1287, 2016. Google Scholar
  29. Kasper Green Larsen, J. Ian Munro, Jesper Sindahl Nielsen, and Sharma V. Thankachan. On Hardness of Several String Indexing Problems. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, pages 242-251, 2014. Google Scholar
  30. Moshe Lewenstein, J. Ian Munro, Venkatesh Raman, and Sharma V. Thankachan. Less space: Indexing for queries with wildcards. Theoretical Computer Science, 557:120-127, 2014. Google Scholar
  31. Moshe Lewenstein, Yakov Nekrich, and Jeffrey Scott Vitter. Space-efficient string indexing for wildcard pattern matching. CoRR, abs/1401.0625, 2014. URL: http://arxiv.org/abs/1401.0625.
  32. Jiří Matoušek. Range searching with efficient hierarchical cuttings. Discrete & Computational Geometry, 10(2):157-182, 1993. Google Scholar
  33. S. Muthukrishnan. Efficient algorithms for document retrieval problems. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 657-666, 2002. Google Scholar
  34. Mihai Pǎtraşcu. Towards polynomial lower bounds for dynamic problems. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 603-610, 2010. Google Scholar
  35. Mihai Pǎtraşcu. Unifying the landscape of cell-probe lower bounds. SIAM Journal of Computing, 40(3):827-847, 2011. Google Scholar
  36. Mihai Pǎtraşcu, L. Roditty, and M. Thorup. A new infinity of distance oracles for sparse graphs. In Proceedings of Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 738-747, 2012. Google Scholar
  37. Mihai Pǎtraşcu and Liam Roditty. Distance oracles beyond the thorup-zwick bound. SIAM Journal on Computing, 43(1):300-311, 2014. Google Scholar
  38. M. Sohel Rahman and Costas S. Iliopoulos. Pattern matching algorithms with don't cares. In SOFSEM 2007: Theory and Practice of Computer Science, 33rd Conference on Current Trends in Theory and Practice of Computer Science, Harrachov, Czech Republic, January 20-26, 2007, Proceedings Volume II, pages 116-126, 2007. Google Scholar
  39. Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of Computer and System Sciences, 18(2):110-127, 1979. Google Scholar
  40. Hing-Fung Ting and Yilin Yang. Dictionary matching with uneven gaps. In Annual Symposium on Combinatorial Pattern Matching (CPM), volume 9133, page 247, 2015. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail