Internal Shortest Absent Word Queries

Authors Golnaz Badkobeh , Panagiotis Charalampopoulos , Solon P. Pissis



PDF
Thumbnail PDF

File

LIPIcs.CPM.2021.6.pdf
  • Filesize: 0.9 MB
  • 18 pages

Document Identifiers

Author Details

Golnaz Badkobeh
  • Department of Computing, Goldsmiths University of London, UK
Panagiotis Charalampopoulos
  • Efi Arazi School of Computer Science, The Interdisciplinary Center Herzliya, Israel
Solon P. Pissis
  • CWI, Amsterdam, The Netherlands
  • Vrije Universiteit, Amsterdam, The Netherlands

Cite AsGet BibTex

Golnaz Badkobeh, Panagiotis Charalampopoulos, and Solon P. Pissis. Internal Shortest Absent Word Queries. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.6

Abstract

Given a string T of length n over an alphabet Σ ⊂ {1,2,…,n^{𝒪(1)}} of size σ, we are to preprocess T so that given a range [i,j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i]⋯ T[j] of T. For any positive integer k ∈ [1,log log_σ n], we present an 𝒪((n/k)⋅ log log_σ n)-size data structure, which can be constructed in 𝒪(nlog_σ n) time, and answers queries in time 𝒪(log log_σ k).

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • string algorithms
  • internal queries
  • shortest absent word
  • bit parallelism

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Paniz Abedin, Arnab Ganguly, Wing-Kai Hon, Yakov Nekrich, Kunihiko Sadakane, Rahul Shah, and Sharma V. Thankachan. A linear-space data structure for Range-LCP queries in poly-logarithmic time. In Computing and Combinatorics - 24th International Conference, COCOON 2018, pages 615-625, 2018. URL: https://doi.org/10.1007/978-3-319-94776-1_51.
  2. Paniz Abedin, Arnab Ganguly, Solon P. Pissis, and Sharma V. Thankachan. Efficient data structures for range shortest unique substring queries. Algorithms, 13(11):276, 2020. URL: https://doi.org/10.3390/a13110276.
  3. Amihood Amir, Alberto Apostolico, Gad M. Landau, Avivit Levy, Moshe Lewenstein, and Ely Porat. Range LCP. J. Comput. Syst. Sci., 80(7):1245-1253, 2014. URL: https://doi.org/10.1016/j.jcss.2014.02.010.
  4. Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Dynamic and internal longest common substring. Algorithmica, 82(12):3707-3743, 2020. URL: https://doi.org/10.1007/s00453-020-00744-0.
  5. Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. Dynamic text and static pattern matching. ACM Trans. Algorithms, 3(2):19, 2007. URL: https://doi.org/10.1145/1240233.1240242.
  6. Lorraine A. K. Ayad, Golnaz Badkobeh, Gabriele Fici, Alice Héliou, and Solon P. Pissis. Constructing antidictionaries in output-sensitive space. In Data Compression Conference, DCC 2019, pages 538-547. IEEE, 2019. URL: https://doi.org/10.1109/DCC.2019.00062.
  7. Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, Ignat I. Kolesnichenko, and Tatiana Starikovskaya. Computing minimal and maximal suffixes of a substring. Theor. Comput. Sci., 638:112-121, 2016. URL: https://doi.org/10.1016/j.tcs.2015.08.023.
  8. Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 572-591. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.39.
  9. Carl Barton, Alice Héliou, Laurent Mouchard, and Solon P. Pissis. Linear-time computation of minimal absent words using suffix array. BMC Bioinform., 15:388, 2014. URL: https://doi.org/10.1186/s12859-014-0388-9.
  10. Carl Barton, Alice Héliou, Laurent Mouchard, and Solon P. Pissis. Parallelising the computation of minimal absent words. In Parallel Processing and Applied Mathematics - 11th International Conference, PPAM 2015. Revised Selected Papers, Part II, volume 9574 of Lecture Notes in Computer Science, pages 243-253. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-32152-3_23.
  11. Djamal Belazzougui, Dmitry Kosolobov, Simon J. Puglisi, and Rajeev Raman. Weighted ancestors in suffix trees revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021, LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  12. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Proceedings, volume 1776 of Lecture Notes in Computer Science, pages 88-94. Springer, 2000. URL: https://doi.org/10.1007/10719839_9.
  13. Omer Berkman and Uzi Vishkin. Recursive star-tree parallel data structure. SIAM J. Comput., 22(2):221-242, 1993. URL: https://doi.org/10.1137/0222017.
  14. Or Birenzwige, Shay Golan, and Ely Porat. Locally consistent parsing for text indexing in small space. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pages 607-626. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.37.
  15. Supaporn Chairungsee and Maxime Crochemore. Using minimal absent words to build phylogeny. Theor. Comput. Sci., 450:109-116, 2012. URL: https://doi.org/10.1016/j.tcs.2012.04.031.
  16. Panagiotis Charalampopoulos, Maxime Crochemore, Gabriele Fici, Robert Mercaş, and Solon P. Pissis. Alignment-free sequence comparison using absent words. Inf. Comput., 262:57-68, 2018. URL: https://doi.org/10.1016/j.ic.2018.06.002.
  17. Panagiotis Charalampopoulos, Maxime Crochemore, and Solon P. Pissis. On extended special factors of a word. In String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, volume 11147 of Lecture Notes in Computer Science, pages 131-138. Springer, 2018. URL: https://doi.org/10.1007/978-3-030-00479-8_11.
  18. Panagiotis Charalampopoulos, Pawel Gawrychowski, Shay Mozes, and Oren Weimann. An almost optimal edit distance oracle. CoRR, abs/2103.03294, 2021. URL: http://arxiv.org/abs/2103.03294.
  19. Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszynski, Tomasz Walen, and Wiktor Zuba. Counting distinct patterns in internal dictionary matching. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 8:1-8:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.8.
  20. Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Internal dictionary matching. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, volume 149 of LIPIcs, pages 22:1-22:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2019.22.
  21. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, pages 978-989. IEEE, 2020. URL: https://doi.org/10.1109/FOCS46700.2020.00095.
  22. Maxime Crochemore, Alice Héliou, Gregory Kucherov, Laurent Mouchard, Solon P. Pissis, and Yann Ramusat. Absent words in a sliding window with applications. Inf. Comput., 270, 2020. URL: https://doi.org/10.1016/j.ic.2019.104461.
  23. Maxime Crochemore, Filippo Mignosi, and Antonio Restivo. Automata and forbidden words. Inf. Process. Lett., 67(3):111-117, 1998. URL: https://doi.org/10.1016/S0020-0190(98)00104-5.
  24. Maxime Crochemore, Filippo Mignosi, Antonio Restivo, and Sergio Salemi. Data compression using antidictionaries. Proceedings of the IEEE, 88(11):1756-1768, 2000. URL: https://doi.org/10.1109/5.892711.
  25. Martin Farach. Optimal suffix tree construction with large alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS 1997, pages 137-143. IEEE Computer Society, 1997. URL: https://doi.org/10.1109/SFCS.1997.646102.
  26. Martin Farach and S. Muthukrishnan. Perfect hashing for strings: Formalization and algorithms. In Combinatorial Pattern Matching, 7th Annual Symposium, CPM 1996, volume 1075 of Lecture Notes in Computer Science, pages 130-140. Springer, 1996. URL: https://doi.org/10.1007/3-540-61258-0_11.
  27. Sébastien Ferenczi. Complexity of sequences and dynamical systems. Discret. Math., 206(1-3):145-154, 1999. URL: https://doi.org/10.1016/S0012-365X(98)00400-2.
  28. Gabriele Fici and Pawel Gawrychowski. Minimal absent words in rooted and unrooted trees. In String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, volume 11811 of Lecture Notes in Computer Science, pages 152-161. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-32686-9_11.
  29. Gabriele Fici, Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. Word assembly through minimal forbidden words. Theor. Comput. Sci., 359(1-3):214-230, 2006. URL: https://doi.org/10.1016/j.tcs.2006.03.006.
  30. Gabriele Fici, Antonio Restivo, and Laura Rizzo. Minimal forbidden factors of circular words. Theor. Comput. Sci., 792:144-153, 2019. URL: https://doi.org/10.1016/j.tcs.2018.05.037.
  31. Nathan J. Fine and Herbert S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. URL: http://www.jstor.org/stable/2034009.
  32. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. URL: https://doi.org/10.1016/0022-0000(93)90040-4.
  33. Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and minimal absent words in linear time for integer alphabets. In 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016, volume 58 of LIPIcs, pages 38:1-38:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.MFCS.2016.38.
  34. Arnab Ganguly, Manish Patil, Rahul Shah, and Sharma V. Thankachan. A linear space data structure for range LCP queries. Fundam. Inform., 163(3):245-251, 2018. URL: https://doi.org/10.3233/FI-2018-1741.
  35. Sara P. Garcia, Armando J. Pinho, João M. O. S. Rodrigues, Carlos A. C. Bastos, and Paulo J. S. G. Ferreira. Minimal absent words in prokaryotic and eukaryotic genomes. PLoS ONE, 6, 2011. URL: https://doi.org/10.1371/journal.pone.0016065.
  36. Pawel Gawrychowski, Moshe Lewenstein, and Patrick K. Nicholson. Weighted ancestors in suffix trees. In Algorithms - ESA 2014 - 22th Annual European Symposium, volume 8737 of Lecture Notes in Computer Science, pages 455-466. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-44777-2_38.
  37. Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM J. Comput., 13(2):338-355, 1984. URL: https://doi.org/10.1137/0213024.
  38. Guy Jacobson. Space-efficient static trees and graphs. In 30th Annual Symposium on Foundations of Computer Science, FOCS 1989, pages 549-554. IEEE Computer Society, 1989. URL: https://doi.org/10.1109/SFCS.1989.63533.
  39. Orgad Keller, Tsvi Kopelowitz, Shir Landau Feibish, and Moshe Lewenstein. Generalized substring compression. Theor. Comput. Sci., 525:42-54, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.010.
  40. Dominik Kempa and Tomasz Kociumaka. String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 756-767. ACM, 2019. URL: https://doi.org/10.1145/3313276.3316368.
  41. Tomasz Kociumaka. Minimal suffix and rotation of a substring in optimal time. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, pages 28:1-28:12, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.28.
  42. Tomasz Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, 2018. URL: https://mimuw.edu.pl/~kociumaka/files/phd.pdf.
  43. Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. A linear-time algorithm for seeds computation. ACM Trans. Algorithms, 16(2):27:1-27:23, 2020. URL: https://doi.org/10.1145/3386369.
  44. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Efficient data structures for the factor periodicity problem. In String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, volume 7608 of Lecture Notes in Computer Science, pages 284-294, 2012. URL: https://doi.org/10.1007/978-3-642-34109-0_30.
  45. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Internal pattern matching queries in a text and applications. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 532-551. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.36.
  46. Tsvi Kopelowitz, Gregory Kucherov, Yakov Nekrich, and Tatiana Starikovskaya. Cross-document pattern matching. J. Discrete Algorithms, 24:40-47, 2014. URL: https://doi.org/10.1016/j.jda.2013.05.002.
  47. Tsvi Kopelowitz and Moshe Lewenstein. Dynamic weighted ancestors. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pages 565-574. SIAM, 2007. URL: http://dl.acm.org/citation.cfm?id=1283383.1283444.
  48. Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. J. Comput. Syst. Sci., 37(1):63-78, 1988. URL: https://doi.org/10.1016/0022-0000(88)90045-1.
  49. Kotaro Matsuda, Kunihiko Sadakane, Tatiana Starikovskaya, and Masakazu Tateshita. Compressed orthogonal search on suffix arrays with applications to range LCP. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, June 17-19, 2020, Copenhagen, Denmark, pages 23:1-23:13, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.23.
  50. Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Minimal unique substrings and minimal absent words in a sliding window. In 46th SOFSEM, volume 12011 of Lecture Notes in Computer Science, pages 148-160. Springer, 2020. URL: https://doi.org/10.1007/978-3-030-38919-2_13.
  51. Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. Words and forbidden factors. Theor. Comput. Sci., 273(1-2):99-117, 2002. URL: https://doi.org/10.1016/S0304-3975(00)00436-9.
  52. J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. Text indexing and searching in sublinear time. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 24:1-24:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.24.
  53. Gonzalo Navarro. Compact Data Structures - A Practical Approach. Cambridge University Press, 2016. URL: http://www.cambridge.org/de/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/compact-data-structures-practical-approach?format=HB.
  54. Takahiro Ota and Hiroyoshi Morita. On the adaptive antidictionary code using minimal forbidden words with constant lengths. In Proceedings of the International Symposium on Information Theory and its Applications, ISITA 2010, pages 72-77. IEEE, 2010. URL: https://doi.org/10.1109/ISITA.2010.5649621.
  55. Mihai Patrascu. Unifying the landscape of cell-probe lower bounds. SIAM J. Comput., 40(3):827-847, 2011. URL: https://doi.org/10.1137/09075336X.
  56. Diogo Pratas and Jorge M Silva. Persistent minimal sequences of SARS-CoV-2. Bioinformatics, July 2020. btaa686. URL: https://doi.org/10.1093/bioinformatics/btaa686.
  57. Mihai Pătraşcu and Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, pages 166-175. IEEE Computer Society, 2014. URL: https://doi.org/10.1109/FOCS.2014.26.
  58. Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. Sublinear algorithms for approximating string compressibility. Algorithmica, 65(3):685-709, 2013. URL: https://doi.org/10.1007/s00453-012-9618-6.
  59. Mikhail Rubinchik and Arseny M. Shur. Counting palindromes in substrings. In String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, volume 10508 of Lecture Notes in Computer Science, pages 290-303. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-67428-5_25.
  60. Raquel M. Silva, Diogo Pratas, Luísa Castro, Armando J. Pinho, and Paulo J. S. G. Ferreira. Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinform., 31(15):2421-2425, 2015. URL: https://doi.org/10.1093/bioinformatics/btv189.
  61. Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Small-space LCE data structure with constant-time queries. In 42nd International Symposium on Mathematical Foundations of Computer Science, MFCS 2017, volume 83 of LIPIcs, pages 10:1-10:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.MFCS.2017.10.
  62. Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Math. Comput. Sci., 1(4):571-603, 2008. URL: https://doi.org/10.1007/s11786-007-0033-3.
  63. Andrew C. Yao. Space-time tradeoff for answering range queries (extended abstract). In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC 1982, pages 128-136. ACM, 1982. URL: https://doi.org/10.1145/800070.802185.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail