Patel, Dhrumil ;
Shah, Rahul
Inverse Suffix Array Queries for 2Dimensional Pattern Matching in NearCompact Space
Abstract
In a 2dimensional (2D) pattern matching problem, the text is arranged as a matrix 𝖬[1..n, 1..n] and consists of N = n × n symbols drawn from alphabet set Σ of size σ. The query consists of a m × m square matrix 𝖯[1..m, 1..m] drawn from the same alphabet set Σ and the task is to find all the locations in 𝖬 where 𝖯 appears as a (contiguous) submatrix. The patterns can be of any size, but as long as they are square in shape data structures like suffix trees and suffix array exist [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998] for the task of efficient pattern matching. These are essentially 2D counterparts of classic suffix trees and arrays known for traditional 1dimensional (1D) pattern matching. They work based on linearization of 2D suffixes which would preserve the prefix match property (i.e., every pattern match is a prefix of some suffix).
The main limitation of the suffix trees and the suffix arrays (in 1D) was their space utilization of O(N log N) bits, where N is the size of the text. This was suboptimal compared to Nlog σ bits of space, which is information theoretic optimal for the text. With the advent of the field of succinct/compressed data structures, it was possible to develop compressed variants of suffix trees and array based on BurrowsWheeler Tansform and LFmapping (or Φ function) [Roberto Grossi and Jeffrey Scott Vitter, 2005; Paolo Ferragina and Giovanni Manzini, 2005; Kunihiko Sadakane, 2007]. These data structures indeed achieve O(N log σ) bits of space or better. This gives rise to the question: analogous to 1D case, can we design a succinct or compressed index for 2D pattern matching? Can there be a 2D compressed suffix tree? Are there analogues of BurrowsWheeler Transform or LFmapping? The problem has been acknowledged for over a decade now and there have been a few attempts at applying Φ function [Ankur Gupta, 2004] and achieving entropy based compression [Veli Mäkinen and Gonzalo Navarro, 2008]. However, achieving the complexity breakthrough akin to 1D case has yet to be found.
In this paper, we still do not know how to answer suffix array queries in O(N log σ) bits of space  which would have led to efficient pattern matching. However, for the first time, we show an interesting result that it is indeed possible to compute inverse suffix array (ISA) queries in near compact space in O(polylog n) time. Our 2D succinct text index design is based on two 1D compressed suffix trees and it takes O(N log log N + N logσ) bits of space which is much smaller than its naive design that takes O(N log N) bits.
Although the main problem is still evasive, this index gives a hope on the existence of a full 2D succinct index with all functionalities similar to that of 1D case.
BibTeX  Entry
@InProceedings{patel_et_al:LIPIcs.ISAAC.2021.60,
author = {Patel, Dhrumil and Shah, Rahul},
title = {{Inverse Suffix Array Queries for 2Dimensional Pattern Matching in NearCompact Space}},
booktitle = {32nd International Symposium on Algorithms and Computation (ISAAC 2021)},
pages = {60:160:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772143},
ISSN = {18688969},
year = {2021},
volume = {212},
editor = {Ahn, HeeKap and Sadakane, Kunihiko},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/15493},
URN = {urn:nbn:de:0030drops154932},
doi = {10.4230/LIPIcs.ISAAC.2021.60},
annote = {Keywords: Pattern Matching, Succinct Data Structures}
}
30.11.2021
Keywords: 

Pattern Matching, Succinct Data Structures 
Seminar: 

32nd International Symposium on Algorithms and Computation (ISAAC 2021)

Issue date: 

2021 
Date of publication: 

30.11.2021 