Constructing Strings Avoiding Forbidden Substrings

Acknowledgements

We wish to thank Gabriele Fici (Università di Palermo) for bringing to our attention the work of Crochemore, Mignosi and Restivo [Maxime Crochemore et al., 1998].

Cite As

Giulia Bernardini, Alberto Marchetti-Spaccamela, Solon P. Pissis, Leen Stougie, and Michelle Sweering. Constructing Strings Avoiding Forbidden Substrings. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.9

Abstract

We consider the problem of constructing strings over an alphabet Σ that start with a given prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden substrings. In the decision version of the problem, given a set S_k of forbidden substrings, each of length k, over Σ, we are asked to decide whether there exists a string x over Σ such that u is a prefix of x, v is a suffix of x, and no s ∈ S_k occurs in x. Our first result is an 𝒪(|u|+|v|+k|S_k|)-time algorithm to decide this problem. In the more general optimization version of the problem, given a set S of forbidden arbitrary-length substrings over Σ, we are asked to construct a shortest string x over Σ such that u is a prefix of x, v is a suffix of x, and no s ∈ S occurs in x. Our second result is an 𝒪(|u|+|v|+||S||⋅|Σ|)-time algorithm to solve this problem, where ||S|| denotes the total length of the elements of S. Interestingly, our results can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Our algorithms are motivated by data privacy, and in particular, by the data sanitization process. In the context of strings, sanitization consists in hiding forbidden substrings from a given string by introducing the least amount of spurious information. We consider the following problem. Given a string w of length n over Σ, an integer k, and a set S_k of forbidden substrings, each of length k, over Σ, construct a shortest string y over Σ such that no s ∈ S_k occurs in y and the sequence of all other length-k fragments occurring in w is a subsequence of the sequence of the length-k fragments occurring in y. Our third result is an 𝒪(nk|S_k|⋅|Σ|)-time algorithm to solve this problem.
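
To make the first problem statement concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It searches breadth-first over the complete de Bruijn graph view mentioned above: a state is the last max(k-1, |v|) letters of the string built so far, and appending a letter is an edge that is skipped if it would complete a forbidden substring. The function name `shortest_avoiding` and its parameters are hypothetical, and the number of states can grow exponentially in k, so this sketch does not achieve the 𝒪(|u|+|v|+k|S_k|) bound stated in the abstract.

```python
from collections import deque


def shortest_avoiding(u, v, forbidden, alphabet):
    """Return a shortest string with prefix u and suffix v that contains no
    string of `forbidden` (all of equal length k), or None if none exists.

    Hypothetical helper for illustration only; plain BFS, not the paper's method.
    """
    forbidden = set(forbidden)
    k = len(next(iter(forbidden))) if forbidden else 1
    # Remembering this many trailing letters is enough to test both the
    # forbidden substrings (k - 1 old letters plus the new one) and the suffix v.
    memory = max(k - 1, len(v))

    def tail(s):
        # Last `memory` letters of s (the whole of s if it is shorter).
        return s[max(0, len(s) - memory):]

    # The prefix u itself must not contain a forbidden substring.
    if any(u[i:i + k] in forbidden for i in range(len(u) - k + 1)):
        return None
    if u.endswith(v):
        return u

    queue = deque([u])
    seen = {tail(u)}
    while queue:
        x = queue.popleft()
        for c in alphabet:
            y = x + c
            if len(y) >= k and y[-k:] in forbidden:
                continue  # appending c would create a forbidden substring
            if y.endswith(v):
                return y  # first hit in BFS order is a shortest answer
            state = tail(y)
            if state not in seen:
                seen.add(state)
                queue.append(y)
    return None
```

For example, with alphabet "abc", u = "aa", v = "bb" and the single forbidden substring "aab", the call `shortest_avoiding("aa", "bb", {"aab"}, "abc")` returns "aacbb"; over the alphabet "ab" with "ab" forbidden, no such string exists and the function returns `None`.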

Subject Classification

ACM Subject Classification
• Theory of computation → Pattern matching
Keywords
• string algorithms
• forbidden strings
• de Bruijn graphs
• data sanitization

References

1. Osman Abul, Francesco Bonchi, and Fosca Giannotti. Hiding sequential and spatiotemporal patterns. IEEE Trans. Knowl. Data Eng., 22(12):1709-1723, 2010. URL: https://doi.org/10.1109/TKDE.2009.213.
2. Osman Abul and Harun Gökçe. Knowledge hiding from tree and graph databases. Data Knowl. Eng., 72:148-171, 2012. URL: https://doi.org/10.1016/j.datak.2011.10.002.
3. Surender Baswana, Keerti Choudhary, Moazzam Hussain, and Liam Roditty. Approximate single-source fault tolerant shortest path. ACM Trans. Algorithms, 16(4):44:1-44:22, 2020. URL: https://doi.org/10.1145/3397532.
4. Surender Baswana, Keerti Choudhary, and Liam Roditty. Fault tolerant reachability for directed graphs. In DISC, pages 528-543, 2015. URL: https://doi.org/10.1007/978-3-662-48653-5_35.
5. Surender Baswana, Keerti Choudhary, and Liam Roditty. Fault-tolerant subgraph for single-source reachability: General and optimal. SIAM J. Comput., 47(1):80-95, 2018. URL: https://doi.org/10.1137/16M1087643.
6. Surender Baswana and Neelesh Khanna. Approximate shortest paths avoiding a failed vertex: Near optimal data structures for undirected unweighted graphs. Algorithmica, 66(1):18-50, 2013. URL: https://doi.org/10.1007/s00453-012-9621-y.
7. Surender Baswana, Utkarsh Lath, and Anuradha S. Mehta. Single source distance oracle for planar digraphs avoiding a failed node or link. In SODA, pages 223-232, 2012. URL: https://doi.org/10.1137/1.9781611973099.20.
8. Marie-Pierre Béal, Maxime Crochemore, Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. Computing forbidden words of regular languages. Fundam. Informaticae, 56(1-2):121-135, 2003. URL: http://content.iospress.com/articles/fundamenta-informaticae/fi56-1-2-08.
9. Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. String sanitization: A combinatorial approach. In ECML PKDD, volume 11906, pages 627-644, 2019. URL: https://doi.org/10.1007/978-3-030-46150-8_37.
10. Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone, and Michelle Sweering. Combinatorial algorithms for string sanitization. ACM Trans. Knowl. Discov. Data, 15(1):8:1-8:34, 2020. URL: https://doi.org/10.1145/3418683.
11. Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Leen Stougie, and Michelle Sweering. String sanitization under edit distance. In CPM, pages 7:1-7:14, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.7.
12. Giulia Bernardini, Alessio Conte, Garance Gourdel, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon Pissis, Giulia Punzi, Leen Stougie, and Michelle Sweering. Hide and mine in strings: Hardness and algorithms. In ICDM, pages 924-929, 2020. URL: https://doi.org/10.1109/ICDM50108.2020.00103.
13. Luca Bonomi, Liyue Fan, and Hongxia Jin. An information-theoretic approach to individual sequential data sanitization. In WSDM, pages 337-346, 2016. URL: https://doi.org/10.1145/2835776.2835828.
14. Andrei Z. Broder, Danny Dolev, Michael J. Fischer, and Barbara Simons. Efficient fault-tolerant routings in networks. Inf. Comput., 75(1):52-64, 1987. URL: https://doi.org/10.1016/0890-5401(87)90063-0.
15. Pascal Caron. Families of locally testable languages. Theoretical Computer Science, 242(1):361-376, 2000. URL: https://doi.org/10.1016/S0304-3975(98)00332-6.
16. Panagiotis Charalampopoulos, Shay Mozes, and Benjamin Tebeka. Exact distance oracles for planar graphs with failing vertices. In SODA, pages 2110-2123, 2019. URL: https://doi.org/10.1137/1.9781611975482.127.
17. Keerti Choudhary. An optimal dual fault tolerant reachability oracle. In ICALP, pages 130:1-130:13, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.130.
18. Chris Clifton and Don Marks. Security and privacy implications of data mining. In SIGMOD, pages 15-19, 1996.
19. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
20. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007.
21. Maxime Crochemore, Filippo Mignosi, and Antonio Restivo. Automata and forbidden words. Inf. Process. Lett., 67(3):111-117, 1998. URL: https://doi.org/10.1016/S0020-0190(98)00104-5.
22. Nicolaas Govert de Bruijn. A combinatorial problem. Koninklijke Nederlandse Akademie V. Wetenschappen, 49:758-764, 1946.
23. C. Delorme and J.-P. Tillich. The spectrum of de Bruijn and Kautz graphs. European Journal of Combinatorics, 19(3):307-319, 1998. URL: https://doi.org/10.1006/eujc.1997.0183.
24. Martin Dietzfelbinger and Friedhelm Meyer auf der Heide. A new universal class of hash functions and dynamic hashing in real time. In ICALP, pages 6-19, 1990. URL: https://doi.org/10.1007/BFb0032018.
25. Igor Dolinka. On free spectra of locally testable semigroup varieties. Glasgow Mathematical Journal, 53(3):623-629, 2011. URL: https://doi.org/10.1017/S0017089511000188.
26. Martin Farach. Optimal suffix tree construction with large alphabets. In FOCS, pages 137-143, 1997. URL: https://doi.org/10.1109/SFCS.1997.646102.
27. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538-544, 1984. URL: https://doi.org/10.1145/828.1884.
28. Aris Gkoulalas-Divanis and Grigorios Loukides. Revisiting sequential pattern hiding to enhance utility. In SIGKDD, pages 1316-1324, 2011. URL: https://doi.org/10.1145/2020408.2020605.
29. Aris Gkoulalas-Divanis and Vassilios S. Verykios. An integer programming approach for frequent itemset hiding. In CIKM, pages 748-757, 2006. URL: https://doi.org/10.1145/1183614.1183721.
30. Aris Gkoulalas-Divanis and Vassilios S. Verykios. Exact knowledge hiding through database extension. IEEE Trans. Knowl. Data Eng., 21(5):699-713, 2009. URL: https://doi.org/10.1109/TKDE.2008.199.
31. Robert Gwadera, Aris Gkoulalas-Divanis, and Grigorios Loukides. Permutation-based sequential pattern hiding. In ICDM, pages 241-250, 2013. URL: https://doi.org/10.1109/ICDM.2013.57.
32. Giuseppe F. Italiano, Adam Karczmarz, and Nikos Parotsidis. Planar reachability under single vertex or edge failures, 2021. URL: https://doi.org/10.1137/1.9781611976465.163.
33. L. R. Ford Jr. A cyclic arrangement of m-tuples. Technical Report P-1071, Rand Corporation, 1957.
34. Tiko Kameda. On the vector representation of the reachability in planar directed graphs. Inf. Process. Lett., 3(3):75-77, 1975. URL: https://doi.org/10.1016/0020-0190(75)90019-8.
35. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249-260, 1987. URL: https://doi.org/10.1147/rd.312.0249.
36. Young-Jin Kim, Ramesh Govindan, Brad Karp, and Scott Shenker. Geographic routing made practical. In NSDI, 2005. URL: http://www.usenix.org/events/nsdi05/tech/kim.html.
37. Grigorios Loukides and Robert Gwadera. Optimal event sequence sanitization. In ICDM, pages 775-783, 2015. URL: https://doi.org/10.1137/1.9781611974010.87.
38. Stanley R. M. Oliveira and Osmar R. Zaïane. Protecting sensitive knowledge by data sanitization. In ICDM, pages 613-616, 2003. URL: https://doi.org/10.1109/ICDM.2003.1250990.
39. Steven Skiena. The Algorithm Design Manual, Third Edition. Texts in Computer Science. Springer, 2020. URL: https://doi.org/10.1007/978-3-030-54256-6.
40. Mikkel Thorup. Compact oracles for reachability and approximate distances in planar digraphs. J. ACM, 51(6):993-1024, 2004. URL: https://doi.org/10.1145/1039488.1039493.
41. Esko Ukkonen. Approximate string matching with q-grams and maximal matches. Theor. Comput. Sci., 92(1):191-211, 1992. URL: https://doi.org/10.1016/0304-3975(92)90143-4.
42. Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yücel Saygin, and Elena Dasseni. Association rule hiding. IEEE Trans. Knowl. Data Eng., 16(4):434-447, 2004. URL: https://doi.org/10.1109/TKDE.2004.1269668.
43. Yi-Hung Wu, Chia-Ming Chiang, and Arbee L. P. Chen. Hiding sensitive association rules with limited side effects. IEEE Trans. Knowl. Data Eng., 19(1):29-42, 2007. URL: https://doi.org/10.1109/TKDE.2007.250583.