Let W be a string of length n over an alphabet Σ, k be a positive integer, and 𝒮 be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of 𝒮 occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and 𝒮 represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in 𝒪(kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in 𝒪(n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
@InProceedings{bernardini_et_al:LIPIcs.CPM.2020.7, author = {Bernardini, Giulia and Chen, Huiping and Loukides, Grigorios and Pisanti, Nadia and Pissis, Solon P. and Stougie, Leen and Sweering, Michelle}, title = {{String Sanitization Under Edit Distance}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {7:1--7:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.7}, URN = {urn:nbn:de:0030-drops-121324}, doi = {10.4230/LIPIcs.CPM.2020.7}, annote = {Keywords: String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound} }
Feedback for Dagstuhl Publishing