eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2020-06-09
7:1
7:14
10.4230/LIPIcs.CPM.2020.7
article
String Sanitization Under Edit Distance
Bernardini, Giulia
1
https://orcid.org/0000-0001-6647-088X
Chen, Huiping
2
Loukides, Grigorios
2
https://orcid.org/0000-0003-0888-5061
Pisanti, Nadia
3
4
https://orcid.org/0000-0003-3915-7665
Pissis, Solon P.
5
6
4
https://orcid.org/0000-0002-1445-1932
Stougie, Leen
5
6
4
Sweering, Michelle
5
University of Milano-Bicocca, Milan, Italy
King’s College London, UK
University of Pisa, Italy
ERABLE Team, Lyon, France
CWI, Amsterdam, The Netherlands
Vrije Universiteit, Amsterdam, The Netherlands
Let W be a string of length n over an alphabet Σ, k be a positive integer, and 𝒮 be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of 𝒮 occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and 𝒮 represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in 𝒪(kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in 𝒪(n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
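For orientation, the classic dynamic programming algorithm that the abstract refers to can be sketched as follows. This is only the textbook unit-cost edit distance computation plus a check of condition (i), not the ETFS algorithm itself, which the abstract describes as a non-trivial modification of it. The function names `edit_distance` and `contains_sensitive` are illustrative assumptions and do not come from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook DP edit distance with unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the current prefix of a and b[:j];
    # initially the prefix of a is empty, so the distance is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def contains_sensitive(x: str, sensitive: set, k: int) -> bool:
    """Condition (i): True if some length-k substring of x is in `sensitive`."""
    return any(x[i:i + k] in sensitive for i in range(len(x) - k + 1))
```

The table has (|a|+1)(|b|+1) cells and each is filled in constant time, which gives the quadratic behaviour that the conditional lower bound in the abstract relates to. A candidate output X could be checked with `contains_sensitive(X, S, k)` and scored against W with `edit_distance(W, X)`; neither function verifies condition (ii) on the order of the remaining length-k substrings.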
https://drops.dagstuhl.de/storage/00lipics/lipics-vol161-cpm2020/LIPIcs.CPM.2020.7/LIPIcs.CPM.2020.7.pdf
String algorithms
data sanitization
edit distance
dynamic programming
conditional lower bound