License: Creative Commons Attribution 3.0 Unported license (CC-BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.7
URN: urn:nbn:de:0030-drops-121324
URL: https://drops.dagstuhl.de/opus/volltexte/2020/12132/
Go to the corresponding LIPIcs Volume Portal


Bernardini, Giulia ; Chen, Huiping ; Loukides, Grigorios ; Pisanti, Nadia ; Pissis, Solon P. ; Stougie, Leen ; Sweering, Michelle

String Sanitization Under Edit Distance

pdf-format:
LIPIcs-CPM-2020-7.pdf (0.8 MB)


Abstract

Let W be a string of length n over an alphabet Σ, k be a positive integer, and 𝒮 be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of 𝒮 occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and 𝒮 represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in 𝒪(kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in 𝒪(n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.

BibTeX - Entry

@InProceedings{bernardini_et_al:LIPIcs:2020:12132,
  author =	{Giulia Bernardini and Huiping Chen and Grigorios Loukides and Nadia Pisanti and Solon P. Pissis and Leen Stougie and Michelle Sweering},
  title =	{{String Sanitization Under Edit Distance}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{7:1--7:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{Inge Li G{\o}rtz and Oren Weimann},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12132},
  URN =		{urn:nbn:de:0030-drops-121324},
  doi =		{10.4230/LIPIcs.CPM.2020.7},
  annote =	{Keywords: String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound}
}

Keywords: String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound
Collection: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Issue Date: 2020
Date of publication: 09.06.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI