Differentially Private Approximate Pattern Matching

Steiner, Teresa Anna

doi:10.4230/LIPIcs.ITCS.2024.94

Abstract

Differential privacy is the de facto privacy standard in data analysis and widely researched in various application areas. On the other hand, analyzing sequences, or strings, is essential to many modern data analysis tasks, and those data often include highly sensitive personal data. While the problem of sanitizing sequential data to protect privacy has received growing attention, there is a surprising lack of theoretical studies of algorithms analyzing sequential data that preserve differential privacy while giving provable guarantees on the accuracy of such an algorithm. The goal of this paper is to initiate such a study.

Specifically, in this paper, we consider the k-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string S which have a Hamming distance at most k to a pattern P, or decide whether such a substring exists. In our definition of privacy, individual positions of the string S are protected. To be able to answer queries under differential privacy, we allow some slack on k, i.e., we allow reporting or counting substrings of S with a distance at most (1+γ)k+α to P, for a multiplicative error γ and an additive error α. We analyze which values of α and γ are necessary or sufficient to solve the k-approximate pattern matching problem while satisfying ε-differential privacy.

Let n denote the length of S. We give
- an ε-differentially private algorithm with an additive error of O(ε^{-1} log n) and no multiplicative error for the existence variant;
- an ε-differentially private algorithm with an additive error of O(ε^{-1} max(k, log n) ⋅ log n) for the counting variant;
- an ε-differentially private algorithm with an additive error of O(ε^{-1} log n) and multiplicative error O(1) for the reporting variant for a special class of patterns.

The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of S with distance at most k to P, then the algorithm returns a substring of S with distance at most (1+γ)k+α to P. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of Ω(ε^{-1} log n) with constant probability.
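The problem variants and the slack condition described in the abstract can be made concrete with a small, non-private reference sketch. This is not the paper's algorithm and provides no differential privacy; it only spells out, by brute force, what "report", "count", and "exists" mean for Hamming distance at most k, and what it means for a returned witness to lie within the relaxed threshold (1+γ)k+α. All function names below are hypothetical and chosen for illustration.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def window_distances(s: str, p: str) -> List[int]:
    """Hamming distance from p to every length-|p| substring of s."""
    m = len(p)
    return [hamming(s[i:i + m], p) for i in range(len(s) - m + 1)]


def report(s: str, p: str, k: int) -> List[int]:
    """Reporting variant: start positions of substrings within distance k."""
    return [i for i, d in enumerate(window_distances(s, p)) if d <= k]


def count(s: str, p: str, k: int) -> int:
    """Counting variant: number of substrings within distance k."""
    return len(report(s, p, k))


def exists(s: str, p: str, k: int) -> bool:
    """Existence variant: is there any substring within distance k?"""
    return count(s, p, k) > 0


def relaxed_threshold(k: int, gamma: float, alpha: float) -> float:
    """Slack allowed on k: multiplicative error gamma, additive error alpha."""
    return (1 + gamma) * k + alpha


def is_valid_witness(s: str, p: str, start: int,
                     k: int, gamma: float, alpha: float) -> bool:
    """Check that the substring at `start` is within the relaxed threshold."""
    window = s[start:start + len(p)]
    if len(window) != len(p):
        return False
    return hamming(window, p) <= relaxed_threshold(k, gamma, alpha)
```

For example, with S = "abracadabra" and P = "abra", the exact matches (k = 0) start at positions 0 and 7, while the substring starting at position 2 ("raca", distance 3) is an acceptable witness for k = 2 only once slack such as γ = 0.5 and α = 1 is allowed. A private algorithm in the sense of the abstract would have to produce outputs that pass such a relaxed check while hiding any single position of S.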

