Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence

Authors Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu



PDF
Thumbnail PDF

File

LIPIcs.WABI.2021.2.pdf
  • Filesize: 1.13 MB
  • 23 pages

Document Identifiers

Author Details

Yoshiki Nakagawa
  • Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
Satsuya Ohata
  • Digital Garage, Inc., Tokyo, Japan
Kana Shimizu
  • Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
  • National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

Cite AsGet BibTex

Yoshiki Nakagawa, Satsuya Ohata, and Kana Shimizu. Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 2:1-2:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.WABI.2021.2

Abstract

Finding a similar substring that commonly appears in query and database sequences is an essential task for genome data analysis. This study proposes a secure two-party variable-length string search protocol based on secret sharing. The unique feature of our protocol is that time, communication, and round complexities are not dependent on the database length N, after the query input. This property brings dramatic performance improvements in search time, since N is usually quite large in an actual genome database, and the same database is repeatedly used for many queries. Our concept hinges on a technique that efficiently applies the compressed full-text index (FOCS 2000) for a secret-sharing scheme. We conducted an experiment using a human genomic sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a well-designed baseline protocol under the realistic computation/network environment.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Security and privacy → Privacy-preserving protocols
  • Theory of computation → Cryptographic protocols
  • Applied computing → Genomics
Keywords
  • Private Genome Sequence Search
  • Secure Multiparty Computation
  • Secret Sharing
  • FM-index
  • Suffix Tree
  • Maximal Exact Match

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Toshinori Araki, Jun Furukawa, Yehuda Lindell, Ariel Nof, and Kazuma Ohara. High-throughput semi-honest secure three-party computation with an honest majority. In Proc. of CCS 2016, pages 805-817, 2016. URL: https://doi.org/10.1145/2976749.2978331.
  2. Gilad Asharov, Shai Halevi, Yehuda Lindell, and Tal Rabin. Privacy-preserving search of similar patients in genomic data. PoPETs, 2018(4):104-124, 2018. URL: https://doi.org/10.1515/popets-2018-0034.
  3. Md Momin Al Aziz, Md. Nazmus Sadat, Dima Alhadidi, Shuang Wang, Xiaoqian Jiang, Cheryl L. Brown, and Noman Mohammed. Privacy-preserving techniques of genomic data - a survey. Briefings Bioinform., 20(3):887-895, 2019. Google Scholar
  4. Pierre Baldi, Roberta Baronio, Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. Countering GATTACA: efficient and secure testing of fully-sequenced human genomes. In Proc. of CCS 2011, pages 691-702, 2011. URL: https://doi.org/10.1145/2046707.2046785.
  5. Donald Beaver. Efficient multiparty protocols using circuit randomization. In Proc. of CRYPTO 1991, pages 420-432, 1991. URL: https://doi.org/10.1007/3-540-46766-1_34.
  6. Yangyi Chen, Bo Peng, XiaoFeng Wang, and Haixu Tang. Large-scale privacy-preserving mapping of human genomic sequences on hybrid clouds. In Proc. of NDSS 2012, 2012. URL: https://www.ndss-symposium.org/ndss2012/large-scale-privacy-preserving-mapping-human-genomic-sequences-hybrid-clouds.
  7. Ke Cheng, Yantian Hou, and Liangmin Wang. Secure similar sequence query on outsourced genomic data. In Proc. of AsiaCCS 2018, pages 237-251, 2018. URL: https://doi.org/10.1145/3196494.3196535.
  8. Jung Hee Cheon, Miran Kim, and Kristin E. Lauter. Homomorphic computation of edit distance. In Proc. of FC 2015, pages 194-212, 2015. URL: https://doi.org/10.1007/978-3-662-48051-9_15.
  9. Richard Durbin. Efficient haplotype matching and storage using the positional burrows-wheeler transform (pbwt). Bioinformatics, 30(9):1266-1272, 2014. Google Scholar
  10. Yaniv Erlich and Arvind Narayanan. Routes for breaching and protecting genetic privacy. Nature Reviews Genetics, 15(6):409-421, 2014. Google Scholar
  11. Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In Proc. of FOCS 2000, pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
  12. Johannes Fischer, Veli Mäkinen, and Gonzalo Navarro. An(other) entropy-bounded compressed suffix tree. In Proc. of CPM 2008, pages 152-165, 2008. URL: https://doi.org/10.1007/978-3-540-69068-9_16.
  13. Marc Fiume, Miroslav Cupak, Stephen Keenan, Jordi Rambla, Sabela de la Torre, Stephanie OM Dyke, Anthony J Brookes, Knox Carey, David Lloyd, Peter Goodhand, et al. Federated discovery and sharing of genomic data using beacons. Nature biotechnology, 37(3):220-224, 2019. Google Scholar
  14. Oded Goldreich. The Foundations of Cryptography - Volume 2, Basic Applications. Cambridge University Press, 2004. URL: https://doi.org/10.1017/CBO9780511721656.
  15. Yan Huang, David Evans, Jonathan Katz, and Lior Malka. Faster secure two-party computation using garbled circuits. In Proc. of USENIX 2011, 2011. URL: http://static.usenix.org/events/sec11/tech/full_papers/Huang.pdf.
  16. Y. Ishimaki, H. Imabayashi, K. Shimizu, and H. Yamana. Privacy-preserving string search for genome sequences with fhe bootstrapping optimization. In Proc. of IEEE Big Data 2016, pages 3989-3991, 2016. Google Scholar
  17. Somesh Jha, Louis Kruger, and Vitaly Shmatikov. Towards practical privacy for genomic computation. In Proc. of IEEE S&P 2000, pages 216-230, 2008. URL: https://doi.org/10.1109/SP.2008.34.
  18. Md Safiur Rahman Mahdi, Md Momin Al Aziz, Noman Mohammed, and Xiaoqian Jiang. Privacy-preserving string search on encrypted genomic data using a generalized suffix tree. Informatics in Medicine Unlocked, 23:100525, 2021. Google Scholar
  19. Payman Mohassel, Ostap Orobets, and Ben Riva. Efficient server-aided 2pc for mobile phones. PoPETs, 2016(2):82-99, 2016. URL: https://doi.org/10.1515/popets-2016-0006.
  20. Payman Mohassel and Yupeng Zhang. Secureml: A system for scalable privacy-preserving machine learning. In Proc. of IEEE S&P 2017, pages 19-38, 2017. URL: https://doi.org/10.1109/SP.2017.12.
  21. Muhammad Naveed, Erman Ayday, Ellen W Clayton, Jacques Fellay, Carl A Gunter, Jean-Pierre Hubaux, Bradley A Malin, and XiaoFeng Wang. Privacy in the genomic era. ACM Computing Surveys (CSUR), 48(1):1-44, 2015. Google Scholar
  22. Koji Nuida, Satsuya Ohata, Shigeo Mitsunari, and Nuttapong Attrapadung. Arbitrary univariate function evaluation and re-encryption protocols over lifted-elgamal type ciphertexts. IACR Cryptology ePrint Archive, 2019:1233, 2019. URL: https://eprint.iacr.org/2019/1233.
  23. Satsuya Ohata and Koji Nuida. Communication-efficient (client-aided) secure two-party protocols and its application. In proc. of FC 2020, pages 369-385, 2020. Google Scholar
  24. Anthony A Philippakis, Danielle R Azzariti, Sergi Beltran, Anthony J Brookes, Catherine A Brownstein, Michael Brudno, Han G Brunner, Orion J Buske, Knox Carey, Cassie Doll, et al. The matchmaker exchange: a platform for rare disease gene discovery. Human mutation, 36(10):915-921, 2015. Google Scholar
  25. Victoria Popic and Serafim Batzoglou. A hybrid cloud read aligner based on minhash and kmer voting that preserves privacy. Nature communications, 8(1):1-7, 2017. Google Scholar
  26. Thomas Schneider and Oleksandr Tkachenko. EPISODE: efficient privacy-preserving similar sequence queries on outsourced genomic databases. In Proc. of AsiaCCS 2019, pages 315-327, 2019. URL: https://doi.org/10.1145/3321705.3329800.
  27. Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612-613, 1979. Google Scholar
  28. Kana Shimizu, Koji Nuida, and Gunnar Rätsch. Efficient privacy-preserving string search and an application in genomics. Bioinformatics, 32(11):1652-1661, 2016. URL: https://doi.org/10.1093/bioinformatics/btw050.
  29. Katerina Sotiraki, Esha Ghosh, and Hao Chen. Privately computing set-maximal matches in genomic data. BMC Medical Genomics, 13(7):1-8, 2020. Google Scholar
  30. H. Sudo, M. Jimbo, K. Nuida, and K. Shimizu. Secure wavelet matrix: Alphabet-friendly privacy-preserving string search for bioinformatics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(5):1675-1684, 2019. Google Scholar
  31. Xiao Shaun Wang, Yan Huang, Yongan Zhao, Haixu Tang, XiaoFeng Wang, and Diyue Bu. Efficient genome-wide, privacy-preserving similar patient query based on private edit distance. In Proc. of CCS 2015, pages 492-503, 2015. URL: https://doi.org/10.1145/2810103.2813725.
  32. Masaya Yasuda, Takeshi Shimoyama, Jun Kogure, Kazuhiro Yokoyama, and Takeshi Koshiba. Secure pattern matching using somewhat homomorphic encryption. In Ari Juels and Bryan Parno, editors, Proc. of CCSW'13, pages 65-76, 2013. URL: https://doi.org/10.1145/2517488.2517497.
  33. R. Zhu and Y. Huang. Efficient and precise secure generalized edit distance and beyond. IEEE Transactions on Dependable and Secure Computing, pages 1-1, 2020. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail