Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence

Nakagawa, Yoshiki; Ohata, Satsuya; Shimizu, Kana

doi:10.4230/LIPIcs.WABI.2021.2

File

Subject Classification

ACM Subject Classification

Theory of computation → Pattern matching
Security and privacy → Privacy-preserving protocols
Theory of computation → Cryptographic protocols
Applied computing → Genomics

Keywords

Private Genome Sequence Search
Secure Multiparty Computation
Secret Sharing
FM-index
Suffix Tree
Maximal Exact Match

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

Finding a similar substring that commonly appears in query and database sequences is an essential task for genome data analysis. This study proposes a secure two-party variable-length string search protocol based on secret sharing. The unique feature of our protocol is that time, communication, and round complexities are not dependent on the database length N, after the query input. This property brings dramatic performance improvements in search time, since N is usually quite large in an actual genome database, and the same database is repeatedly used for many queries. Our concept hinges on a technique that efficiently applies the compressed full-text index (FOCS 2000) for a secret-sharing scheme. We conducted an experiment using a human genomic sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a well-designed baseline protocol under the realistic computation/network environment.

Cite As Get BibTex

Yoshiki Nakagawa, Satsuya Ohata, and Kana Shimizu. Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 2:1-2:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.WABI.2021.2

Author Details

Yoshiki Nakagawa

Department of Computer Science and Engineering, Waseda University, Tokyo, Japan

Satsuya Ohata

Digital Garage, Inc., Tokyo, Japan

Kana Shimizu

Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

References

Toshinori Araki, Jun Furukawa, Yehuda Lindell, Ariel Nof, and Kazuma Ohara. High-throughput semi-honest secure three-party computation with an honest majority. In Proc. of CCS 2016, pages 805-817, 2016. URL: https://doi.org/10.1145/2976749.2978331.
Gilad Asharov, Shai Halevi, Yehuda Lindell, and Tal Rabin. Privacy-preserving search of similar patients in genomic data. PoPETs, 2018(4):104-124, 2018. URL: https://doi.org/10.1515/popets-2018-0034.
Md Momin Al Aziz, Md. Nazmus Sadat, Dima Alhadidi, Shuang Wang, Xiaoqian Jiang, Cheryl L. Brown, and Noman Mohammed. Privacy-preserving techniques of genomic data - a survey. Briefings Bioinform., 20(3):887-895, 2019.
Pierre Baldi, Roberta Baronio, Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. Countering GATTACA: efficient and secure testing of fully-sequenced human genomes. In Proc. of CCS 2011, pages 691-702, 2011. URL: https://doi.org/10.1145/2046707.2046785.
Donald Beaver. Efficient multiparty protocols using circuit randomization. In Proc. of CRYPTO 1991, pages 420-432, 1991. URL: https://doi.org/10.1007/3-540-46766-1_34.
Yangyi Chen, Bo Peng, XiaoFeng Wang, and Haixu Tang. Large-scale privacy-preserving mapping of human genomic sequences on hybrid clouds. In Proc. of NDSS 2012, 2012. URL: https://www.ndss-symposium.org/ndss2012/large-scale-privacy-preserving-mapping-human-genomic-sequences-hybrid-clouds.
Ke Cheng, Yantian Hou, and Liangmin Wang. Secure similar sequence query on outsourced genomic data. In Proc. of AsiaCCS 2018, pages 237-251, 2018. URL: https://doi.org/10.1145/3196494.3196535.
Jung Hee Cheon, Miran Kim, and Kristin E. Lauter. Homomorphic computation of edit distance. In Proc. of FC 2015, pages 194-212, 2015. URL: https://doi.org/10.1007/978-3-662-48051-9_15.
Richard Durbin. Efficient haplotype matching and storage using the positional burrows-wheeler transform (pbwt). Bioinformatics, 30(9):1266-1272, 2014.
Yaniv Erlich and Arvind Narayanan. Routes for breaching and protecting genetic privacy. Nature Reviews Genetics, 15(6):409-421, 2014.
Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In Proc. of FOCS 2000, pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
Johannes Fischer, Veli Mäkinen, and Gonzalo Navarro. An(other) entropy-bounded compressed suffix tree. In Proc. of CPM 2008, pages 152-165, 2008. URL: https://doi.org/10.1007/978-3-540-69068-9_16.
Marc Fiume, Miroslav Cupak, Stephen Keenan, Jordi Rambla, Sabela de la Torre, Stephanie OM Dyke, Anthony J Brookes, Knox Carey, David Lloyd, Peter Goodhand, et al. Federated discovery and sharing of genomic data using beacons. Nature biotechnology, 37(3):220-224, 2019.
Oded Goldreich. The Foundations of Cryptography - Volume 2, Basic Applications. Cambridge University Press, 2004. URL: https://doi.org/10.1017/CBO9780511721656.
Yan Huang, David Evans, Jonathan Katz, and Lior Malka. Faster secure two-party computation using garbled circuits. In Proc. of USENIX 2011, 2011. URL: http://static.usenix.org/events/sec11/tech/full_papers/Huang.pdf.
Y. Ishimaki, H. Imabayashi, K. Shimizu, and H. Yamana. Privacy-preserving string search for genome sequences with fhe bootstrapping optimization. In Proc. of IEEE Big Data 2016, pages 3989-3991, 2016.
Somesh Jha, Louis Kruger, and Vitaly Shmatikov. Towards practical privacy for genomic computation. In Proc. of IEEE S&P 2000, pages 216-230, 2008. URL: https://doi.org/10.1109/SP.2008.34.
Md Safiur Rahman Mahdi, Md Momin Al Aziz, Noman Mohammed, and Xiaoqian Jiang. Privacy-preserving string search on encrypted genomic data using a generalized suffix tree. Informatics in Medicine Unlocked, 23:100525, 2021.
Payman Mohassel, Ostap Orobets, and Ben Riva. Efficient server-aided 2pc for mobile phones. PoPETs, 2016(2):82-99, 2016. URL: https://doi.org/10.1515/popets-2016-0006.
Payman Mohassel and Yupeng Zhang. Secureml: A system for scalable privacy-preserving machine learning. In Proc. of IEEE S&P 2017, pages 19-38, 2017. URL: https://doi.org/10.1109/SP.2017.12.
Muhammad Naveed, Erman Ayday, Ellen W Clayton, Jacques Fellay, Carl A Gunter, Jean-Pierre Hubaux, Bradley A Malin, and XiaoFeng Wang. Privacy in the genomic era. ACM Computing Surveys (CSUR), 48(1):1-44, 2015.
Koji Nuida, Satsuya Ohata, Shigeo Mitsunari, and Nuttapong Attrapadung. Arbitrary univariate function evaluation and re-encryption protocols over lifted-elgamal type ciphertexts. IACR Cryptology ePrint Archive, 2019:1233, 2019. URL: https://eprint.iacr.org/2019/1233.
Satsuya Ohata and Koji Nuida. Communication-efficient (client-aided) secure two-party protocols and its application. In proc. of FC 2020, pages 369-385, 2020.
Anthony A Philippakis, Danielle R Azzariti, Sergi Beltran, Anthony J Brookes, Catherine A Brownstein, Michael Brudno, Han G Brunner, Orion J Buske, Knox Carey, Cassie Doll, et al. The matchmaker exchange: a platform for rare disease gene discovery. Human mutation, 36(10):915-921, 2015.
Victoria Popic and Serafim Batzoglou. A hybrid cloud read aligner based on minhash and kmer voting that preserves privacy. Nature communications, 8(1):1-7, 2017.
Thomas Schneider and Oleksandr Tkachenko. EPISODE: efficient privacy-preserving similar sequence queries on outsourced genomic databases. In Proc. of AsiaCCS 2019, pages 315-327, 2019. URL: https://doi.org/10.1145/3321705.3329800.
Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612-613, 1979.
Kana Shimizu, Koji Nuida, and Gunnar Rätsch. Efficient privacy-preserving string search and an application in genomics. Bioinformatics, 32(11):1652-1661, 2016. URL: https://doi.org/10.1093/bioinformatics/btw050.
Katerina Sotiraki, Esha Ghosh, and Hao Chen. Privately computing set-maximal matches in genomic data. BMC Medical Genomics, 13(7):1-8, 2020.
H. Sudo, M. Jimbo, K. Nuida, and K. Shimizu. Secure wavelet matrix: Alphabet-friendly privacy-preserving string search for bioinformatics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(5):1675-1684, 2019.
Xiao Shaun Wang, Yan Huang, Yongan Zhao, Haixu Tang, XiaoFeng Wang, and Diyue Bu. Efficient genome-wide, privacy-preserving similar patient query based on private edit distance. In Proc. of CCS 2015, pages 492-503, 2015. URL: https://doi.org/10.1145/2810103.2813725.
Masaya Yasuda, Takeshi Shimoyama, Jun Kogure, Kazuhiro Yokoyama, and Takeshi Koshiba. Secure pattern matching using somewhat homomorphic encryption. In Ari Juels and Bryan Parno, editors, Proc. of CCSW'13, pages 65-76, 2013. URL: https://doi.org/10.1145/2517488.2517497.
R. Zhu and Y. Huang. Efficient and precise secure generalized edit distance and beyond. IEEE Transactions on Dependable and Secure Computing, pages 1-1, 2020.

Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence

Authors Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence

Authors Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Supplementary Materials

References

Thanks for your feedback!

Could not send message