Barcode Selection and Layout Optimization in Spatial Transcriptomics

Authors Frederik L. Jatzkowski, Antonia Schmidt, Robert Mank, Steffen Schüler, Matthias Müller-Hannemann




File

LIPIcs.SEA.2024.17.pdf
  • Filesize: 0.97 MB
  • 19 pages


Author Details

Frederik L. Jatzkowski
  • Martin Luther University Halle-Wittenberg, Germany
Antonia Schmidt
  • Martin Luther University Halle-Wittenberg, Germany
Robert Mank
  • Martin Luther University Halle-Wittenberg, Germany
Steffen Schüler
  • Martin Luther University Halle-Wittenberg, Germany
Matthias Müller-Hannemann
  • Martin Luther University Halle-Wittenberg, Germany

Cite As

Frederik L. Jatzkowski, Antonia Schmidt, Robert Mank, Steffen Schüler, and Matthias Müller-Hannemann. Barcode Selection and Layout Optimization in Spatial Transcriptomics. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 17:1-17:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.17

Abstract

An important special case of the quadratic assignment problem arises in the synthesis of DNA microarrays for high-resolution spatial transcriptomics. The task is to select a suitable subset from a set of barcodes, i.e., short DNA strings that serve as unique identifiers, and to assign the selected barcodes to positions on a two-dimensional array in such a way that a position-dependent cost function is minimized. A typical microarray with dimensions of 768×1024 requires 786,432 barcodes to be placed, leading to very challenging large-scale combinatorial optimization problems. The general quadratic assignment problem is well known for its hardness, both in theory and in practice. It turns out that this also holds for the special case of the barcode layout problem. We show that the problem is even hard to approximate: it is MaxSNP-hard. An ILP formulation theoretically allows the computation of optimal results, but it is only applicable to tiny instances. Therefore, we have developed layout-constructing and layout-improving heuristics with the aim of computing near-optimal solutions for instances of realistic size. These include a sorting-based algorithm, a greedy algorithm, 2-OPT-based local search, and a genetic algorithm. To assess the quality of the results, we compare the generated solutions with the expected cost of a random layout and with lower bounds. A combination of the greedy algorithm and 2-OPT local search produces the most promising results in terms of both quality and runtime. Solutions to large-scale instances with arrays of dimension 768×1024 show a 37% reduction in cost over a random solution and can be computed in about 3 minutes. Because the universe of suitable barcodes is much larger than the number of barcodes needed, the surplus can be exploited when selecting barcodes. Experiments with different surpluses of barcodes show that a significant improvement in layout quality can be achieved at the cost of a reasonable increase in runtime. 
Another interesting finding is that the restriction of the barcode design space by biochemical constraints is actually beneficial for the overall layout cost.
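
To make the problem statement concrete, the following is a minimal sketch of barcode selection, layout cost, and a pairwise-exchange local search in the spirit of 2-OPT. It is not the authors' implementation. The abstract does not specify the position-dependent cost function, so the sketch assumes a stand-in cost: the sum of Hamming distances between horizontally and vertically adjacent barcodes. The function names (`hamming`, `layout_cost`, `random_layout`, `two_opt`) are hypothetical, and the local search naively re-evaluates the full cost after each swap, which is only feasible for toy grids.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def layout_cost(grid):
    """ASSUMED stand-in cost: Hamming distance summed over all pairs of
    horizontally or vertically adjacent cells. The paper's actual cost
    function is not given in the abstract."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                total += hamming(grid[r][c], grid[r + 1][c])
            if c + 1 < cols:
                total += hamming(grid[r][c], grid[r][c + 1])
    return total


def random_layout(barcodes, rows, cols, seed=0):
    """Select rows*cols barcodes at random from a (possibly larger) pool
    and place them row by row; serves as the random baseline."""
    rng = random.Random(seed)
    chosen = rng.sample(barcodes, rows * cols)
    return [chosen[r * cols:(r + 1) * cols] for r in range(rows)]


def two_opt(grid):
    """Pairwise-exchange local search: keep a swap of two cells only if it
    strictly lowers the cost; stop when no swap improves. Modifies grid in
    place and returns the final cost."""
    rows, cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best = layout_cost(grid)
    improved = True
    while improved:
        improved = False
        for (r1, c1), (r2, c2) in itertools.combinations(cells, 2):
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
            cost = layout_cost(grid)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo the swap because it did not improve the layout.
                grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return best


if __name__ == "__main__":
    # Toy pool: all 64 DNA strings of length 3; only 12 are needed,
    # so there is a surplus of barcodes to select from.
    pool = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
    grid = random_layout(pool, rows=3, cols=4, seed=1)
    print("random layout cost:", layout_cost(grid))
    print("after local search:", two_opt(grid))
```

Each accepted swap strictly reduces a non-negative integer cost, so the search terminates. A realistic implementation would evaluate only the change in cost around the two swapped cells rather than recomputing the whole layout.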

Subject Classification

ACM Subject Classification
  • Applied computing → Operations research
  • Theory of computation → Mathematical optimization
Keywords
  • Spatial Transcriptomics
  • Array Layout
  • Optimization
  • Computational Complexity
  • GPU Computing
  • Integer Linear Programming
  • Metaheuristics

