Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

Authors Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski



Cite As

Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Pattern Matching and Consensus Problems on Weighted Sequences and Profiles. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 46:1-46:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ISAAC.2016.46

Abstract

We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.
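
For readers unfamiliar with the meet-in-the-middle idea mentioned above, the following is a minimal sketch of the classic technique on the subset-sum special case of knapsack. It is not the authors' adaptation to the consensus problem; the function names `subset_sums` and `has_subset_with_sum` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence


def subset_sums(items: Sequence[int]) -> List[int]:
    """Return the sums of all 2^len(items) subsets, in no particular order."""
    sums = [0]
    for x in items:
        # Every existing sum appears once without x and once with x.
        sums += [s + x for s in sums]
    return sums


def has_subset_with_sum(items: Sequence[int], target: int) -> bool:
    """Decide whether some subset of `items` sums exactly to `target`.

    Illustrative sketch only: split the items into two halves, enumerate
    the subset sums of each half, sort one list, and binary-search it for
    the complement of every sum from the other list.
    """
    half = len(items) // 2
    left = subset_sums(items[:half])
    right = sorted(subset_sums(items[half:]))
    for s in left:
        need = target - s
        i = bisect_left(right, need)
        if i < len(right) and right[i] == need:
            return True
    return False
```

With n items, each half contributes at most 2^(n/2) sums, so the sketch does roughly 2^(n/2) · n work instead of the 2^n subsets a brute-force search would inspect. For example, `has_subset_with_sum([3, 34, 4, 12, 5, 2], 9)` is true because 4 + 5 = 9.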

Subject Classification

Keywords
  • weighted sequence
  • position weight matrix
  • profile matching
