Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

Kociumaka, Tomasz; Pissis, Solon P.; Radoszewski, Jakub

doi:10.4230/LIPIcs.ISAAC.2016.46


Cite As

Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Pattern Matching and Consensus Problems on Weighted Sequences and Profiles. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 46:1-46:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ISAAC.2016.46

Abstract

We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.
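The classic meet-in-the-middle algorithm for the knapsack problem, which the abstract names as the starting point, can be sketched as follows. This is a minimal illustration of the textbook technique only, not the authors' adaptation to the consensus problem. The function names `subset_sums` and `knapsack_mitm` and the representation of items as `(weight, value)` pairs with non-negative weights are assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Tuple

Item = Tuple[int, int]  # (weight, value); hypothetical representation


def subset_sums(items: List[Item]) -> List[Item]:
    """Enumerate (total weight, total value) for every subset of items."""
    sums = [(0, 0)]
    for w, v in items:
        # Each existing subset either skips or takes the current item.
        sums += [(sw + w, sv + v) for sw, sv in sums]
    return sums


def knapsack_mitm(items: List[Item], capacity: int) -> int:
    """Best total value with total weight <= capacity (weights assumed >= 0)."""
    half = len(items) // 2
    left = subset_sums(items[:half])
    right = sorted(subset_sums(items[half:]))  # sorted by weight

    # For each prefix of the sorted right half, record the best value seen,
    # so one binary search answers "best value with weight <= limit".
    weights = [w for w, _ in right]
    best_prefix = []
    best = 0
    for _, v in right:
        best = max(best, v)
        best_prefix.append(best)

    answer = 0
    for w, v in left:
        if w > capacity:
            continue
        idx = bisect_right(weights, capacity - w) - 1
        if idx >= 0:
            answer = max(answer, v + best_prefix[idx])
    return answer
```

Splitting the n items into two halves means only about 2^(n/2) subsets are enumerated per half, instead of 2^n subsets overall; the sorted right half is then queried once per left-half subset.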
