String Matching: Communication, Circuits, and Learning

Authors Alexander Golovnev, Mika Göös, Daniel Reichman, Igor Shinkar



Author Details

Alexander Golovnev
  • Harvard University, Cambridge, MA, USA
Mika Göös
  • Institute for Advanced Study, Princeton, NJ, USA
Daniel Reichman
  • Department of Computer Science, Princeton University, NJ, USA
Igor Shinkar
  • School of Computing Science, Simon Fraser University, Burnaby, BC, Canada

Acknowledgements

We thank Paweł Gawrychowski for his useful feedback and Gy. Turán for sharing [Groeger and Turán, 1993] with us. We are also very grateful to anonymous reviewers for their insightful comments.

Cite As

Alexander Golovnev, Mika Göös, Daniel Reichman, and Igor Shinkar. String Matching: Communication, Circuits, and Learning. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 56:1-56:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.56

Abstract

String matching is the problem of deciding whether a given n-bit string contains a given k-bit pattern. We study the complexity of this problem in three settings. 
- Communication complexity. For small k, we provide near-optimal upper and lower bounds on the communication complexity of string matching. For large k, our bounds leave open an exponential gap; we exhibit some evidence for the existence of a better protocol. 
- Circuit complexity. We present several upper and lower bounds on the size of circuits with threshold and DeMorgan gates solving the string matching problem. Similarly to the above, our bounds are near-optimal for small k. 
- Learning. We consider the problem of learning a hidden pattern of length at most k relative to the classifier that assigns 1 to every string that contains the pattern. We prove optimal bounds on the VC dimension and sample complexity of this problem.
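
As a minimal illustration of the objects named in the abstract, the following Python sketch spells out the string matching function and the classifier induced by a hidden pattern. It assumes that "contains" means occurrence as a contiguous substring; the names string_matching and pattern_classifier are hypothetical and are not taken from the paper, and the naive scan is only a reference definition, not one of the protocols or circuits studied there.

```python
from typing import Callable


def string_matching(text: str, pattern: str) -> int:
    """Return 1 if `pattern` occurs as a contiguous substring of `text`, else 0.

    Both arguments are bit strings, e.g. "0110100" (n bits) and "101" (k bits).
    """
    n, k = len(text), len(pattern)
    # Try every alignment of the k-bit pattern inside the n-bit text.
    # When k > n the range is empty and the result is 0.
    return int(any(text[i:i + k] == pattern for i in range(n - k + 1)))


def pattern_classifier(pattern: str) -> Callable[[str], int]:
    """Classifier that assigns 1 to every string containing `pattern`."""
    return lambda text: string_matching(text, pattern)


if __name__ == "__main__":
    h = pattern_classifier("101")
    for x in ["0110100", "0011100"]:
        print(x, h(x))
```

In the learning setting described above, the hidden pattern plays the role of the unknown argument to pattern_classifier, and the learner only sees labeled strings.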

Subject Classification

ACM Subject Classification
  • Theory of computation → Communication complexity
  • Theory of computation → Circuit complexity
  • Theory of computation → Boolean function learning
Keywords
  • string matching
  • communication complexity
  • circuit complexity
  • PAC learning

References

  1. Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, 75(2):87-106, 1987.
  2. Martin Anthony and Peter L. Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 2009.
  3. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. The sketching complexity of pattern matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 261-272. Springer, 2004.
  4. Ziv Bar-Yossef, Thathachar S Jayram, Ravi Kumar, and D Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702-732, 2004.
  5. Omri Ben-Eliezer, Simon Korman, and Daniel Reichman. Deleting and testing forbidden patterns in multi-dimensional arrays. In International Proceedings in Informatics, volume 80. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  6. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929-965, 1989.
  7. Robert S. Boyer and J. Strother Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762-772, 1977.
  8. Mark Braverman and Omri Weinstein. A Discrepancy Lower Bound for Information Complexity. Algorithmica, 76(3):846-864, 2016. URL: https://doi.org/10.1007/s00453-015-0093-8.
  9. Arkadev Chattopadhyay, Nikhil Mande, and Suhail Sherif. The Log-Approximate-Rank Conjecture is False. In Proceedings of the 51st Symposium on Theory of Computing, 2019. To appear.
  10. Amit Daniely and Shai Shalev-Shwartz. Complexity theoretic limitations on learning DNF’s. In Conference on Learning Theory, pages 815-830, 2016.
  11. Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247-261, 1989.
  12. Jürgen Forster, Matthias Krause, Satyanarayana V. Lokam, Rustam Mubarakzjanov, Niels Schmitt, and Hans Ulrich Simon. Relations between communication complexity, linear arrangements, and computational complexity. In International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 171-182. Springer, 2001.
  13. Yoav Freund, Michael Kearns, Dana Ron, Ronitt Rubinfeld, Robert E Schapire, and Linda Sellie. Efficient learning of typical finite automata from random walks. Information and Computation, 138(1):23-48, 1997.
  14. Zvi Galil. Optimal parallel algorithms for string matching. Information and Control, 67(1-3):144-157, 1985.
  15. Zvi Galil and Joel Seiferas. Time-space-optimal string matching. Journal of Computer and System Sciences, 26(3):280-294, 1983.
  16. Hans Dietmar Groeger and György Turán. A linear lower bound for the size of threshold circuits. Bulletin-European Association For Theoretical Computer Science, 50:220-220, 1993.
  17. András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth. Journal of Computer and System Sciences, 46(2):129-154, 1993.
  18. Steve Hanneke. The optimal sample complexity of PAC learning. The Journal of Machine Learning Research, 17(1):1319-1333, 2016.
  19. Johan Håstad. Computational Limitations of Small-depth Circuits. MIT Press, 1987.
  20. Johan Håstad, Stasys Jukna, and Pavel Pudlák. Top-down lower bounds for depth-three circuits. Computational Complexity, 5(2):99-112, 1995.
  21. Stasys Jukna. On graph complexity. Combinatorics, Probability and Computing, 15(6):855-876, 2006.
  22. Stasys Jukna. Boolean function complexity: advances and frontiers, volume 27. Springer Science & Business Media, 2012.
  23. Bala Kalyanasundaram and Georg Schnitger. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4):545-557, 1992.
  24. Daniel M. Kane and Ryan Williams. Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 633-643. ACM, 2016.
  25. Donald E. Knuth, James H. Morris, Jr, and Vaughan R. Pratt. Fast pattern matching in strings. SIAM journal on computing, 6(2):323-350, 1977.
  26. Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University Press, 1997.
  27. Eyal Kushilevitz and Dan Roth. On learning visual concepts and DNF formulae. Machine Learning, 24(1):65-85, 1996.
  28. Troy Lee and Adi Shraibman. Lower Bounds in Communication Complexity, volume 3. Now Publishers, 2009. URL: https://doi.org/10.1561/0400000040.
  29. Robert A. Legenstein and Wolfgang Maass. Foundations for a circuit complexity theory of sensory processing. Advances in neural information processing systems, pages 259-265, 2001.
  30. Robert A. Legenstein and Wolfgang Maass. Neural circuits for pattern recognition with small total wire length. Theoretical Computer Science, 287(1):239-249, 2002.
  31. R. C. Lyndon and M. P. Schützenberger. The equation a^M = b^Nc^P in a free group. Michigan Mathematical Journal, 9:289-298, 1962.
  32. James Martens, Arkadev Chattopadhyay, Toni Pitassi, and Richard Zemel. On the representational efficiency of restricted Boltzmann machines. In Advances in Neural Information Processing Systems, pages 2877-2885, 2013.
  33. Saburo Muroga. Threshold logic and its application. Wiley-Interscience, 1971.
  34. Noam Nisan. The communication complexity of threshold gates. Combinatorics, Paul Erdős is Eighty, 1:301-315, 1993.
  35. Ian Parberry. Circuit complexity and neural networks. MIT press, 1994.
  36. Ian Parberry and Georg Schnitger. Parallel computation with threshold functions. Journal of Computer and System Sciences, 36(3):278-302, 1988.
  37. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In Foundations of Computer Science, 2009. 50th Annual IEEE Symposium on, pages 315-323. IEEE, 2009.
  38. Alexander A. Razborov. On small depth threshold circuits. In Scandinavian Workshop on Algorithm Theory, pages 42-52. Springer, 1992.
  39. Alexander A. Razborov. On the distributional complexity of disjointness. Theoretical Computer Science, 106(2):385-390, 1992.
  40. Ronald L. Rivest. On the worst-case behavior of string-searching algorithms. SIAM Journal on Computing, 6(4):669-674, 1977.
  41. Dana Ron and Ronitt Rubinfeld. Exactly learning automata of small cover time. Machine Learning, 27(1):69-96, 1997.
  42. Christian Rosenke. The exact complexity of projective image matching. Journal of Computer and System Sciences, 82(8):1360-1387, 2016.
  43. Vwani P. Roychowdhury, Alon Orlitsky, and Kai-Yeung Siu. Lower bounds on threshold and related circuits via communication complexity. IEEE Transactions on Information Theory, 40(2):467-474, 1994.
  44. Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
  45. Haim Shvaytser. Learnable and nonlearnable visual concepts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):459-466, 1990.
  46. Kai-Yeung Siu and Jehoshua Bruck. On the power of threshold circuits with small weights. SIAM Journal on Discrete Mathematics, 4(3):423-435, 1991.
  47. Kai-Yeung Siu, Jehoshua Bruck, Thomas Kailath, and Thomas Hofmeister. Depth efficient neural networks for division and related problems. IEEE Transactions on information theory, 39(3):946-956, 1993.
  48. Kei Uchizawa, Daiki Yashima, and Xiao Zhou. Threshold Circuits for Global Patterns in 2-Dimensional Maps. In International Workshop on Algorithms and Computation, pages 306-316. Springer, 2015.
  49. Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
  50. Thomas Watson. Communication Complexity of Statistical Distance. ACM Transactions on Computation Theory, 10(1):2:1-2:11, 2018. URL: https://doi.org/10.1145/3170708.
  51. Mihalis Yannakakis. Expressing combinatorial optimization problems by Linear Programs. Journal of Computer and System Sciences, 43(3):441-466, 1991. URL: https://doi.org/10.1016/0022-0000(91)90024-Y.