Linear-time algorithms for the subpath kernel

Authors Kilho Shin, Taichi Ishikawa

Author Details

Kilho Shin
  • Graduate School of Applied Informatics, University of Hyogo, Minatojima-Minamimachi, Chuo, Kobe, Japan
Taichi Ishikawa
  • Graduate School of Applied Informatics, University of Hyogo, Minatojima-Minamimachi, Chuo, Kobe, Japan

Cite As

Kilho Shin and Taichi Ishikawa. Linear-time algorithms for the subpath kernel. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 22:1-22:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


The subpath kernel is a useful positive definite kernel that takes arbitrary rooted trees as input, whether they are ordered or unordered. We first show, through an intensive experiment, that the subpath kernel can exhibit excellent classification performance in combination with SVM. Secondly, we develop a theory of irreducible trees and, using it as a rigorous mathematical basis, reconstruct a bottom-up linear-time algorithm for the subtree kernel, which corrects an algorithm well known in the literature. Thirdly, we present a novel top-down algorithm, with which we can realize a linear-time parallel-computing algorithm to compute the subpath kernel.
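To make the object of study concrete, the following is a minimal sketch of a naive subpath kernel. It assumes the common unweighted definition, in which the kernel sums, over every label sequence read along a downward path, the product of its occurrence counts in the two trees. The abstract itself does not state the definition, so this reading, the `(label, children)` tree encoding, and the function names are all assumptions made for illustration. This is a brute-force reference, not the linear-time bottom-up or top-down algorithm of the paper.

```python
from collections import Counter

# Assumed encoding: a rooted tree is a pair (label, [child, child, ...]).


def subpath_counts(tree):
    """Count every downward label sequence (subpath) occurring in the tree."""
    counts = Counter()

    def visit(node):
        label, children = node
        # Subpaths starting at this node: the node alone,
        # plus this label prepended to each subpath starting at a child.
        starting_here = [(label,)]
        for child in children:
            for path in visit(child):
                starting_here.append((label,) + path)
        counts.update(starting_here)
        return starting_here

    visit(tree)
    return counts


def naive_subpath_kernel(t1, t2):
    """Unweighted subpath kernel: dot product of subpath count vectors."""
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(n * c2[p] for p, n in c1.items())


if __name__ == "__main__":
    t1 = ("a", [("b", []), ("c", [])])
    t2 = ("a", [("b", [("c", [])])])
    print(naive_subpath_kernel(t1, t2))
```

Because only counts of label sequences enter the sum, reordering the children of any node leaves the value unchanged, which is consistent with the kernel accepting both ordered and unordered trees. The enumeration above can take time quadratic in the tree size, which is the cost the linear-time algorithms are designed to avoid.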

Subject Classification

ACM Subject Classification
  • Theory of computation → Kernel methods

Keywords
  • tree
  • kernel
  • suffix tree



