Regular Language Distance and Entropy

Authors Austin J. Parker, Kelly B. Yancey, Matthew P. Yancey

Thumbnail PDF


  • Filesize: 0.63 MB
  • 14 pages

Document Identifiers

Author Details

Austin J. Parker
Kelly B. Yancey
Matthew P. Yancey

Cite AsGet BibTex

Austin J. Parker, Kelly B. Yancey, and Matthew P. Yancey. Regular Language Distance and Entropy. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 3:1-3:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


This paper addresses the problem of determining the distance between two regular languages. It will show how to expand Jaccard distance, which works on finite sets, to potentially-infinite regular languages. The entropy of a regular language plays a large role in the extension. Much of the paper is spent investigating the entropy of a regular language. This includes addressing issues that have required previous authors to rely on the upper limit of Shannon's traditional formulation of channel capacity, because its limit does not always exist. The paper also includes proposing a new limit based formulation for the entropy of a regular language and proves that formulation to both exist and be equivalent to Shannon's original formulation (when it exists). Additionally, the proposed formulation is shown to equal an analogous but formally quite different notion of topological entropy from Symbolic Dynamics -- consequently also showing Shannon's original formulation to be equivalent to topological entropy. Surprisingly, the natural Jaccard-like entropy distance is trivial in most cases. Instead, the entropy sum distance metric is suggested, and shown to be granular in certain situations.
  • regular languages
  • channel capacity
  • entropy
  • Jaccard
  • symbolic dynamics


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. F. Blanchard, E. Glasner, S. Kolyada, and A. Maass. On Li-Yorke pairs. J. Reine Angew. Math., 547:51-68, 2002. Google Scholar
  2. M. Bodirsky, T. Gärtner, T. von Oertzen, and J. Schwinghammer. Efficiently computing the density of regular languages. In LATIN 2004: Theoretical informatics, volume 2976 of Lecture Notes in Comput. Sci., pages 262-270. Springer, Berlin, 2004. URL:
  3. T. Ceccherini-Silberstein, A. Machì, and F. Scarabotti. On the entropy of regular languages. Theoretical computer science, 307(1):93-102, 2003. Google Scholar
  4. C. Chan, M. Garofalakis, and R. Rastogi. Re-tree: an efficient index structure for regular expressions. The VLDB Journal—The International Journal on Very Large Data Bases, 12(2):102-119, 2003. Google Scholar
  5. C. Chang. Algorithm for the complexity of finite automata. 31st Workshop on Combinatorial Mathematics and Computation Theory, pages 216-220, 2014. Google Scholar
  6. N. Chomsky and G. Miller. Finite state languages. Information and Control, 1(2):91-112, 1958. URL:
  7. C. Cortes, M. Mohri, and A. Rastogi. On the computation of some standard distances between probabilistic automata. In Implementation and application of automata, volume 4094 of Lecture Notes in Comput. Sci., pages 137-149. Springer, Berlin, 2006. URL:
  8. C. Cortes, M. Mohri, A. Rastogi, and M. Riley. Efficient computation of the relative entropy of probabilistic automata. In LATIN 2006: Theoretical informatics, volume 3887 of Lecture Notes in Comput. Sci., pages 323-336. Springer, Berlin, 2006. URL:
  9. C. Cui, Z. Dang, T. Fischer, and O. Ibarra. Similarity in languages and programs. Theoretical Computer Science, 498:58-75, 2013. Google Scholar
  10. C. Cui, Z. Dang, T. Fischer, and O. Ibarra. Information rate of some classes of non-regular languages: an automata-theoretic approach (extended abstract). In Mathematical foundations of computer science 2014. Part I, volume 86343 of Lecture notes in Comput. Sci., pages 232-243. Springer, Heidelberg, 2014. Google Scholar
  11. Jürgen Dassow, Gema M. Martín, and Francisco J. Vico. A similarity measure for cyclic unary regular languages. Fundam. Inform., 96(1-2):71-88, 2009. URL:
  12. J. Francis. The QR transformation a unitary analogue to the LR transformation — part 1. The Computer Journal, 4(3):265-271, 1961. Google Scholar
  13. B. Hasselblatt and A. Katok. A first course in dynamics: With a panorama of recent developments. Cambridge University Press, New York, 2003. Google Scholar
  14. J. Hopcroft and J. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley Publishing Company, Inc., 1979. Google Scholar
  15. A. Kolmogorov. Entropy per unit time as a metric invariant of automorphisms. In Dokl. Akad. Nauk SSSR, volume 124, pages 754-755, 1959. Google Scholar
  16. J. Kozik. Conditional densities of regular languages. In Proceedings of the Second Workshop on Computational Logic and Applications (CLA 2004), volume 140 of Electron. Notes Theor. Comput. Sci., pages 67-79 (electronic). Elsevier, Amsterdam, 2005. URL:
  17. W. Kuich. On the entropy of context-free languages. Information and Control, 16(2):173-200, 1970. Google Scholar
  18. W. Li. On the relationship between complexity and entropy for Markov chains and regular languages. Complex systems, 5(4):381-399, 1991. Google Scholar
  19. D. Lind and B. Marcus. Symbolic dynamics and coding. Cambridge, 1995. Google Scholar
  20. J. Marklof and C. Ulcigrai. Lecture notes for dynamical systems and ergodic theory, 2015-2016.∼majm/DSET/index.html. Google Scholar
  21. M-J. Nederhof and G. Satta. Computation of distances for regular and context-free probabilistic languages. Theoret. Comput. Sci., 395(2-3):235-254, 2008. URL:
  22. A. Parker, K. Yancey, and M. Yancey. Regular language distance and entropy. arXiv, 602.07715, 2015. Google Scholar
  23. U. Rothblum. Expansion of sums of matrix powers. SIAM Review, 23:143-164, 1981. Google Scholar
  24. U. Rothblum. Chapter 9, nonnegative matrices and stochastic matrices. In Handbook of Linear Algebra. (eds: L. Hogben), Chapman and Hall / CRC, 2007. Google Scholar
  25. Arto Salomaa and Matti Soittola. Automata-Theoretic Aspects of Formal Power Series. Texts and Monographs in Computer Science. Springer, 1978. URL:
  26. F. Schneider and D. Borchmann. Topological entropy of formal languages. arXiv, 1507.03393, 2015. Google Scholar
  27. C. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379-423, 1948. Google Scholar
  28. J. Simonsen. On the computability of the topological entropy of subshifts. Discrete mathematics and theoretical computer science, 8(1):83-95, 2006. Google Scholar
  29. Y. Sinai. On the notion of entropy of a dynamical system. In Dokl Akad Nauk SSSR, volume 124, pages 768-771, 1959. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail