Reordering a Tree According to an Order on Its Leaves

Authors Laurent Bulteau , Philippe Gambette , Olga Seminck



PDF
Thumbnail PDF

File

LIPIcs.CPM.2022.24.pdf
  • Filesize: 1.36 MB
  • 15 pages

Document Identifiers

Author Details

Laurent Bulteau
  • LIGM, Université Gustave Eiffel & CNRS, Champs-sur-Marne, France
Philippe Gambette
  • LIGM, Université Gustave Eiffel & CNRS, Champs-sur-Marne, France
Olga Seminck
  • Lattice (Langues, Textes, Traitements informatiques, Cognition), CNRS & ENS/PSL & Université Sorbonne nouvelle, France

Cite As Get BibTex

Laurent Bulteau, Philippe Gambette, and Olga Seminck. Reordering a Tree According to an Order on Its Leaves. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 24:1-24:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.CPM.2022.24

Abstract

In this article, we study two problems consisting in reordering a tree to fit with an order on its leaves provided as input, which were earlier introduced in the context of phylogenetic tree comparison for bioinformatics, OTCM and OTDE. The first problem consists in finding an order which minimizes the number of inversions with an input order on the leaves, while the second one consists in removing the minimum number of leaves from the tree to make it consistent with the input order on the remaining leaves. We show that both problems are NP-complete when the maximum degree is not bounded, as well as a problem on tree alignment, answering two questions opened in 2010 by Henning Fernau, Michael Kaufmann and Mathias Poths. We provide a polynomial-time algorithm for OTDE in the case where the maximum degree is bounded by a constant and an FPT algorithm in a parameter lower than the number of leaves to delete. Our results have practical interest not only for bioinformatics but also for digital humanities to evaluate, for example, the consistency of the dendrogram obtained from a hierarchical clustering algorithm with a chronological ordering of its leaves. We explore the possibilities of practical use of our results both on trees obtained by clustering the literary works of French authors and on simulated data, using implementations of our algorithms in Python.

Subject Classification

ACM Subject Classification
  • Theory of computation → Parameterized complexity and exact algorithms
Keywords
  • tree
  • clustering
  • order
  • permutation
  • inversions
  • FPT algorithm
  • NP-hardness
  • tree drawing
  • OTCM
  • OTDE
  • TTDE

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Mukul S Bansal, Wen-Chieh Chang, Oliver Eulenstein, and David Fernández-Baca. Generalized binary tanglegrams: Algorithms and applications. In International Conference on Bioinformatics and Computational Biology, pages 114-125. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-00727-9_13.
  2. Ziv Bar-Joseph, Erik D. Demaine, David K. Gifford, Nathan Srebro, Angèle M. Hamel, and Tommi S. Jaakkola. K-ary clustering with optimal leaf ordering for gene expression data. Bioinformatics, 19(9):1070-1078, 2003. URL: https://doi.org/10.1093/bioinformatics/btg030.
  3. Ziv Bar-Joseph, David K Gifford, and Tommi S Jaakkola. Fast optimal leaf ordering for hierarchical clustering. Bioinformatics, 17(suppl 1):S22-S29, 2001. URL: https://doi.org/10.1093/bioinformatics/17.suppl_1.S22.
  4. Ulrik Brandes. Optimal leaf ordering of complete binary trees. Journal of Discrete Algorithms, 5(3):546-552, 2007. URL: https://doi.org/10.1016/j.jda.2006.09.003.
  5. Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved upper bounds for vertex cover. Theoretical Computer Science, 411(40):3736-3756, 2010. URL: https://doi.org/10.1016/j.tcs.2010.06.026.
  6. Tim Dwyer and Falk Schreiber. Optimal leaf ordering for two and a half dimensional phylogenetic tree visualisation. In APVis '04: Proceedings of the 2004 Australasian symposium on Information Visualisation, volume 35, pages 109-115, 2004. URL: https://doi.org/10.5555/1082101.1082114.
  7. Denise Earle and Catherine B. Hurley. Advances in dendrogram seriation for application to visualization. Journal of Computational and Graphical Statistics, 24(1):1-25, 2015. URL: https://doi.org/10.1080/10618600.2013.874295.
  8. Maciej Eder, Jan Rybicki, and Mike Kestemont. Stylometry with R: a package for computational text analysis. R Journal, 8(1):107-121, 2016. URL: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html.
  9. Henning Fernau, Michael Kaufmann, and Mathias Poths. Comparing trees via crossing minimization. In International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 457-469. Springer, 2005. URL: https://doi.org/10.1007/11590156_37.
  10. Henning Fernau, Michael Kaufmann, and Mathias Poths. Comparing trees via crossing minimization. Journal of Computer and System Sciences, 76(7):593-608, 2010. URL: https://doi.org/10.1016/j.jcss.2009.10.014.
  11. Philippe Gambette, Olga Seminck, Dominique Legallois, and Thierry Poibeau. Evaluating hierarchical clustering methods for corpora with chronological order. In EADH2021: Interdisciplinary Perspectives on Data. Second International Conference of the European Association for Digital Humanities, Krasnoyarsk, Russia, September 2021. EADH. URL: https://hal.archives-ouvertes.fr/hal-03341803.
  12. Michael Hahsler, Kurt Hornik, and Christian Buchta. Getting things in order: an introduction to the R package seriation. Journal of Statistical Software, 25(3):1-34, 2008. URL: https://doi.org/10.18637/jss.v025.i03.
  13. Richard M Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85-103. Springer, 1972. Google Scholar
  14. Cyril Labbé and Dominique Labbé. Existe-t-il un genre épistolaire? Hugo, Flaubert et Maupassant. In Nouvelles Journées de l'ERLA, pages 53-85. L'Harmattan, 2013. Google Scholar
  15. Jean-Marc Leblanc. Analyses lexicométriques des vœux présidentiels. ISTE Group, 2016. Google Scholar
  16. Bojan Mohar. Face covers and the genus problem for apex graphs. Journal of Combinatorial Theory, Series B, 82(1):102-117, 2001. URL: https://doi.org/10.1006/jctb.2000.2026.
  17. Hermann Moisl. How to visualize high-dimensional data: a roadmap. Journal of Data Mining & Digital Humanities, 2020. URL: https://doi.org/10.46298/jdmdh.5594.
  18. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011. Google Scholar
  19. Ryo Sakai, Raf Winand, Toni Verbeiren, Andrew Vande Moere, and Jan Aerts. dendsort: modular leaf ordering methods for dendrogram representations in R. F1000Research, 3, 2014. URL: https://doi.org/10.12688/f1000research.4784.1.
  20. Olga Seminck, Philippe Gambette, Dominique Legallois, and Thierry Poibeau. The corpus for idiolectal research (CIDRE). Journal of Open Humanities Data, 7:15, 2021. URL: https://doi.org/10.5334/johd.42.
  21. Olga Seminck, Philippe Gambette, Dominique Legallois, and Thierry Poibeau. The evolution of the idiolect over the lifetime: A quantitative and qualitative study on French 19superscriptth century literature, 2022. Under review. Google Scholar
  22. Balaji Venkatachalam, Jim Apple, Katherine St John, and Dan Gusfield. Untangling tanglegrams: comparing trees by their drawings. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(4):588-597, 2010. URL: https://doi.org/10.1109/TCBB.2010.57.
  23. Magnus Wahlström. Algorithms, measures and upper bounds for satisfiability and related problems. PhD thesis, Department of Computer and Information Science, Linköpings universitet, 2007. URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8714.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail