DROPS

Document

DOI: 10.4230/LIPIcs.AofA.2024.13

On the Number of Distinct Fringe Subtrees in Binary Search Trees

Authors: Stephan Wagner

Published in: LIPIcs, Volume 302, 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)

Abstract

A fringe subtree of a rooted tree is a subtree that consists of a vertex and all its descendants. The number of distinct fringe subtrees in random trees has been studied by several authors, notably because of its connection to tree compaction algorithms. Here, we obtain a very precise result for binary search trees: it is shown that the number of distinct fringe subtrees in a binary search tree with n leaves is asymptotically equal to (c₁n)/(log n) for a constant c₁ ≈ 2.4071298335, both in expectation and with high probability. This was previously shown to be a lower bound, our main contribution is to prove a matching upper bound. The method is quite general and can also be applied to similar problems for other tree models.

Cite as

Stephan Wagner. On the Number of Distinct Fringe Subtrees in Binary Search Trees. In 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 302, pp. 13:1-13:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{wagner:LIPIcs.AofA.2024.13,
  author =	{Wagner, Stephan},
  title =	{{On the Number of Distinct Fringe Subtrees in Binary Search Trees}},
  booktitle =	{35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)},
  pages =	{13:1--13:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-329-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{302},
  editor =	{Mailler, C\'{e}cile and Wild, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2024.13},
  URN =		{urn:nbn:de:0030-drops-204482},
  doi =		{10.4230/LIPIcs.AofA.2024.13},
  annote =	{Keywords: Fringe subtrees, binary search trees, tree compression, minimal DAG, asymptotics}
}

Document

DOI: 10.4230/LIPIcs.AofA.2024.25

Patricia’s Bad Distributions

Authors: Louigi Addario-Berry, Pat Morin, and Ralph Neininger

Published in: LIPIcs, Volume 302, 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)

Abstract

The height of a random PATRICIA tree built from independent, identically distributed infinite binary strings with arbitrary diffuse probability distribution μ on {0,1}^ℕ is studied. We show that the expected height grows asymptotically sublinearly in the number of leaves for any such μ, but can be made to exceed any specific sublinear growth rate by choosing μ appropriately.

Cite as

Louigi Addario-Berry, Pat Morin, and Ralph Neininger. Patricia’s Bad Distributions. In 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 302, pp. 25:1-25:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{addarioberry_et_al:LIPIcs.AofA.2024.25,
  author =	{Addario-Berry, Louigi and Morin, Pat and Neininger, Ralph},
  title =	{{Patricia’s Bad Distributions}},
  booktitle =	{35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)},
  pages =	{25:1--25:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-329-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{302},
  editor =	{Mailler, C\'{e}cile and Wild, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2024.25},
  URN =		{urn:nbn:de:0030-drops-204600},
  doi =		{10.4230/LIPIcs.AofA.2024.25},
  annote =	{Keywords: PATRICIA tree, random tree, height of tree, analysis of algorithms}
}

Document

DOI: 10.4230/LIPIcs.AofA.2020.15

Analysis of Lempel-Ziv'78 for Markov Sources

Authors: Philippe Jacquet and Wojciech Szpankowski

Published in: LIPIcs, Volume 159, 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)

Abstract

Lempel-Ziv'78 is one of the most popular data compression algorithms. Over the last few decades fascinating properties of LZ78 were uncovered. Among others, in 1995 we settled the Ziv conjecture by proving that for a memoryless source the number of LZ78 phrases satisfies the Central Limit Theorem (CLT). Since then the quest commenced to extend it to Markov sources. However, despite several attempts this problem is still open. The 1995 proof of the Ziv conjecture was based on two models: In the DST-model, the associated digital search tree (DST) is built over m independent strings. In the LZ-model a single string of length n is partitioned into variable length phrases such that the next phrase is not seen in the past as a phrase. The Ziv conjecture for memoryless source was settled by proving that both DST-model and the LZ-model are asymptotically equivalent. The main result of this paper shows that this is not the case for the LZ78 algorithm over Markov sources. In addition, we develop here a large deviation for the number of phrases in the LZ78 and give a precise asymptotic expression for the redundancy which is the excess of LZ78 code over the entropy of the source. We establish these findings using a combination of combinatorial and analytic tools. In particular, to handle the strong dependency between Markov phrases, we introduce and precisely analyze the so called tail symbol which is the first symbol of the next phrase in the LZ78 parsing.

Cite as

Philippe Jacquet and Wojciech Szpankowski. Analysis of Lempel-Ziv'78 for Markov Sources. In 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 159, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{jacquet_et_al:LIPIcs.AofA.2020.15,
  author =	{Jacquet, Philippe and Szpankowski, Wojciech},
  title =	{{Analysis of Lempel-Ziv'78 for Markov Sources}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Drmota, Michael and Heuberger, Clemens},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2020.15},
  URN =		{urn:nbn:de:0030-drops-120459},
  doi =		{10.4230/LIPIcs.AofA.2020.15},
  annote =	{Keywords: Lempel-Ziv algorithm, digital search trees, depoissonization, analytic combinatorics, large deviations}
}

Document

DOI: 10.4230/LIPIcs.AofA.2020.16

Power-Law Degree Distribution in the Connected Component of a Duplication Graph

Authors: Philippe Jacquet, Krzysztof Turowski, and Wojciech Szpankowski

Published in: LIPIcs, Volume 159, 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)

Abstract

We study the partial duplication dynamic graph model, introduced by Bhan et al. in [Bhan et al., 2002] in which a newly arrived node selects randomly an existing node and connects with probability p to its neighbors. Such a dynamic network is widely considered to be a good model for various biological networks such as protein-protein interaction networks. This model is discussed in numerous publications with only a few recent rigorous results, especially for the degree distribution. Recently Jordan [Jordan, 2018] proved that for 0 < p < 1/e the degree distribution of the connected component is stationary with approximately a power law. In this paper we rigorously prove that the tail is indeed a true power law, that is, we show that the degree of a randomly selected node in the connected component decays like C/k^β where C an explicit constant and β ≠ 2 is a non-trivial solution of p^(β-2) + β - 3 = 0. This holds regardless of the structure of the initial graph, as long as it is connected and has at least two vertices. To establish this finding we apply analytic combinatorics tools, in particular Mellin transform and singularity analysis.

Cite as

Philippe Jacquet, Krzysztof Turowski, and Wojciech Szpankowski. Power-Law Degree Distribution in the Connected Component of a Duplication Graph. In 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 159, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{jacquet_et_al:LIPIcs.AofA.2020.16,
  author =	{Jacquet, Philippe and Turowski, Krzysztof and Szpankowski, Wojciech},
  title =	{{Power-Law Degree Distribution in the Connected Component of a Duplication Graph}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Drmota, Michael and Heuberger, Clemens},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2020.16},
  URN =		{urn:nbn:de:0030-drops-120467},
  doi =		{10.4230/LIPIcs.AofA.2020.16},
  annote =	{Keywords: random graphs, pure duplication model, degree distribution, tail exponent, analytic combinatorics}
}

@InProceedings{jacquet_et_al:LIPIcs.AofA.2020.16,
  author =	{Jacquet, Philippe and Turowski, Krzysztof and Szpankowski, Wojciech},
  title =	{{Power-Law Degree Distribution in the Connected Component of a Duplication Graph}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Drmota, Michael and Heuberger, Clemens},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2020.16},
  URN =		{urn:nbn:de:0030-drops-120467},
  doi =		{10.4230/LIPIcs.AofA.2020.16},
  annote =	{Keywords: random graphs, pure duplication model, degree distribution, tail exponent, analytic combinatorics}
}

Document

DOI: 10.4230/LIPIcs.AofA.2020.17

Hidden Words Statistics for Large Patterns

Authors: Svante Janson and Wojciech Szpankowski

Published in: LIPIcs, Volume 159, 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)

Abstract

We study here the so called subsequence pattern matching also known as hidden pattern matching in which one searches for a given pattern w of length m as a subsequence in a random text of length n. The quantity of interest is the number of occurrences of w as a subsequence (i.e., occurring in not necessarily consecutive text locations). This problem finds many applications from intrusion detection, to trace reconstruction, to deletion channel, and to DNA-based storage systems. In all of these applications, the pattern w is of variable length. To the best of our knowledge this problem was only tackled for a fixed length m=O(1) [P. Flajolet et al., 2006]. In our main result Theorem 5 we prove that for m=o(n^{1/3}) the number of subsequence occurrences is normally distributed. In addition, in Theorem 6 we show that under some constraints on the structure of w the asymptotic normality can be extended to m=o(√n). For a special pattern w consisting of the same symbol, we indicate that for m=o(n) the distribution of number of subsequences is either asymptotically normal or asymptotically log normal. We conjecture that this dichotomy is true for all patterns. We use Hoeffding’s projection method for U-statistics to prove our findings.

Cite as

Svante Janson and Wojciech Szpankowski. Hidden Words Statistics for Large Patterns. In 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 159, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{janson_et_al:LIPIcs.AofA.2020.17,
  author =	{Janson, Svante and Szpankowski, Wojciech},
  title =	{{Hidden Words Statistics for Large Patterns}},
  booktitle =	{31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-147-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{159},
  editor =	{Drmota, Michael and Heuberger, Clemens},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2020.17},
  URN =		{urn:nbn:de:0030-drops-120476},
  doi =		{10.4230/LIPIcs.AofA.2020.17},
  annote =	{Keywords: Hidden pattern matching, subsequences, probability, U-statistics, projection method}
}

5 Search Results for "Szpankowski, Wojciech"

On the Number of Distinct Fringe Subtrees in Binary Search Trees

Abstract

Cite as

Patricia’s Bad Distributions

Abstract

Cite as

Analysis of Lempel-Ziv'78 for Markov Sources

Abstract

Cite as

Power-Law Degree Distribution in the Connected Component of a Duplication Graph

Abstract

Cite as

Hidden Words Statistics for Large Patterns

Abstract

Cite as