Search Results

Documents authored by Eades, Patrick


Document
Exploiting New Properties of String Net Frequency for Efficient Computation

Authors: Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Knowing which strings in a massive text are significant - that is, which strings are common and distinct from other strings - is valuable for several applications, including text compression and tokenization. Frequency in itself is not helpful for significance, because the commonest strings are the shortest strings. A compelling alternative is net frequency, which has the property that strings with positive net frequency are of maximal length. However, net frequency remains relatively unexplored, and there is no prior art showing how to compute it efficiently. We first introduce a characteristic of net frequency that simplifies the original definition. With this, we study strings with positive net frequency in Fibonacci words. We then use our characteristic and solve two key problems related to net frequency. First, single-nf, how to compute the net frequency of a given string of length m, in an input text of length n over an alphabet size σ. Second, all-nf, given length-n input text, how to report every string of positive net frequency (and its net frequency). Our methods leverage suffix arrays, components of the Burrows-Wheeler transform, and solution to the coloured range listing problem. We show that, for both problems, our data structure has O(n) construction cost: with this structure, we solve single-nf in O(m + σ) time and all-nf in O(n) time. Experimentally, we find our method to be around 100 times faster than reasonable baselines for single-nf. For all-nf, our results show that, even with prior knowledge of the set of strings with positive net frequency, simply confirming that their net frequency is positive takes longer than with our purpose-designed method. All in all, we show that net frequency is a cogent method for identifying significant strings. We show how to calculate net frequency efficiently, and how to report efficiently the set of plausibly significant strings.

Cite as

Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel. Exploiting New Properties of String Net Frequency for Efficient Computation. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{guo_et_al:LIPIcs.CPM.2024.16,
  author =	{Guo, Peaker and Eades, Patrick and Wirth, Anthony and Zobel, Justin},
  title =	{{Exploiting New Properties of String Net Frequency for Efficient Computation}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.16},
  URN =		{urn:nbn:de:0030-drops-201265},
  doi =		{10.4230/LIPIcs.CPM.2024.16},
  annote =	{Keywords: Fibonacci words, suffix arrays, Burrows-Wheeler transform, LCP arrays, irreducible LCP values, coloured range listing}
}
Document
Trajectory Visibility

Authors: Patrick Eades, Ivor van der Hoog, Maarten Löffler, and Frank Staals

Published in: LIPIcs, Volume 162, 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020)


Abstract
We study the problem of testing whether there exists a time at which two entities moving along different piece-wise linear trajectories among polygonal obstacles are mutually visible. We study several variants, depending on whether or not the obstacles form a simple polygon, trajectories may intersect the polygon edges, and both or only one of the entities are moving. For constant complexity trajectories contained in a simple polygon with n vertices, we provide an 𝒪(n) time algorithm to test if there is a time at which the entities can see each other. If the polygon contains holes, we present an 𝒪(n log n) algorithm. We show that this is tight. We then consider storing the obstacles in a data structure, such that queries consisting of two line segments can be efficiently answered. We show that for all variants it is possible to answer queries in sublinear time using polynomial space and preprocessing time. As a critical intermediate step, we provide an efficient solution to a problem of independent interest: preprocess a convex polygon such that we can efficiently test intersection with a quadratic curve segment. If the obstacles form a simple polygon, this allows us to answer visibility queries in 𝒪(n³/4log³ n) time using 𝒪(nlog⁵ n) space. For more general obstacles the query time is 𝒪(log^k n), for a constant but large value k, using 𝒪(n^{3k}) space. We provide more efficient solutions when one of the entities remains stationary.

Cite as

Patrick Eades, Ivor van der Hoog, Maarten Löffler, and Frank Staals. Trajectory Visibility. In 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 162, pp. 23:1-23:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{eades_et_al:LIPIcs.SWAT.2020.23,
  author =	{Eades, Patrick and van der Hoog, Ivor and L\"{o}ffler, Maarten and Staals, Frank},
  title =	{{Trajectory Visibility}},
  booktitle =	{17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2020)},
  pages =	{23:1--23:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-150-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{162},
  editor =	{Albers, Susanne},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2020.23},
  URN =		{urn:nbn:de:0030-drops-122701},
  doi =		{10.4230/LIPIcs.SWAT.2020.23},
  annote =	{Keywords: trajectories, visibility, data structures, semi-algebraic range searching}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail