15 Search Results for "Shibuya, Tetsuo"


Document
Linear-Space Subquadratic-Time String Alignment Algorithm for Arbitrary Scoring Matrices

Authors: Ryosuke Yamano and Tetsuo Shibuya

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Theoretically, the fastest algorithm by Crochemore et al. for computing the alignment of two given strings of size n over a constant alphabet takes O(n²/log n) time. The algorithm uses Lempel–Ziv parsing to divide the dynamic programming matrix into blocks and utilizes the repetitive structure. It is the only previously known subquadratic-time algorithm that can handle scoring matrices of arbitrary weights. However, this algorithm takes O(n²/log n) space, and reducing the space while preserving the time complexity has been an open problem for more than 20 years. We present a solution to this issue by achieving an O(n) space algorithm that maintains O(n²/log n) time. The classical refinement by Hirschberg reduces the space complexity of the textbook O(n²) algorithm to O(n) while preserving the quadratic time. However, applying this technique to the algorithm of Crochemore et al. has been considered challenging because their method requires O(n² / log n) space even when computing only the alignment score. Our modification enables the application of Hirschberg’s refinement, allowing traceback computation in O(n) space while preserving the O(n² / log n) overall time complexity. Our algorithm can be applied to both global and local string alignment problems.

Cite as

Ryosuke Yamano and Tetsuo Shibuya. Linear-Space Subquadratic-Time String Alignment Algorithm for Arbitrary Scoring Matrices. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{yamano_et_al:LIPIcs.WABI.2025.21,
  author =	{Yamano, Ryosuke and Shibuya, Tetsuo},
  title =	{{Linear-Space Subquadratic-Time String Alignment Algorithm for Arbitrary Scoring Matrices}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{21:1--21:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.21},
  URN =		{urn:nbn:de:0030-drops-239479},
  doi =		{10.4230/LIPIcs.WABI.2025.21},
  annote =	{Keywords: String alignment, dynamic programming, linear space algorithms}
}
Document
Invited Talk
We Are What We Index; a Primer for the Wheeler Graph Era (Invited Talk)

Authors: Ben Langmead

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Since the arrival of second-generation sequencing, we have needed to build indexes over reference sequences - e.g. genomes and transcriptomes - in order to solve read alignment and classification problems efficiently [Langmead et al., 2009; Li and Durbin, 2009; Li et al., 2009]. The rule has been: what we can index determines what we can do. When indexing strings, we can use methods like suffix arrays [Manber and Myers, 1993], the Burrows-Wheeler Transform (BWT) [Burrows and Wheeler, 1994] / FM Index [Ferragina and Manzini, 2000], or k-mer indexes [Marchet et al., 2021]. What if we want to index objects more complex than strings? A pangenome, for example, is a large collection of similar strings, e.g. the hundreds of assemblies that make up the Human Pangenome Reference [Liao et al., 2023] or all the bacteria in the Refseq database [Goldfarb et al., 2025]. We may wish to combine these strings into a multiple sequence alignment (MSA) or a graph first. Can we index those efficiently? In many useful cases the answer is "yes," but in others the answer is "no." The story of how we learned exactly when the answer is "yes" versus "no" unfolded through a sequence of insights. Here we review this story, eventually arriving at the definition of Wheeler graphs as discovered and formalized by Gagie, Manzini and Sirén [Gagie et al., 2017]. We will focus on indexes based on the BWT, since these (a) are lossless full-text indexes, (b) are widely used in practice [Langmead et al., 2009; Li and Durbin, 2009], and (c) form the theoretical throughline for all the indexing strategies on the path to Wheeler graphs. We will trace the BWT-based indexing story from the early days of the FM Index, though its step-by-step gobbling up of trees (XBW-transform [Ferragina et al., 2005]) and de Bruijn Graphs (BOSS representation [Bowe et al., 2012]), and to the eventual formalization of Wheeler graphs [Gagie et al., 2017]. Along the way, we will define and update our notions of what it means to track a consecutive range of elements in the structure, and what it means for an index to be efficient. We will also connect these notions to automata [Sipser, 1996], noting how the indexability of Wheeler graphs (also called Wheeler automata) is connected to the mechanics of how to efficiently represent and simulate a finite automaton [Alanko et al., 2021]. With this context, we can imagine improved indexes for the future of genomics and pangenomics. De Bruijn are extremely practical and are the most widely used among the non-string data structures that are also Wheeler graphs. But we might prefer other options. For example, de Bruijn graphs have the undesirable property that they usually encode not only the true longer-than-k substrings of the original text, but also "false" substrings that span repeats. Related to this, paths through the de Bruijn graph can "glue" substrings together that are horizontally distant in the MSA. Could other Wheeler graphs be practical alternatives to de Bruijn graphs? For instance, the original GCSA study by Sirén, Välimäki and Mäkinen proposed a way to convert a multiple alignment into an automaton that either is a Wheeler graph or can be made into one [Sirén et al., 2014]. This warrants further exploration, possibly with the help of improved tools for solving the NP-complete problem of recognizing whether a graph is a Wheeler graph [Chao et al., 2023]. The notion of BWT tunnels [Baier, 2018] gives another route: we can begin with a concatenated pangenome strings and compress it by identifying and collapsing BWT tunnels. This yields a Wheeler graph that is compressed like the de Bruijn graph, but without departing from the exact contents or coordinate systems of the original genomes. The future might need us to explore all these Wheeler-graph indexes, along with the also highly practical and always-improving world of indexes buiover collections of strings [Gagie et al., 2018].

Cite as

Ben Langmead. We Are What We Index; a Primer for the Wheeler Graph Era (Invited Talk). In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{langmead:LIPIcs.WABI.2025.2,
  author =	{Langmead, Ben},
  title =	{{We Are What We Index; a Primer for the Wheeler Graph Era}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{2:1--2:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.2},
  URN =		{urn:nbn:de:0030-drops-239288},
  doi =		{10.4230/LIPIcs.WABI.2025.2},
  annote =	{Keywords: Indexing, Burrows-Wheeler Transform}
}
Document
Research
Secure Compressed Suffix Arrays

Authors: Kunihiko Sadakane

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
This paper proposes a secure compressed suffix array, which is a data oblivious and compressed version of the suffix array used for finding substrings of a large string. Secure compressed suffix arrays can be used for indexing a large collection of strings containing personal information such as DNA data.

Cite as

Kunihiko Sadakane. Secure Compressed Suffix Arrays. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{sadakane:OASIcs.Grossi.13,
  author =	{Sadakane, Kunihiko},
  title =	{{Secure Compressed Suffix Arrays}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{13:1--13:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.13},
  URN =		{urn:nbn:de:0030-drops-238122},
  doi =		{10.4230/OASIcs.Grossi.13},
  annote =	{Keywords: suffix array, compression, encryption, oblivious algorithm, secure computation}
}
Document
Wheeler Graphs and Wheeler Languages

Authors: Nicola Cotumaccio, Giovanna D'Agostino, Daniel Gibney, Alberto Policriti, Nicola Prezza, and Sharma V. Thankachan

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
Suffix sorting stands at the core of the most efficient solutions for indexed pattern matching: the suffix tree, the suffix array, compressed indexes based on the Burrows-Wheeler transform, and so on. In [Gagie, Manzini, Sirén, TCS 2017] this concept was extended to labeled graphs, obtaining the rich class of Wheeler graphs. This work opened a very fruitful line of research, ultimately generating results able to bridge the fields of compressed data structures, graph theory, and regular language theory. In a Wheeler graph, nodes are sorted according to the alphabetic order of their incoming labels, propagating this order through pairs of equally-labeled edges. This apparently-simple definition makes it possible to solve on Wheeler graphs problems (including, but not limited to: compression, subpath queries, NFA equivalence, determinization, minimization) that on general labeled graphs are extremely hard to solve, and induces a rich structure in the class of regular languages (Wheeler languages) recognized by automata whose state transition is a Wheeler graph. The goal of this survey is to provide a summary of (and intuitions behind) the results on Wheeler graphs that appeared in the literature since their introduction, in addition to a discussion of interesting problems that are still open in the field.

Cite as

Nicola Cotumaccio, Giovanna D'Agostino, Daniel Gibney, Alberto Policriti, Nicola Prezza, and Sharma V. Thankachan. Wheeler Graphs and Wheeler Languages. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 12:1-12:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{cotumaccio_et_al:OASIcs.Manzini.12,
  author =	{Cotumaccio, Nicola and D'Agostino, Giovanna and Gibney, Daniel and Policriti, Alberto and Prezza, Nicola and Thankachan, Sharma V.},
  title =	{{Wheeler Graphs and Wheeler Languages}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{12:1--12:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.12},
  URN =		{urn:nbn:de:0030-drops-239205},
  doi =		{10.4230/OASIcs.Manzini.12},
  annote =	{Keywords: Wheeler languages, Wheeler graphs, pattern matching, indexing, compressed data structures}
}
Document
Graph Indexing Beyond Wheeler Graphs

Authors: Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
After the discovery of the FM index, which linked the Burrows-Wheeler transform (BWT) to pattern matching on strings, several contemporaneous strands of research began on indexing more complex structures with the BWT, such as tries, finite languages, de Bruijn graphs, and aligned sequences. These directions can now be viewed as culminating in the theory of Wheeler Graphs, but sometimes they go beyond. This chapter reviews the significant body of "proto Wheeler Graph" indexes, many of which exploit characteristics of their specific case to outperform Wheeler graphs, especially in practice.

Cite as

Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén. Graph Indexing Beyond Wheeler Graphs. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 13:1-13:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:OASIcs.Manzini.13,
  author =	{Alanko, Jarno N. and Biagi, Elena and Equi, Massimo and M\"{a}kinen, Veli and Puglisi, Simon J. and Rizzo, Nicola and Sadakane, Kunihiko and Sir\'{e}n, Jouni},
  title =	{{Graph Indexing Beyond Wheeler Graphs}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{13:1--13:29},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.13},
  URN =		{urn:nbn:de:0030-drops-239215},
  doi =		{10.4230/OASIcs.Manzini.13},
  annote =	{Keywords: indexing, compression, compressed data structures, string algorithms, pattern matching}
}
Document
Pangenome Graph Indexing via the Multidollar-BWT

Authors: Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Indexing pangenome graphs is a major algorithmic challenge in computational pangenomics, a recent and active research field that seeks to use graphs as representations of multiple genomes. Since these graphs are constructed from whole genome sequences of a species population, they can become very large, making indexing one of the most challenging problems. In this paper, we propose gindex, a novel indexing approach to solve the Graph Pattern Matching Problem based on the multidollar-BWT. Specifically, gindex aims to find all occurrences of a pattern in a sequence-labeled graph by overcoming two main limitations of GCSA2, one of the most widely used graph indexes: handling queries of arbitrary length and scaling to large graphs without pruning any complex regions. Moreover, we show how a smart preprocessing step can optimize the use of multidollar-BWT to skip small redundant sub-patterns and enhance gindex’s querying capabilities. We demonstrate the effectiveness of our approach by comparing it to GCSA2 in terms of index construction and query time, using different preprocessing modes on three pangenome graphs: one built from Drosophila genomes and two produced by the Human Pangenome Reference Consortium. The results show that gindex can scale on human pangenome graphs - which GCSA2 cannot index using large amounts of RAM - with acceptable memory and time requirements. Moreover, gindex achieves fast query times, although not as fast as GCSA2, which may produce false positives.

Cite as

Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni. Pangenome Graph Indexing via the Multidollar-BWT. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{cozzi_et_al:LIPIcs.SEA.2025.13,
  author =	{Cozzi, Davide and Riccardi, Brian and Denti, Luca and Ciccolella, Simone and Sadakane, Kunihiko and Bonizzoni, Paola},
  title =	{{Pangenome Graph Indexing via the Multidollar-BWT}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.13},
  URN =		{urn:nbn:de:0030-drops-232515},
  doi =		{10.4230/LIPIcs.SEA.2025.13},
  annote =	{Keywords: Multidollar-BWT, Graph Index, Graph Pattern Matching, Pangenome Graph}
}
Document
The Trie Measure, Revisited

Authors: Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
In this paper, we study the following problem: given n subsets S₁, … , S_n of an integer universe U = {0,… , u-1}, having total cardinality N = ∑_{i = 1}ⁿ |S_i|, find a prefix-free encoding enc : U → {0,1}^+ minimizing the so-called trie measure, i.e., the total number of edges in the n binary tries T₁, … , T_n, where T_i is the trie packing the encoded integers {enc(x):x ∈ S_i}. We first observe that this problem is equivalent to that of merging u sets with the cheapest sequence of binary unions, a problem which in [Ghosh et al., ICDCS 2015] is shown to be NP-hard. Motivated by the hardness of the general problem, we focus on particular families of prefix-free encodings. We start by studying the fixed-length shifted encoding of [Gupta et al., Theoretical Computer Science 2007]. Given a parameter 0 ≤ a < u, this encoding sends each x ∈ U to (x + a) mod u, interpreted as a bit-string of log u bits. We develop the first efficient algorithms that find the value of a minimizing the trie measure when this encoding is used. Our two algorithms run in O(u + Nlog u) and O(Nlog² u) time, respectively. We proceed by studying ordered encodings (a.k.a. monotone or alphabetic), and describe an algorithm finding the optimal such encoding in O(N+u³) time. Within the same running time, we show how to compute the best shifted ordered encoding, provably no worse than both the optimal shifted and optimal ordered encodings. We provide implementations of our algorithms and discuss how these encodings perform in practice.

Cite as

Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza. The Trie Measure, Revisited. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.CPM.2025.19,
  author =	{Alanko, Jarno N. and Becker, Ruben and Cenzato, Davide and Gagie, Travis and Kim, Sung-Hwan and Kodric, Bojana and Prezza, Nicola},
  title =	{{The Trie Measure, Revisited}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.19},
  URN =		{urn:nbn:de:0030-drops-231135},
  doi =		{10.4230/LIPIcs.CPM.2025.19},
  annote =	{Keywords: Succinct data structures, degenerate strings, integer encoding}
}
Document
Smooth Sensitivity Revisited: Towards Optimality

Authors: Richard Hladík and Jakub Tětek

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)


Abstract
Smooth sensitivity is one of the most commonly used techniques for designing practical differentially private mechanisms. In this approach, one computes the smooth sensitivity of a given query q on the given input D and releases q(D) with noise added proportional to this smooth sensitivity. One question remains: what distribution should we pick the noise from? In this paper, we give a new class of distributions suitable for the use with smooth sensitivity, which we name the PolyPlace distribution. This distribution improves upon the state-of-the-art Student’s T distribution in terms of standard deviation by arbitrarily large factors, depending on a "smoothness parameter" γ, which one has to set in the smooth sensitivity framework. Moreover, our distribution is defined for a wider range of parameter γ, which can lead to significantly better performance. Furthermore, we prove that the PolyPlace distribution converges for γ → 0 to the Laplace distribution and so does its variance. This means that the Laplace mechanism is a limit special case of the PolyPlace mechanism. This implies that our mechanism is in a certain sense optimal for γ → 0.

Cite as

Richard Hladík and Jakub Tětek. Smooth Sensitivity Revisited: Towards Optimality. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 2:1-2:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{hladik_et_al:LIPIcs.FORC.2025.2,
  author =	{Hlad{\'\i}k, Richard and T\v{e}tek, Jakub},
  title =	{{Smooth Sensitivity Revisited: Towards Optimality}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{2:1--2:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.2},
  URN =		{urn:nbn:de:0030-drops-231292},
  doi =		{10.4230/LIPIcs.FORC.2025.2},
  annote =	{Keywords: differential privacy, smooth sensitivity}
}
Document
Debiasing Functions of Private Statistics in Postprocessing

Authors: Flavio Calmon, Elbert Du, Cynthia Dwork, Brian Finley, and Grigory Franguridi

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)


Abstract
Given a differentially private unbiased estimate q̃ = q(D) +ν of a statistic q(D), we wish to obtain unbiased estimates of functions of q(D), such as 1/q(D), solely through post-processing of q̃, with no further access to the confidential dataset D. To this end, we adapt the deconvolution method used for unbiased estimation in the statistical literature, deriving unbiased estimators for a broad family of twice-differentiable functions - those that are tempered distributions - when the privacy-preserving noise ν is drawn from the Laplace distribution (Dwork et al., 2006). We further extend this technique to functions other than tempered distributions, deriving approximately optimal estimators that are unbiased for values in a user-specified interval (possibly extending to ± ∞). We use these results to derive an unbiased estimator for private means when the size n of the dataset is not publicly known. In a numerical application, we find that a mechanism that uses our estimator to return an unbiased sample size and mean outperforms a mechanism that instead uses the previously known unbiased privacy mechanism for such means (Kamath et al., 2023). We also apply our estimators to develop unbiased transformation mechanisms for per-record differential privacy, a privacy concept in which the privacy guarantee is a public function of a record’s value (Seeman et al., 2024). Our mechanisms provide stronger privacy guarantees than those in prior work (Finley et al., 2024) by using Laplace, rather than Gaussian, noise. Finally, using a different approach, we go beyond Laplace noise by deriving unbiased estimators for polynomials under the weak condition that the noise distribution has sufficiently many moments.

Cite as

Flavio Calmon, Elbert Du, Cynthia Dwork, Brian Finley, and Grigory Franguridi. Debiasing Functions of Private Statistics in Postprocessing. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{calmon_et_al:LIPIcs.FORC.2025.17,
  author =	{Calmon, Flavio and Du, Elbert and Dwork, Cynthia and Finley, Brian and Franguridi, Grigory},
  title =	{{Debiasing Functions of Private Statistics in Postprocessing}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.17},
  URN =		{urn:nbn:de:0030-drops-231449},
  doi =		{10.4230/LIPIcs.FORC.2025.17},
  annote =	{Keywords: Differential privacy, deconvolution, unbiasedness}
}
Document
Cycle Counting Under Local Differential Privacy for Degeneracy-Bounded Graphs

Authors: Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya

Published in: LIPIcs, Volume 327, 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)


Abstract
We propose an algorithm for counting the number of cycles under local differential privacy for degeneracy-bounded input graphs. Numerous studies have focused on counting the number of triangles under the privacy notion, demonstrating that the expected 𝓁₂-error of these algorithms is Ω(n^{1.5}), where n is the number of nodes in the graph. When parameterized by the number of cycles of length four (C₄), the best existing triangle counting algorithm has an error of O(n^{1.5} + √C₄) = O(n²). In this paper, we introduce an algorithm with an expected 𝓁₂-error of O(δ^1.5 n^0.5 + δ^0.5 d_max^0.5 n^0.5), where δ is the degeneracy and d_{max} is the maximum degree of the graph. For degeneracy-bounded graphs (δ ∈ Θ(1)) commonly found in practical social networks, our algorithm achieves an expected 𝓁₂-error of O(d_{max}^{0.5} n^{0.5}) = O(n). Our algorithm’s core idea is a precise count of triangles following a preprocessing step that approximately sorts the degree of all nodes. This approach can be extended to approximate the number of cycles of length k, maintaining a similar 𝓁₂-error, namely O(δ^{(k-2)/2} d_max^0.5 n^{(k-2)/2} + δ^{k/2} n^{(k-2)/2}) or O(d_max^0.5 n^{(k-2)/2}) = O(n^{(k-1)/2}) for degeneracy-bounded graphs.

Cite as

Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya. Cycle Counting Under Local Differential Privacy for Degeneracy-Bounded Graphs. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 49:1-49:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{hillebrand_et_al:LIPIcs.STACS.2025.49,
  author =	{Hillebrand, Quentin and Suppakitpaisarn, Vorapong and Shibuya, Tetsuo},
  title =	{{Cycle Counting Under Local Differential Privacy for Degeneracy-Bounded Graphs}},
  booktitle =	{42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)},
  pages =	{49:1--49:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-365-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{327},
  editor =	{Beyersdorff, Olaf and Pilipczuk, Micha{\l} and Pimentel, Elaine and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2025.49},
  URN =		{urn:nbn:de:0030-drops-228748},
  doi =		{10.4230/LIPIcs.STACS.2025.49},
  annote =	{Keywords: Differential privacy, triangle counting, degeneracy, arboricity, graph theory, parameterized accuracy}
}
Document
Recognition and Proper Coloring of Unit Segment Intersection Graphs

Authors: Robert D. Barish and Tetsuo Shibuya

Published in: LIPIcs, Volume 294, 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)


Abstract
In this work, we concern ourselves with the fine-grained complexity of recognition and proper coloring problems on highly restricted classes of geometric intersection graphs of "thin" objects (i.e., objects with unbounded aspect ratios). As a point of motivation, we remark that there has been significant interest in finding algorithmic lower bounds for classic decision and optimization problems on these types of graphs, as they appear to escape the net of known planar or geometric separator theorems for "fat" objects (i.e., objects with bounded aspect ratios). In particular, letting n be the order of a geometric intersection graph, and assuming a geometric ply bound, per what is known as the "square root phenomenon", these separator theorems often imply the existence of 𝒪(2^√n) algorithms for problems ranging from finding proper colorings to finding Hamiltonian cycles. However, in contrast, it is known for instance that no 2^o(n) time algorithm can exist under the Exponential Time Hypothesis (ETH) for proper 6-coloring intersection graphs of line segments embedded in the plane (Biró et. al.; J. Comput. Geom. 9(2); pp. 47-80; 2018). We begin by establishing algorithmic lower bounds for proper k-coloring and recognition problems of intersection graphs of line segments embedded in the plane under the most stringent constraints possible that allow either problem to be non-trivial. In particular, we consider the class UNIT-PURE-k-DIR of unit segment geometric intersection graphs, in which segments are constrained to lie in at most k directions in the plane, and no two parallel segments are permitted to intersect. Here, under the ETH, we show for every k ≥ 3 that no 2^o(√{n/k}) time algorithm can exist for either recognizing or proper k-coloring UNIT-PURE-k-DIR graphs of order n. In addition, for every k ≥ 4, we establish the same algorithmic lower bound under the ETH for the problem of proper (k-1)-coloring UNIT-PURE-k-DIR graphs when provided a list of segment coordinates specified using 𝒪(n ⋅ k) bits witnessing graph class membership. As a consequence of our approach, we are also able to show that the problem of properly 3-coloring an arbitrary graph on m edges can be reduced in 𝒪(m²) time to the problem of properly (k-1)-coloring a UNIT-PURE-k-DIR graph. Finally, we consider a slightly less constrained class of geometric intersection graphs of lines (of unbounded length) in which line-line intersections must occur on any one of (r = 3) parallel planes in ℝ³. In this context, for every k ≥ 3, we show that no 2^o(n/k) time algorithm can exist for proper k-coloring these graphs unless the ETH is false.

Cite as

Robert D. Barish and Tetsuo Shibuya. Recognition and Proper Coloring of Unit Segment Intersection Graphs. In 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 294, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{barish_et_al:LIPIcs.SWAT.2024.5,
  author =	{Barish, Robert D. and Shibuya, Tetsuo},
  title =	{{Recognition and Proper Coloring of Unit Segment Intersection Graphs}},
  booktitle =	{19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-318-8},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{294},
  editor =	{Bodlaender, Hans L.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2024.5},
  URN =		{urn:nbn:de:0030-drops-200452},
  doi =		{10.4230/LIPIcs.SWAT.2024.5},
  annote =	{Keywords: graph class recognition, proper coloring, geometric intersection graph, segment intersection graph, fine-grained complexity, Exponential Time Hypothesis}
}
Document
Invited Talk
Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk)

Authors: Sharma V. Thankachan

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
In the past two decades, we have witnessed the design of various compact data structures for pattern matching over an indexed text [Navarro, 2016]. Popular indexes like the FM-index [Paolo Ferragina and Giovanni Manzini, 2005], compressed suffix arrays/trees [Roberto Grossi and Jeffrey Scott Vitter, 2005; Kunihiko Sadakane, 2007], the recent r-index [Travis Gagie et al., 2020; Takaaki Nishimoto and Yasuo Tabei, 2021], etc., capture the key functionalities of classic suffix arrays/trees [Udi Manber and Eugene W. Myers, 1993; Peter Weiner, 1973] in compact space. Mostly, they rely on the Burrows-Wheeler Transform (BWT) and its associated operations [Burrows and Wheeler, 1994]. However, compactly encoding some advanced suffix tree (ST) variants, like parameterized ST [Brenda S. Baker, 1993; S. Rao Kosaraju, 1995; Juan Mendivelso et al., 2020], order-isomorphic/preserving ST [Maxime Crochemore et al., 2016], two-dimensional ST [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998], etc. [Sung Gwan Park et al., 2019; Tetsuo Shibuya, 2000]- collectively known as suffix trees with missing suffix links [Richard Cole and Ramesh Hariharan, 2003], has been challenging. The previous techniques are not easily extendable because these variants do not hold some structural properties of the standard ST that enable compression. However, some limited progress has been made in these directions recently [Arnab Ganguly et al., 2017; Travis Gagie et al., 2017; Gianni Decaroli et al., 2017; Dhrumil Patel and Rahul Shah, 2021; Arnab Ganguly et al., 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Arnab Ganguly et al., 2017; Arnab Ganguly et al., 2022; Arnab Ganguly et al., 2021]. This talk will briefly survey them and highlight some interesting open problems.

Cite as

Sharma V. Thankachan. Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{thankachan:LIPIcs.CPM.2022.3,
  author =	{Thankachan, Sharma V.},
  title =	{{Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc.}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{3:1--3:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.3},
  URN =		{urn:nbn:de:0030-drops-161300},
  doi =		{10.4230/LIPIcs.CPM.2022.3},
  annote =	{Keywords: Text Indexing, Suffix Trees, String Matching}
}
Document
Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition

Authors: Kazushi Kitaya and Tetsuo Shibuya

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)


Abstract
A set of k-mers is used in many bioinformatics tasks, and much work has been done on methods to efficiently represent or compress a single set of k-mers. However, methods for compressing multiple k-mer sets have been less studied in spite of their obvious benefits for researchers and genome-related database maintainers. This paper proposes an algorithm to compress multiple k-mer sets, which works by iteratively splitting SPSS (spectrum-preserving string sets). In experiments with 3292 k-mer sets constructed from E. coli whole-genome sequencing data and 2555 k-mer sets constructed from human RNA-Seq data, the proposed algorithm could reduce the compressed file sizes by 34.7% and 13.2% respectively compared to one of the state-of-the-art colored de Bruijn graph representations. Also, our method used less memory than the colored de Bruijn graph method. This paper also introduces various methods to make the compression algorithm efficient in terms of time and memory, one of which is a parallelizable small-weight SPSS construction algorithm.

Cite as

Kazushi Kitaya and Tetsuo Shibuya. Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 12:1-12:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{kitaya_et_al:LIPIcs.WABI.2021.12,
  author =	{Kitaya, Kazushi and Shibuya, Tetsuo},
  title =	{{Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{12:1--12:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.12},
  URN =		{urn:nbn:de:0030-drops-143659},
  doi =		{10.4230/LIPIcs.WABI.2021.12},
  annote =	{Keywords: sequencing data, k-mer, de Bruijn graph, compression, colored de Bruijn graph}
}
Document
Wear Leveling Revisited

Authors: Taku Onodera and Tetsuo Shibuya

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
Wear leveling - a technology designed to balance the write counts among memory cells regardless of the requested accesses - is vital in prolonging the lifetime of certain computer memory devices, especially the type of next-generation non-volatile memory, known as phase change memory (PCM). Although researchers have been working extensively on wear leveling, almost all existing studies mainly focus on the practical aspects and lack rigorous mathematical analyses. The lack of theory is particularly problematic for security-critical applications. We address this issue by revisiting wear leveling from a theoretical perspective. First, we completely determine the problem parameter regime for which Security Refresh - one of the most well-known existing wear leveling schemes for PCM - works effectively by providing a positive result and a matching negative result. In particular, Security Refresh is not competitive for the practically relevant regime of large-scale memory. Then, we propose a novel scheme that achieves better lifetime, time/space overhead, and wear-free space for the relevant regime not covered by Security Refresh. Unlike existing studies, we give rigorous theoretical lifetime analyses, which is necessary to assess and control the security risk.

Cite as

Taku Onodera and Tetsuo Shibuya. Wear Leveling Revisited. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 65:1-65:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{onodera_et_al:LIPIcs.ISAAC.2020.65,
  author =	{Onodera, Taku and Shibuya, Tetsuo},
  title =	{{Wear Leveling Revisited}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{65:1--65:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.65},
  URN =		{urn:nbn:de:0030-drops-134092},
  doi =		{10.4230/LIPIcs.ISAAC.2020.65},
  annote =	{Keywords: Wear leveling, Randomized algorithm, Non-volatile memory}
}
Document
Succinct Oblivious RAM

Authors: Taku Onodera and Tetsuo Shibuya

Published in: LIPIcs, Volume 96, 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)


Abstract
As online storage services become increasingly common, it is important that users' private information is protected from database access pattern analyses. Oblivious RAM (ORAM) is a cryptographic primitive that enables users to perform arbitrary database accesses without revealing any information about the access pattern to the server. Previous ORAM studies focused mostly on reducing the access overhead. Consequently, the access overhead of the state-of-the-art ORAM constructions are almost at practical levels in certain application scenarios such as secure processors. However, we assume that the server space usage could become a new important issue in the coming big-data era. To enable large-scale computation in security-aware settings, it is necessary to rethink the ORAM server space cost using big-data standards. In this paper, we introduce "succinctness" as a theoretically tractable and practically relevant criterion of the ORAM server space efficiency in the big-data era. We, then, propose two succinct ORAM constructions that also exhibit state-of-the-art performance in terms of the bandwidth blowup and the user space. We also give non-asymptotic analyses and simulation results which indicate that the proposed ORAM constructions are practically effective.

Cite as

Taku Onodera and Tetsuo Shibuya. Succinct Oblivious RAM. In 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 96, pp. 52:1-52:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{onodera_et_al:LIPIcs.STACS.2018.52,
  author =	{Onodera, Taku and Shibuya, Tetsuo},
  title =	{{Succinct Oblivious RAM}},
  booktitle =	{35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)},
  pages =	{52:1--52:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-062-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{96},
  editor =	{Niedermeier, Rolf and Vall\'{e}e, Brigitte},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2018.52},
  URN =		{urn:nbn:de:0030-drops-85014},
  doi =		{10.4230/LIPIcs.STACS.2018.52},
  annote =	{Keywords: Oblivious RAM, Succinct data structure, Balls-into-bins}
}
  • Refine by Type
  • 15 Document/PDF
  • 10 Document/HTML

  • Refine by Publication Year
  • 10 2025
  • 1 2024
  • 1 2022
  • 1 2021
  • 1 2020
  • Show More...

  • Refine by Author
  • 6 Shibuya, Tetsuo
  • 3 Sadakane, Kunihiko
  • 2 Alanko, Jarno N.
  • 2 Onodera, Taku
  • 2 Prezza, Nicola
  • Show More...

  • Refine by Series/Journal
  • 12 LIPIcs
  • 3 OASIcs

  • Refine by Classification
  • 3 Theory of computation → Data structures design and analysis
  • 3 Theory of computation → Graph algorithms analysis
  • 3 Theory of computation → Pattern matching
  • 2 Security and privacy → Privacy-preserving protocols
  • 2 Theory of computation → Design and analysis of algorithms
  • Show More...

  • Refine by Keyword
  • 3 compression
  • 2 Differential privacy
  • 2 compressed data structures
  • 2 indexing
  • 2 pattern matching
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail