5 Search Results for "Chen, Ke"


Document
Locality-Sensitive Bucketing Functions for the Edit Distance

Authors: Ke Chen and Mingfu Shao

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rate, but encounter much reduced sensitivity on data with high error rate. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be (d₁, d₂)-sensitive if any two sequences within an edit distance of d₁ are mapped into at least one shared bucket, and any two sequences with distance at least d₂ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of (d₁,d₂) and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rate while also providing insights for the hardness of designing ungapped LSH functions.

Cite as

Ke Chen and Mingfu Shao. Locality-Sensitive Bucketing Functions for the Edit Distance. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.WABI.2022.22,
  author =	{Chen, Ke and Shao, Mingfu},
  title =	{{Locality-Sensitive Bucketing Functions for the Edit Distance}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{22:1--22:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.22},
  URN =		{urn:nbn:de:0030-drops-170563},
  doi =		{10.4230/LIPIcs.WABI.2022.22},
  annote =	{Keywords: Locality-sensitive hashing, locality-sensitive bucketing, long reads, embedding}
}
Document
Multiparty Selection

Authors: Ke Chen and Adrian Dumitrescu

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
Given a sequence A of n numbers and an integer (target) parameter 1 ≤ i ≤ n, the (exact) selection problem is that of finding the i-th smallest element in A. An element is said to be (i,j)-mediocre if it is neither among the top i nor among the bottom j elements of S. The approximate selection problem is that of finding an (i,j)-mediocre element for some given i,j; as such, this variant allows the algorithm to return any element in a prescribed range. In the first part, we revisit the selection problem in the two-party model introduced by Andrew Yao (1979) and then extend our study of exact selection to the multiparty model. In the second part, we deduce some communication complexity benefits that arise in approximate selection. In particular, we present a deterministic protocol for finding an approximate median among k players.

Cite as

Ke Chen and Adrian Dumitrescu. Multiparty Selection. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 42:1-42:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ISAAC.2020.42,
  author =	{Chen, Ke and Dumitrescu, Adrian},
  title =	{{Multiparty Selection}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{42:1--42:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.42},
  URN =		{urn:nbn:de:0030-drops-133867},
  doi =		{10.4230/LIPIcs.ISAAC.2020.42},
  annote =	{Keywords: approximate selection, mediocre element, comparison algorithm, i-th order statistic, tournaments, quantiles, communication complexity}
}
Document
Random Sampling and Size Estimation Over Cyclic Joins

Authors: Yu Chen and Ke Yi

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)


Abstract
Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [Surajit Chaudhuri et al., 1999] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN^ρ/OUT), where IN^ρ is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time.

Cite as

Yu Chen and Ke Yi. Random Sampling and Size Estimation Over Cyclic Joins. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICDT.2020.7,
  author =	{Chen, Yu and Yi, Ke},
  title =	{{Random Sampling and Size Estimation Over Cyclic Joins}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.7},
  URN =		{urn:nbn:de:0030-drops-119315},
  doi =		{10.4230/LIPIcs.ICDT.2020.7},
  annote =	{Keywords: Random sampling, joins, join size estimation}
}
Document
Resolving Infeasibility of Linear Systems: A Parameterized Approach

Authors: Alexander Göke, Lydia Mirabel Mendoza Cadena, and Matthias Mnich

Published in: LIPIcs, Volume 148, 14th International Symposium on Parameterized and Exact Computation (IPEC 2019)


Abstract
Deciding feasibility of large systems of linear equations and inequalities is one of the most fundamental algorithmic tasks. However, due to inaccuracies of the data or modeling errors, in practical applications one often faces linear systems that are infeasible. Extensive theoretical and practical methods have been proposed for post-infeasibility analysis of linear systems. This generally amounts to detecting a feasibility blocker of small size k, which is a set of equations and inequalities whose removal or perturbation from the large system of size m yields a feasible system. This motivates a parameterized approach towards post-infeasibility analysis, where we aim to find a feasibility blocker of size at most k in fixed-parameter time f(k)* m^{O(1)}. On the one hand, we establish parameterized intractability (W[1]-hardness) results even in very restricted settings. On the other hand, we develop fixed-parameter algorithms parameterized by the number of perturbed inequalities and the number of positive/negative right-hand sides. Our algorithms capture the case of Directed Feedback Arc Set, a fundamental parameterized problem whose fixed-parameter tractability was shown by Chen et al. (STOC 2008).

Cite as

Alexander Göke, Lydia Mirabel Mendoza Cadena, and Matthias Mnich. Resolving Infeasibility of Linear Systems: A Parameterized Approach. In 14th International Symposium on Parameterized and Exact Computation (IPEC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 148, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{goke_et_al:LIPIcs.IPEC.2019.17,
  author =	{G\"{o}ke, Alexander and Mendoza Cadena, Lydia Mirabel and Mnich, Matthias},
  title =	{{Resolving Infeasibility of Linear Systems: A Parameterized Approach}},
  booktitle =	{14th International Symposium on Parameterized and Exact Computation (IPEC 2019)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-129-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{148},
  editor =	{Jansen, Bart M. P. and Telle, Jan Arne},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2019.17},
  URN =		{urn:nbn:de:0030-drops-114787},
  doi =		{10.4230/LIPIcs.IPEC.2019.17},
  annote =	{Keywords: Infeasible subsystems, linear programming, fixed-parameter algorithms}
}
Document
On the Stretch Factor of Polygonal Chains

Authors: Ke Chen, Adrian Dumitrescu, Wolfgang Mulzer, and Csaba D. Tóth

Published in: LIPIcs, Volume 138, 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)


Abstract
Let P=(p_1, p_2, ..., p_n) be a polygonal chain. The stretch factor of P is the ratio between the total length of P and the distance of its endpoints, sum_{i = 1}^{n-1} |p_i p_{i+1}|/|p_1 p_n|. For a parameter c >= 1, we call P a c-chain if |p_ip_j|+|p_jp_k| <= c|p_ip_k|, for every triple (i,j,k), 1 <= i<j<k <= n. The stretch factor is a global property: it measures how close P is to a straight line, and it involves all the vertices of P; being a c-chain, on the other hand, is a fingerprint-property: it only depends on subsets of O(1) vertices of the chain. We investigate how the c-chain property influences the stretch factor in the plane: (i) we show that for every epsilon > 0, there is a noncrossing c-chain that has stretch factor Omega(n^{1/2-epsilon}), for sufficiently large constant c=c(epsilon); (ii) on the other hand, the stretch factor of a c-chain P is O(n^{1/2}), for every constant c >= 1, regardless of whether P is crossing or noncrossing; and (iii) we give a randomized algorithm that can determine, for a polygonal chain P in R^2 with n vertices, the minimum c >= 1 for which P is a c-chain in O(n^{2.5} polylog n) expected time and O(n log n) space.

Cite as

Ke Chen, Adrian Dumitrescu, Wolfgang Mulzer, and Csaba D. Tóth. On the Stretch Factor of Polygonal Chains. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 56:1-56:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.MFCS.2019.56,
  author =	{Chen, Ke and Dumitrescu, Adrian and Mulzer, Wolfgang and T\'{o}th, Csaba D.},
  title =	{{On the Stretch Factor of Polygonal Chains}},
  booktitle =	{44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)},
  pages =	{56:1--56:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-117-7},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{138},
  editor =	{Rossmanith, Peter and Heggernes, Pinar and Katoen, Joost-Pieter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2019.56},
  URN =		{urn:nbn:de:0030-drops-110005},
  doi =		{10.4230/LIPIcs.MFCS.2019.56},
  annote =	{Keywords: polygonal chain, vertex dilation, Koch curve, recursive construction}
}
  • Refine by Author
  • 3 Chen, Ke
  • 2 Dumitrescu, Adrian
  • 1 Chen, Yu
  • 1 Göke, Alexander
  • 1 Mendoza Cadena, Lydia Mirabel
  • Show More...

  • Refine by Classification
  • 1 Applied computing → Bioinformatics
  • 1 Applied computing → Computational biology
  • 1 Mathematics of computing → Paths and connectivity problems
  • 1 Theory of computation
  • 1 Theory of computation → Computational geometry
  • Show More...

  • Refine by Keyword
  • 1 Infeasible subsystems
  • 1 Koch curve
  • 1 Locality-sensitive hashing
  • 1 Random sampling
  • 1 approximate selection
  • Show More...

  • Refine by Type
  • 5 document

  • Refine by Publication Year
  • 2 2019
  • 2 2020
  • 1 2022

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail