Document

APPROX

**Published in:** LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

The main focus of this paper is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. We also explore a number of variants where additional constraints are imposed on the first-stage decisions, specifically matroid and multi-knapsack constraints.
Our eventual goal is to provide results for supplier problems in the most general distributional setting, where there is only black-box access to the underlying distribution. To that end, we follow a two-step approach. First, we develop algorithms for a restricted version of each problem, in which all possible scenarios are explicitly provided; second, we employ a novel scenario-discarding variant of the standard Sample Average Approximation (SAA) method, in which we crucially exploit properties of the restricted-case algorithms. We finally note that the scenario-discarding modification to the SAA method is necessary in order to optimize over the radius.

Brian Brubach, Nathaniel Grammel, David G. Harris, Aravind Srinivasan, Leonidas Tsepenekas, and Anil Vullikanti. Approximating Two-Stage Stochastic Supplier Problems. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 23:1-23:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.APPROX/RANDOM.2021.23, author = {Brubach, Brian and Grammel, Nathaniel and Harris, David G. and Srinivasan, Aravind and Tsepenekas, Leonidas and Vullikanti, Anil}, title = {{Approximating Two-Stage Stochastic Supplier Problems}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)}, pages = {23:1--23:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-207-5}, ISSN = {1868-8969}, year = {2021}, volume = {207}, editor = {Wootters, Mary and Sanit\`{a}, Laura}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.23}, URN = {urn:nbn:de:0030-drops-147163}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2021.23}, annote = {Keywords: Approximation Algorithms, Stochastic Optimization, Two-Stage Recourse Model, Clustering Problems, Knapsack Supplier} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

We present a new approach to approximating the Maximum Duo-Preservation String Mapping Problem (MPSM) based on massaging the constraints into a tractable matching problem. MPSM was introduced in Chen, Chen, Samatova, Peng, Wang, and Tang [Chen et al., 2014] as the complement to the well-studied Minimum Common String Partition problem (MCSP). Prior work also considers the k-MPSM and k-MCSP variants in which each letter occurs at most k times in each string. The authors of [Chen et al., 2014] showed a k^2-appoximation for k >= 3 and 2-approximation for k = 2. Boria, Kurpisz, Leppänen, and Mastrolilli [Boria et al., 2014] gave a 4-approximation independent of k and showed that even 2-MPSM is APX-Hard. A series of improvements led to the current best bounds of a (2 + epsilon)-approximation for any epsilon > 0 in n^{O(1/epsilon)} time for strings of length n and a 2.67-approximation running in O(n^2) time, both by Dudek, Gawrychowski, and Ostropolski-Nalewaja [Dudek et al., 2017]. Here, we show that a 2.67-approximation can surprisingly be achieved in O(n) time for alphabets of constant size and O(n + alpha^7) for alphabets of size alpha.
Recently, Mehrabi [Mehrabi, 2017] introduced the more general weighted variant, Maximum Weight Duo-Preservation String Mapping (MWPSM) and provided a 6-approximation. Our approach gives a 2.67-approximation to this problem running in O(n^3) time. This approach can also find an 8/(3(1-epsilon))-approximation to MWPSM for any epsilon > 0 in O(n^2 epsilon^{-1} lg{epsilon^{-1}}) time using the approximate weighted matching algorithm of Duan and Pettie [Duan and Pettie, 2014].
Finally, we introduce the first streaming algorithm for MPSM. We show that a single pass suffices to find a 4-approximation on the size of an optimal solution using only O(alpha^2 lg{n}) space.

Brian Brubach. Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{brubach:LIPIcs.CPM.2018.5, author = {Brubach, Brian}, title = {{Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {5:1--5:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.5}, URN = {urn:nbn:de:0030-drops-87066}, doi = {10.4230/LIPIcs.CPM.2018.5}, annote = {Keywords: approximation algorithm, maximum duo-preservation string mapping, minimum common string partition, string comparison, streaming algorithm, comparative genomics} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into k-by-k square blocks and generating a lookup table in O(psi^{2k} k^2 |Sigma|^{2k}) time and O(psi^{2k} k |Sigma|^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and |Sigma| is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg{k}) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi.
We further show that for certain problems the O(|Sigma|^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of one-against-many banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg{d} + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretic result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm.
Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log{m}) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and Ziv-Ukelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variable-sized blocks. In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < |Sigma|), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg{k}) time and O((2k)! k^2) space.

Brian Brubach and Jay Ghurye. A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 13:1-13:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.CPM.2018.13, author = {Brubach, Brian and Ghurye, Jay}, title = {{A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {13:1--13:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.13}, URN = {urn:nbn:de:0030-drops-86965}, doi = {10.4230/LIPIcs.CPM.2018.13}, annote = {Keywords: edit distance, banded alignment, one-against-many alignment, genomics, method of the Four Russians} }

Document

**Published in:** LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Comparing a string to a large set of sequences is a key subroutine in greedy heuristics for clustering genomic data. Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) is a common method used in studying microbial communities. We present a new approach to greedy clustering using a trie-like data structure and Four Russians speedup. We evaluate the running time of our method in terms of the number of comparisons it makes during clustering and show in experimental results that the number of comparisons grows linearly with the size of the dataset as opposed to the quadratic running time of other methods. We compare the clusters output by our method to the popular greedy clustering tool UCLUST. We show that the clusters we generate can be both tighter and larger.

Brian Brubach, Jay Ghurye, Mihai Pop, and Aravind Srinivasan. Better Greedy Sequence Clustering with Fast Banded Alignment. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.WABI.2017.3, author = {Brubach, Brian and Ghurye, Jay and Pop, Mihai and Srinivasan, Aravind}, title = {{Better Greedy Sequence Clustering with Fast Banded Alignment}}, booktitle = {17th International Workshop on Algorithms in Bioinformatics (WABI 2017)}, pages = {3:1--3:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-050-7}, ISSN = {1868-8969}, year = {2017}, volume = {88}, editor = {Schwartz, Russell and Reinert, Knut}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.3}, URN = {urn:nbn:de:0030-drops-76425}, doi = {10.4230/LIPIcs.WABI.2017.3}, annote = {Keywords: Sequence Clustering, Metagenomics, String Algorithms} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

Online matching has received significant attention over the last 15 years due to its close connection to Internet advertising. As the seminal work of Karp, Vazirani, and Vazirani has an optimal (1 - 1/epsilon) competitive ratio in the standard adversarial online model, much effort has gone into developing useful online models that incorporate some stochasticity in the arrival process. One such popular model is the "known I.I.D. model" where different customer-types arrive online from a known distribution. We develop algorithms with improved competitive ratios for some basic variants of this model with integral arrival rates, including: (a) the case of general weighted edges, where we improve the best-known ratio of 0.667 due to [Haeupler, Mirrokni and Zadimoghaddam WINE 2011] to 0.705; and (b) the vertex-weighted case, where we improve the 0.7250 ratio of [Jaillet and Lu Math. Oper. Res 2013] to 0.7299. We also consider two extensions, one is "known I.I.D." with non-integral arrival rate and stochastic rewards; the other is "known I.I.D." b-matching with non-integral arrival rate and stochastic rewards. We present a simple non-adaptive algorithm which works well simultaneously on the two extensions.
One of the key ingredients of our improvement is the following (offline) approach to bipartite-matching polytopes with additional constraints. We first add several valid constraints in order to get a good fractional solution f; however, these give us less control over the structure of f. We next remove all these additional constraints and randomly move from f to a feasible point on the matching polytope with all coordinates being from the set {0, 1/k, 2/k,..., 1} for a chosen integer k. The structure of this solution is inspired by [Jaillet and Lu Math. Oper. Res 2013] and is a tractable structure for algorithm design and analysis. The appropriate random move preserves many of the removed constraints (approximately [exactly] with high probability [in expectation]). This underlies some of our improvements, and, we hope, could be of independent interest.

Brian Brubach, Karthik Abinav Sankararaman, Aravind Srinivasan, and Pan Xu. New Algorithms, Better Bounds, and a Novel Model for Online Stochastic Matching. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.ESA.2016.24, author = {Brubach, Brian and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan}, title = {{New Algorithms, Better Bounds, and a Novel Model for Online Stochastic Matching}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {24:1--24:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.24}, URN = {urn:nbn:de:0030-drops-63753}, doi = {10.4230/LIPIcs.ESA.2016.24}, annote = {Keywords: Ad-Allocation, Online Matching, Randomized Algorithms} }