Streaming Partitioning of Sequences and Trees

Author Christian Konrad



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2016.13.pdf
  • Filesize: 0.6 MB
  • 18 pages

Document Identifiers

Author Details

Christian Konrad

Cite AsGet BibTex

Christian Konrad. Streaming Partitioning of Sequences and Trees. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ICDT.2016.13

Abstract

We study streaming algorithms for partitioning integer sequences and trees. In the case of trees, we suppose that the input tree is provided by a stream consisting of a depth-first-traversal of the input tree. This captures the problem of partitioning XML streams, among other problems. We show that both problems admit deterministic (1+epsilon)-approximation streaming algorithms, where a single pass is sufficient for integer sequences and two passes are required for trees. The space complexity for partitioning integer sequences is O((1/epsilon) * p * log(nm)) and for partitioning trees is O((1/epsilon) * p^2 * log(nm)), where n is the length of the input stream, m is the maximal weight of an element in the stream, and p is the number of partitions to be created. Furthermore, for the problem of partitioning integer sequences, we show that computing an optimal solution in one pass requires Omega(n) space, and computing a (1+epsilon)-approximation in one pass requires Omega((1/epsilon) * log(n)) space, rendering our algorithm tight for instances with p,m in O(1).
Keywords
  • Streaming Algorithms
  • XML Documents
  • Data Partitioning
  • Communication Complexity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st Symposium on Principles of Database Systems, PODS'12, pages 5-14, New York, NY, USA, 2012. ACM. Google Scholar
  2. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Spectral sparsification in dynamic graph streams. In Prasad Raghavendra, Sofya Raskhodnikova, Klaus Jansen, and JoséD.P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 8096 of Lecture Notes in Computer Science, pages 1-10. Springer Berlin Heidelberg, 2013. URL: http://dx.doi.org/10.1007/978-3-642-40328-6_1.
  3. Konstantin Andreev and Harald Räcke. Balanced graph partitioning. In Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'04, pages 120-124, New York, NY, USA, 2004. ACM. URL: http://dx.doi.org/10.1145/1007912.1007931.
  4. Michael Bader. Space-Filling Curves - An Introduction with Applications in Scientific Computing. Texts in Computational Science and Engineering. Springer, 2013. Google Scholar
  5. Joshua Batson, Daniel A. Spielman, Nikhil Srivastava, and Shang-Hua Teng. Spectral sparsification of graphs: Theory and algorithms. Commun. ACM, 56(8):87-94, August 2013. Google Scholar
  6. S. H. Bokhari. Partitioning problems in parallel, pipeline, and distributed computing. IEEE Trans. Comput., 37(1):48-57, January 1988. URL: http://dx.doi.org/10.1109/12.75137.
  7. Vanessa Braganholo and Marta Mattoso. A survey on xml fragmentation. In SIGMOD'14, New York, NY, USA, 2014. ACM. Google Scholar
  8. Stefan Fafianie and Stefan Kratsch. Streaming kernelization. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger, and Zoltán Ésik, editors, Mathematical Foundations of Computer Science 2014, volume 8635 of Lecture Notes in Computer Science, pages 275-286. Springer Berlin Heidelberg, 2014. Google Scholar
  9. Greg N. Frederickson. Optimal algorithms for tree partitioning. In SODA'91, pages 168-177, Philadelphia, PA, USA, 1991. URL: http://dl.acm.org/citation.cfm?id=127787.127822.
  10. Yijie Han, Bhagirath Narahari, and Hyeong-Ah Choi. Mapping a chain task to chained processors. Inf. Process. Lett., 44(3):141-148, 1992. URL: http://dx.doi.org/10.1016/0020-0190(92)90054-Y.
  11. Pierre Hansen and Keh-Wei Lih. Improved algorithms for partitioning problems in parallel, pipelined, and distributed computing. IEEE Trans. Comput., 1992. Google Scholar
  12. Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 561-570, 2014. Google Scholar
  13. Sanjeev Khanna, S. Muthukrishnan, and Steven Skiena. Efficient array partitioning. In ICALP, volume 1256, pages 616-626. Springer Berlin Heidelberg, 1997. URL: http://dx.doi.org/10.1007/3-540-63165-8_216.
  14. Jin Kim and Hyunchul Kang. A method of XML document fragmentation for reducing time of XML fragment stream query processing. Computing and Informatics, 31(3):639, 2012. URL: http://www.cai.sk/ojs/index.php/cai/article/view/1012.
  15. Christian Konrad. Two-constraint domain decomposition with space filling curves. Parallel Comput., 37(4-5):203-216, April 2011. URL: http://dx.doi.org/10.1016/j.parco.2011.03.002.
  16. Christian Konrad and Frédéric Magniez. Validating xml documents in the streaming model with external memory. ACM Trans. Database Syst., 38(4), 2013. URL: http://dx.doi.org/10.1145/2504590.
  17. Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997. Google Scholar
  18. Daniel Lokshtanov, Neeldhara Misra, and Saket Saurabh. Kernelization - preprocessing with a guarantee. In The Multivariate Algorithmic Revolution and Beyond - Essays Dedicated to Michael R. Fellows on the Occasion of His 60th Birthday, pages 129-161, 2012. Google Scholar
  19. F. Magniez, C. Mathieu, and A. Nayak. Recognizing well-parenthesized expressions in the streaming model. SIAM Journal on Computing, 2014. Google Scholar
  20. Fredrik Manne and Bjørn Olstad. Efficient partitioning of sequences. IEEE Trans. Comput., 44(11):1322-1326, November 1995. URL: http://dx.doi.org/10.1109/12.475128.
  21. Fredrik Manne and Tor Sørevik. Optimal partitioning of sequences. J. Algorithms, 19(2):235-249, September 1995. URL: http://dx.doi.org/10.1006/jagm.1995.1035.
  22. Serge Miguet and Jean-Marc Pierson. Heuristics for 1d rectilinear partitioning as a low cost and high quality answer to dynamic load balancing. In HPCN Europe 1997, pages 550-564, London, UK, UK, 1997. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=645561.659355.
  23. Mihai Patrascu and Liam Roditty. Distance oracles beyond the thorup-zwick bound. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS'10, pages 815-823, Washington, DC, USA, 2010. IEEE Computer Society. Google Scholar
  24. Yehoshua Perl and Stephen R. Schach. Max-min tree partitioning. J. ACM, 28(1):5-15, January 1981. URL: http://dx.doi.org/10.1145/322234.322236.
  25. Ali Pinar and Cevdet Aykanat. Fast optimal load balancing algorithms for 1d partitioning. J. Parallel Distrib. Comput., 64(8):974-996, August 2004. URL: http://dx.doi.org/10.1016/j.jpdc.2004.05.003.
  26. Stefan Schamberger and Jens-Michael Wierum. Partitioning finite element meshes using space-filling curves. Future Gener. Comput. Syst., 21(5):759-766, May 2005. URL: http://dx.doi.org/10.1016/j.future.2004.05.018.
  27. Mikkel Thorup and Uri Zwick. Approximate distance oracles. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing, STOC'01, pages 183-192, New York, NY, USA, 2001. ACM. Google Scholar
  28. Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In FOCS'77, pages 222-227, Washington, DC, USA, 1977. URL: http://dx.doi.org/10.1109/SFCS.1977.24.
  29. Qirun Zhang, Michael R. Lyu, Hao Yuan, and Zhendong Su. Fast algorithms for dyck-cfl-reachability with applications to alias analysis. In PLDI'13, 2013. URL: http://dx.doi.org/10.1145/2491956.2462159.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail