Approximating Distance Measures for the Skyline

Authors Nirman Kumar, Benjamin Raichel, Stavros Sintos, Gregory Van Buskirk



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2019.10.pdf
  • Filesize: 0.72 MB
  • 20 pages

Document Identifiers

Author Details

Nirman Kumar
  • Department of Computer Science, University of Memphis, TN, USA
Benjamin Raichel
  • Department of Computer Science, University of Texas at Dallas, TX, USA
Stavros Sintos
  • Department of Computer Science, Duke University, Durham, NC, USA
Gregory Van Buskirk
  • Department of Computer Science, University of Texas at Dallas, TX, USA

Cite As Get BibTex

Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk. Approximating Distance Measures for the Skyline. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ICDT.2019.10

Abstract

In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. 
For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • Skyline
  • Pareto optimal
  • Approximation
  • Hardness
  • Multi-criteria decision making

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal. Range Searching. In Handbook of Discrete and Computational Geometry, Second Edition., pages 809-837. CRC, 2004. URL: http://dx.doi.org/10.1201/9781420035315.ch36.
  2. Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1-30, 2005. Google Scholar
  3. Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri. Efficient Algorithms for k-Regret Minimizing Sets. In 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, pages 7:1-7:23, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.SEA.2017.7.
  4. Mugurel Ionut Andreica, Eliana-Dina Tirsa, Cristina Teodora Andreica, Romulus Andreica, and Mihai Aristotel Ungureanu. Optimal geometric partitions, covers and K-centers. arXiv preprint arXiv:0908.3652, 2009. Google Scholar
  5. Aaron Archer. Two O (log^* k)-Approximation Algorithms for the Asymmetric k-Center Problem. In Integer Programming and Combinatorial Optimization, 8th International IPCO Conference, Utrecht, The Netherlands, June 13-15, 2001, Proceedings, pages 1-14, 2001. URL: http://dx.doi.org/10.1007/3-540-45535-3_1.
  6. Abolfazl Asudeh, Azade Nazi, Nan Zhang, and Gautam Das. Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, pages 821-834, 2017. URL: http://dx.doi.org/10.1145/3035918.3035932.
  7. Avrim Blum, Sariel Har-Peled, and Benjamin Raichel. Sparse Approximation via Generating Point Sets. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 548-557, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch40.
  8. Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany, pages 421-430, 2001. URL: http://dx.doi.org/10.1109/ICDE.2001.914855.
  9. Greg Van Buskirk, Benjamin Raichel, and Nicholas Ruozzi. Sparse Approximate Conic Hulls. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2531-2541, 2017. URL: http://papers.nips.cc/paper/6847-sparse-approximate-conic-hulls.
  10. Paul B. Callahan and S. Rao Kosaraju. A Decomposition of Multidimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields. J. ACM, 42(1):67-90, 1995. URL: http://dx.doi.org/10.1145/200836.200853.
  11. Wei Cao, Jian Li, Haitao Wang, Kangning Wang, Ruosong Wang, Raymond Chi-Wing Wong, and Wei Zhan. k-Regret Minimizing Set: Efficient Algorithms and Hardness. In 20th International Conference on Database Theory, ICDT 2017, March 21-24, 2017, Venice, Italy, pages 11:1-11:19, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.ICDT.2017.11.
  12. Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. Orthogonal range searching on the RAM, revisited. In Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pages 1-10, 2011. URL: http://dx.doi.org/10.1145/1998196.1998198.
  13. Sean Chester, Alex Thomo, S. Venkatesh, and Sue Whitesides. Computing k-Regret Minimizing Sets. PVLDB, 7(5):389-400, 2014. URL: http://dx.doi.org/10.14778/2732269.2732275.
  14. Julia Chuzhoy, Sudipto Guha, Eran Halperin, Sanjeev Khanna, Guy Kortsarz, Robert Krauthgamer, and Joseph Naor. Asymmetric k-center is log^* n-hard to approximate. J. ACM, 52(4):538-551, 2005. URL: http://dx.doi.org/10.1145/1082036.1082038.
  15. Vasek Chvátal. A Greedy Heuristic for the Set-Covering Problem. Math. Oper. Res., 4(3):233-235, 1979. URL: http://dx.doi.org/10.1287/moor.4.3.233.
  16. Tomás Feder and Daniel H. Greene. Optimal Algorithms for Approximate Clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 434-444, 1988. URL: http://dx.doi.org/10.1145/62212.62255.
  17. Harold N. Gabow, Jon Louis Bentley, and Robert Endre Tarjan. Scaling and Related Techniques for Geometry Problems. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing, April 30 - May 2, 1984, Washington, DC, USA, pages 135-143, 1984. URL: http://dx.doi.org/10.1145/800057.808675.
  18. Sariel Har-Peled. Geometric approximation algorithms, volume 173. American mathematical society Boston, 2011. Google Scholar
  19. Sariel Har-Peled and Manor Mendel. Fast Construction of Nets in Low-Dimensional Metrics and Their Applications. SIAM J. Comput., 35(5):1148-1184, 2006. URL: http://dx.doi.org/10.1137/S0097539704446281.
  20. Refael Hassin and Arie Tamir. Improved complexity bounds for location problems on the real line. Operations Research Letters, 10(7):395-402, 1991. Google Scholar
  21. Vladlen Koltun and Christos H. Papadimitriou. Approximately Dominating Representatives. In Database Theory - ICDT 2005, 10th International Conference, Edinburgh, UK, January 5-7, 2005, Proceedings, pages 204-214, 2005. URL: http://dx.doi.org/10.1007/978-3-540-30570-5_14.
  22. Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk. Approximating Distance Measures for the Skyline. URL: http://utdallas.edu/~benjamin.raichel/stair.pdf.
  23. Nirman Kumar and Stavros Sintos. Faster Approximation Algorithm for the k-Regret Minimizing Set and Related Problems. In Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments, ALENEX 2018, New Orleans, LA, USA, January 7-8, 2018., pages 62-74, 2018. URL: http://dx.doi.org/10.1137/1.9781611975055.6.
  24. H. T. Kung, Fabrizio Luccio, and Franco P. Preparata. On Finding the Maxima of a Set of Vectors. J. ACM, 22(4):469-476, 1975. URL: http://dx.doi.org/10.1145/321906.321910.
  25. Xuemin Lin, Yidong Yuan, Qing Zhang, and Ying Zhang. Selecting Stars: The k Most Representative Skyline Operator. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 86-95, 2007. URL: http://dx.doi.org/10.1109/ICDE.2007.367854.
  26. Matteo Magnani, Ira Assent, and Michael L. Mortensen. Taking the Big Picture: representative skylines based on significance and diversity. VLDB J., 23(5):795-815, 2014. URL: http://dx.doi.org/10.1007/s00778-014-0352-3.
  27. S. Mentzer. Approximability of Metric Clustering Problems. preprint on researchgate.net, 2016. URL: https://www.researchgate.net/publication/242489373_Approximability_of_Metric_Clustering_Problems.
  28. Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Richard J. Lipton, and Jun (Jim) Xu. Regret-Minimizing Representative Databases. PVLDB, 3(1):1114-1124, 2010. URL: http://dx.doi.org/10.14778/1920841.1920980.
  29. Giri Narasimhan and Michiel H. M. Smid. Geometric spanner networks. Cambridge University Press, 2007. Google Scholar
  30. Rina Panigrahy and Sundar Vishwanathan. An O (log^* n) Approximation Algorithm for the Asymmetric p-Center Problem. J. Algo., 27(2):259-268, 1998. URL: http://dx.doi.org/10.1006/jagm.1997.0921.
  31. J. Salowe. Selection Problems in Computational Geometry. PhD thesis, Rutgers University, 1987. Google Scholar
  32. Jeffrey S. Salowe. L-Infinity Interdistance Selection by Parametric Search. Inf. Process. Lett., 30(1):9-14, 1989. URL: http://dx.doi.org/10.1016/0020-0190(89)90166-X.
  33. Malene Søholm, Sean Chester, and Ira Assent. Maximum Coverage Representative Skyline. In Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016, Bordeaux, France, March 15-16, 2016., pages 702-703, 2016. URL: http://dx.doi.org/10.5441/002/edbt.2016.95.
  34. Yufei Tao, Ling Ding, Xuemin Lin, and Jian Pei. Distance-Based Representative Skyline. In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, pages 892-903, 2009. URL: http://dx.doi.org/10.1109/ICDE.2009.84.
  35. Stan PM Van Hoesel and Albert PM Wagelmans. On the p-coverage problem on the real line. Statistica Neerlandica, 61(1):16-34, 2007. Google Scholar
  36. Rui Xu and Donald C. Wunsch II. Survey of clustering algorithms. IEEE Trans. Neural Networks, 16(3):645-678, 2005. URL: http://dx.doi.org/10.1109/TNN.2005.845141.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail