Top-k Querying of Unknown Values under Order Constraints

Authors: Antoine Amarilli, Yael Amsterdamer, Tova Milo, Pierre Senellart



Cite As

Antoine Amarilli, Yael Amsterdamer, Tova Milo, and Pierre Senellart. Top-k Querying of Unknown Values under Order Constraints. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICDT.2017.5

Abstract

Many practical scenarios make it necessary to evaluate top-k queries over data items with partially unknown values. This paper considers a setting where the values are taken from a numerical domain, and where some partial order constraints are given over known and unknown values: under these constraints, we assume that all possible worlds are equally likely. Our work is the first to propose a principled scheme to derive the value distributions and expected values of unknown items in this setting, with the goal of computing estimated top-k results by interpolating the unknown values from the known ones. We study the complexity of this general task, and show tight complexity bounds, proving that the problem is intractable, but can be tractably approximated. We then consider the case of tree-shaped partial orders, where we show a constructive PTIME solution. We also compare our problem setting to other top-k definitions on uncertain data.
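To make the setting concrete, here is a minimal sketch of interpolating unknown values under order constraints and then ranking on the estimates. It is not the paper's algorithm: it uses naive rejection sampling as a stand-in, assumes all values lie in [0, 1], and the names `estimate_expected_values` and `top_k` are hypothetical, introduced only for this illustration.

```python
import random


def estimate_expected_values(known, unknown, constraints,
                             samples=20000, max_attempts=5_000_000, seed=0):
    """Estimate the expected value of each unknown item.

    Illustrative assumptions (not taken from the paper):
      - known: dict mapping item -> fixed value in [0, 1]
      - unknown: list of items whose values are unknown
      - constraints: list of (a, b) pairs meaning value(a) <= value(b)
    Worlds are drawn uniformly from [0, 1]^n and kept only if they satisfy
    every constraint, so the kept worlds are equally likely.
    """
    rng = random.Random(seed)
    totals = {u: 0.0 for u in unknown}
    accepted = 0
    for _ in range(max_attempts):
        if accepted == samples:
            break
        world = dict(known)
        for u in unknown:
            world[u] = rng.random()
        if all(world[a] <= world[b] for a, b in constraints):
            accepted += 1
            for u in unknown:
                totals[u] += world[u]
    if accepted == 0:
        raise ValueError("no consistent world sampled; constraints may be unsatisfiable")
    return {u: totals[u] / accepted for u in unknown}


def top_k(known, estimates, k):
    """Rank all items by known value or estimated expected value, highest first."""
    merged = {**known, **estimates}
    return sorted(merged, key=merged.get, reverse=True)[:k]


# Chain a <= b <= c <= d with a and d known, b and c unknown.
known = {"a": 0.2, "d": 0.8}
constraints = [("a", "b"), ("b", "c"), ("c", "d")]
estimates = estimate_expected_values(known, ["b", "c"], constraints)
print(estimates)                    # estimates for b and c, close to 0.4 and 0.6
print(top_k(known, estimates, 2))   # the two highest-ranked items
```

In this chain the unknown values are interpolated evenly between the known endpoints, which matches the intuition described in the abstract. The sampler is only illustrative: its acceptance rate drops quickly as more constraints are added, so it does not scale.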

Keywords

  • uncertainty
  • partial order
  • unknown values
  • crowdsourcing
  • interpolation
