Top-k Querying of Unknown Values under Order Constraints

Amarilli, Antoine; Amsterdamer, Yael; Milo, Tova; Senellart, Pierre

doi:10.4230/LIPIcs.ICDT.2017.5

Abstract

Many practical scenarios make it necessary to evaluate top-k queries over data items with partially unknown values. This paper considers a setting where the values are taken from a numerical domain, and where some partial order constraints are given over known and unknown values: under these constraints, we assume that all possible worlds are equally likely.
Our work is the first to propose a principled scheme to derive the value distributions and expected values of unknown items in this setting, with the goal of computing estimated top-k results by interpolating the unknown values from the known ones. We study the complexity of this general task, and show tight complexity bounds, proving that the problem is intractable, but can be tractably approximated. We then consider the case of tree-shaped partial orders, where we show a constructive PTIME solution. We also compare our problem setting to other top-k definitions on uncertain data.
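
To make the setting concrete, the following is a minimal illustrative sketch and not the algorithm of the paper. It assumes that values lie in [0, 1], that constraints are given as pairs (a, b) meaning value(a) <= value(b), and that naive rejection sampling is an acceptable way to draw possible worlds uniformly. The function names `estimate_expected_values` and `estimated_top_k` are hypothetical. Rejection sampling is only practical on tiny instances, since the fraction of accepted worlds can shrink quickly as constraints are added.

```python
import random


def estimate_expected_values(known, unknown, constraints, samples=20000, seed=0):
    """Estimate the expected value of each unknown item by rejection sampling.

    known:       dict mapping item -> fixed value in [0, 1] (assumed domain)
    unknown:     list of items whose values are unknown
    constraints: list of (a, b) pairs meaning value(a) <= value(b)
    """
    rng = random.Random(seed)
    sums = {u: 0.0 for u in unknown}
    accepted = 0
    for _ in range(samples):
        # Draw a candidate world: known values fixed, unknown values uniform.
        world = dict(known)
        for u in unknown:
            world[u] = rng.random()
        # Keep the world only if it satisfies every order constraint, so the
        # accepted worlds are uniform over the admissible region.
        if all(world[a] <= world[b] for a, b in constraints):
            accepted += 1
            for u in unknown:
                sums[u] += world[u]
    if accepted == 0:
        raise ValueError("no sampled world satisfied the constraints")
    return {u: sums[u] / accepted for u in unknown}


def estimated_top_k(known, unknown, constraints, k, **kwargs):
    """Rank all items by known value or estimated expected value; return the top k."""
    values = dict(known)
    values.update(estimate_expected_values(known, unknown, constraints, **kwargs))
    return sorted(values, key=values.get, reverse=True)[:k]


if __name__ == "__main__":
    known = {"a": 0.2, "d": 0.9}
    unknown = ["b", "c"]
    constraints = [("a", "b"), ("b", "d"), ("a", "c")]
    print(estimate_expected_values(known, unknown, constraints))
    print(estimated_top_k(known, unknown, constraints, k=2))
```

In this toy instance, b is constrained to lie between 0.2 and 0.9 and c between 0.2 and 1, so their exact expected values are 0.55 and 0.6. The sampled estimates should approach these values as the number of samples grows.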

