Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk)

Author Floris Geerts



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2016.2.pdf
  • Filesize: 235 kB
  • 2 pages

Document Identifiers

Author Details

Floris Geerts

Cite As Get BibTex

Floris Geerts. Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk). In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ICDT.2016.2

Abstract

Large datasets introduce challenges to the scalability of query answering. Given a query Q and a dataset D, it is often prohibitively costly to compute the query answers Q(D) when D is big. To this end, one may want to use heuristics, "quick and dirty" algorithms which return approximate answers. However, in many applications it is a must to find exact query answers. So, how can we efficiently compute Q(D) when D is big  or  when we only have limited  resources? 

One idea is to find a small subset D_Q of D such that Q(D_Q)=Q(D) where the size of D_Q is independent of the size of the underlying dataset D. Intuitively, when such a D_Q can be found for a query Q, the query is said to be  scale independent (Armbrust et al. 2011, Armbrust et al. 2013, Fan et al. 2014). Indeed, for answering such queries the size of the underlying database does not matter, i.e., query processing is independent of the scale of the database.

In this talk, I will survey various formalisms that enable large classes of queries to be scale independent. These formalisms primarily rely on the availability of  access constraints, a combination of indexes and cardinality constraints, on the data (Fan et al. 15, Fan et al. 14). We will take a closer look at how, in the presence of such constraints, queries can often be compiled into efficient query  plans that access a bounded  amount data (Cao et al. 2014, Fan et al. 2015), and how these techniques relate to query processing  in the presence of access patterns (Benedikt et al. 2015, Benedikt et al. 2014, Deutsch et al. 2007).  Finally, we illustrate that  scale independent queries are quite common in practice and that they indeed can be  efficiently answered on big datasets when access constraints are present (Cao et al. 2015, Cao et al. 2014).

Subject Classification

Keywords
  • Scale independence
  • Access constraints
  • Query processing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, and David A. Patterson. PIQL: Success-tolerant query processing in the cloud. PVLDB, 5(3):181-192, 2011. Google Scholar
  2. Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, and David A. Patterson. Generalized scale independence through incremental precomputation. In Proc SIGMOD 2013, pages 625-636, 2013. Google Scholar
  3. Michael Benedikt, Julien Leblay, and Efthymia Tsamoura. Querying with access patterns and integrity constraints. PVLDB, 8(6):690-701, 2015. Google Scholar
  4. Michael Benedikt, Balder ten Cate, and Efthymia Tsamoura. Generating low-cost plans from proofs. In Proc. PODS 2014, pages 200-211, 2014. Google Scholar
  5. Yang Cao, Wenfei Fan, Jinpeng Huai, and Ruizhe Huang. Making pattern queries bounded in big graphs. In Proc. ICDE 2015, pages 161-172, 2015. Google Scholar
  6. Yang Cao, Wenfei Fan, Tianyu Wo, and Wenyuan Yu. Bounded conjunctive queries. PVLDB, 7(12):1231-1242, 2014. Google Scholar
  7. Alin Deutsch, Bertram Ludäscher, and Alan Nash. Rewriting queries using views with access patterns under integrity constraints. TCS, 371(3):200-226, 2007. Google Scholar
  8. Wenfei Fan, Floris Geerts, Yang Cao, Ting Deng, and Ping Lu. Querying big data by accessing small data. In Proc. PODS 2015, pages 173-184, 2015. Google Scholar
  9. Wenfei Fan, Floris Geerts, and Leonid Libkin. On scale independence for querying big data. In Proc. PODS 2014, pages 51-62, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail