Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk)

Geerts, Floris

doi:10.4230/LIPIcs.ICDT.2016.2

File

LIPIcs.ICDT.2016.2.pdf

Filesize: 235 kB
2 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICDT.2016.2
URN: urn:nbn:de:0030-drops-57715

Author Details

Floris Geerts

Cite AsGet BibTex

Floris Geerts. Scale Independence: Using Small Data to Answer Queries on Big Data (Invited Talk). In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ICDT.2016.2

Abstract

Large datasets introduce challenges to the scalability of query answering. Given a query Q and a dataset D, it is often prohibitively costly to compute the query answers Q(D) when D is big. To this end, one may want to use heuristics, "quick and dirty" algorithms which return approximate answers. However, in many applications it is a must to find exact query answers. So, how can we efficiently compute Q(D) when D is big or when we only have limited resources? One idea is to find a small subset D_Q of D such that Q(D_Q)=Q(D) where the size of D_Q is independent of the size of the underlying dataset D. Intuitively, when such a D_Q can be found for a query Q, the query is said to be scale independent (Armbrust et al. 2011, Armbrust et al. 2013, Fan et al. 2014). Indeed, for answering such queries the size of the underlying database does not matter, i.e., query processing is independent of the scale of the database. In this talk, I will survey various formalisms that enable large classes of queries to be scale independent. These formalisms primarily rely on the availability of access constraints, a combination of indexes and cardinality constraints, on the data (Fan et al. 15, Fan et al. 14). We will take a closer look at how, in the presence of such constraints, queries can often be compiled into efficient query plans that access a bounded amount data (Cao et al. 2014, Fan et al. 2015), and how these techniques relate to query processing in the presence of access patterns (Benedikt et al. 2015, Benedikt et al. 2014, Deutsch et al. 2007). Finally, we illustrate that scale independent queries are quite common in practice and that they indeed can be efficiently answered on big datasets when access constraints are present (Cao et al. 2015, Cao et al. 2014).

Keywords

Scale independence
Access constraints
Query processing

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Michael Armbrust, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, and David A. Patterson. PIQL: Success-tolerant query processing in the cloud. PVLDB, 5(3):181-192, 2011.
Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, and David A. Patterson. Generalized scale independence through incremental precomputation. In Proc SIGMOD 2013, pages 625-636, 2013.
Michael Benedikt, Julien Leblay, and Efthymia Tsamoura. Querying with access patterns and integrity constraints. PVLDB, 8(6):690-701, 2015.
Michael Benedikt, Balder ten Cate, and Efthymia Tsamoura. Generating low-cost plans from proofs. In Proc. PODS 2014, pages 200-211, 2014.
Yang Cao, Wenfei Fan, Jinpeng Huai, and Ruizhe Huang. Making pattern queries bounded in big graphs. In Proc. ICDE 2015, pages 161-172, 2015.
Yang Cao, Wenfei Fan, Tianyu Wo, and Wenyuan Yu. Bounded conjunctive queries. PVLDB, 7(12):1231-1242, 2014.
Alin Deutsch, Bertram Ludäscher, and Alan Nash. Rewriting queries using views with access patterns under integrity constraints. TCS, 371(3):200-226, 2007.
Wenfei Fan, Floris Geerts, Yang Cao, Ting Deng, and Ping Lu. Querying big data by accessing small data. In Proc. PODS 2015, pages 173-184, 2015.
Wenfei Fan, Floris Geerts, and Leonid Libkin. On scale independence for querying big data. In Proc. PODS 2014, pages 51-62, 2014.