Approximating Hit Rate Curves using Streaming Algorithms

Drudi, Zachary; Harvey, Nicholas J. A.; Ingram, Stephen; Warfield, Andrew; Wires, Jake

doi:10.4230/LIPIcs.APPROX-RANDOM.2015.225

Cite As

Zachary Drudi, Nicholas J. A. Harvey, Stephen Ingram, Andrew Warfield, and Jake Wires. Approximating Hit Rate Curves using Streaming Algorithms. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 225-241, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2015.225

Abstract

A hit rate curve is a function that maps cache size to the proportion of requests that can be served from the cache. (The caching policy and the sequence of requests are assumed to be fixed.) Hit rate curves have been studied for decades in the operating system, database, and computer architecture communities. They are useful tools for choosing appropriate cache sizes, for dynamically allocating memory between competing caches, and for summarizing the locality properties of a request sequence. In this paper we focus on the widely used LRU caching policy. Hit rate curves can be computed efficiently in time, but existing algorithms are not efficient in their space usage: for a stream of m requests for n cacheable objects, every existing algorithm that provably computes the hit rate curve uses space linear in n. In modern storage systems, n can easily be in the billions or trillions, so this space usage makes these algorithms impractical. We present the first algorithm that provably approximates hit rate curves for the LRU policy in sublinear space. Our algorithm uses O(p^2 * log(n) * log^2(m) / epsilon^2) bits of space and approximates the hit rate curve at p uniformly spaced points to within additive error epsilon. This is not far from optimal: any single-pass algorithm with the same guarantees must use Omega(p^2 + epsilon^{-2} + log(n)) bits of space. Furthermore, our use of additive error is necessary: any single-pass algorithm achieving multiplicative error requires Omega(n) bits of space.
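To make the linear-space problem concrete, here is a minimal sketch of the classical exact approach: LRU stack processing in the style of Mattson et al. It is not the sublinear algorithm described in this paper. The sketch relies on the inclusion property of LRU: a request hits in a cache of size c exactly when its stack distance (its 1-based position in the LRU stack) is at most c. Because the stack holds every distinct object seen so far, memory grows linearly in n. The function names `stack_distances` and `hit_rate_curve` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def stack_distances(requests: Iterable[Hashable]) -> List[float]:
    """Exact LRU stack distances (Mattson-style stack processing).

    The distance of a request is its 1-based position in the LRU stack,
    i.e. 1 + the number of distinct other objects requested since the
    previous request for the same object. First-time requests get infinity.
    """
    stack: List[Hashable] = []  # most recently used object at index 0
    dists: List[float] = []
    for x in requests:
        try:
            # Linear scan; the stack keeps every distinct object seen so far,
            # which is exactly the Theta(n) space cost the paper avoids.
            pos = stack.index(x)
        except ValueError:
            dists.append(float("inf"))  # cold miss: never seen before
        else:
            dists.append(pos + 1)
            del stack[pos]
        stack.insert(0, x)  # x becomes the most recently used object
    return dists


def hit_rate_curve(requests: Sequence[Hashable], cache_sizes: Iterable[int]) -> List[float]:
    """Fraction of requests an LRU cache of each given size serves as hits."""
    dists = stack_distances(requests)
    m = len(dists)
    # Histogram of finite stack distances; a request hits in a cache of
    # size c iff its stack distance is at most c (LRU inclusion property).
    hist = Counter(d for d in dists if d != float("inf"))
    curve: List[float] = []
    for c in cache_sizes:
        hits = sum(cnt for d, cnt in hist.items() if d <= c)
        curve.append(hits / m if m else 0.0)
    return curve
```

For example, the cyclic request sequence a, b, c, a, b, c yields hit rates 0.0, 0.0, and 0.5 for cache sizes 1, 2, and 3: LRU gets no hits until the cache can hold all three objects. Evaluating `hit_rate_curve` at p uniformly spaced cache sizes gives the kind of p-point summary the abstract refers to, but computed exactly and with space linear in n.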
