The Subset Assignment Problem for Data Placement in Caches

Authors: Shahram Ghandeharizadeh, Sandy Irani, Jenny Lam



Cite As

Shahram Ghandeharizadeh, Sandy Irani, and Jenny Lam. The Subset Assignment Problem for Data Placement in Caches. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 35:1-35:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Abstract

We introduce the subset assignment problem in which items of varying sizes are placed in a set of bins with limited capacity. Items can be replicated and placed in any subset of the bins. Each (item, subset) pair has an associated cost. Not assigning an item to any of the bins is not free in general and can potentially be the most expensive option. The goal is to minimize the total cost of assigning items to subsets without exceeding the bin capacities. This problem is motivated by the design of caching systems composed of banks of memory with varying cost/performance specifications. The ability to replicate a data item in more than one memory bank can benefit the overall performance of the system with a faster recovery time in the event of a memory failure. For this setting, the number n of data objects (items) is very large and the number d of memory banks (bins) is a small constant (on the order of 3 or 4). Therefore, the goal is to determine an optimal assignment in time that minimizes dependence on n. The integral version of this problem is NP-hard since it is a generalization of the knapsack problem. We focus on an efficient solution to the LP relaxation as the number of fractionally assigned items will be at most d. If the data objects are small with respect to the size of the memory banks, the effect of excluding the fractionally assigned data items from the cache will be small. We give an algorithm that solves the LP relaxation and runs in time $O\left(\binom{3^d}{d+1}\,\mathrm{poly}(d)\, n \log(n) \log(nC) \log(Z)\right)$, where Z is the maximum item size and C the maximum storage cost.
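One natural way to write the LP relaxation described above is sketched here. The notation is assumed for illustration and is not taken from the paper: $x_{i,S}$ is the fraction of item $i$ assigned to subset $S$ of the bins (the empty set meaning "not cached"), $c_{i,S}$ is the cost of that (item, subset) pair, $s_i$ is the size of item $i$, and $B_j$ is the capacity of bin $j$.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{i=1}^{n} \sum_{S \subseteq \{1,\dots,d\}} c_{i,S}\, x_{i,S} \\
\text{subject to}\quad & \sum_{S \subseteq \{1,\dots,d\}} x_{i,S} = 1 && i = 1,\dots,n \\
& \sum_{i=1}^{n} \sum_{S \ni j} s_i\, x_{i,S} \le B_j && j = 1,\dots,d \\
& x_{i,S} \ge 0 && \text{for all } i, S
\end{align*}
```

Under this formulation there are $n + d$ constraints besides nonnegativity, so a basic feasible solution has at most $n + d$ nonzero variables. Each item needs at least one nonzero variable to satisfy its equality constraint, which leaves at most $d$ items with two or more. This is consistent with the bound of at most d fractionally assigned items stated in the abstract.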
Keywords

  • Memory management
  • caching
  • simplex method
  • linear programming
  • min-cost flow
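To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of the integral version for a tiny instance. It is not the authors' algorithm: it enumerates every assignment and is exponential in the number of items, so it only illustrates the objective and the capacity constraints. The function names, the sample sizes and capacities, and the `COST` table are all hypothetical.

```python
from itertools import product


def all_subsets(d):
    """Every subset of bins {0, ..., d-1} as a frozenset, including the empty set."""
    return [frozenset(j for j in range(d) if (mask >> j) & 1)
            for mask in range(1 << d)]


def brute_force_assignment(sizes, capacities, cost):
    """Exhaustively search integral assignments of items to bin subsets.

    sizes[i]      -- size of item i
    capacities[j] -- capacity of bin j
    cost(i, S)    -- cost of placing item i on subset S (empty S = not cached)
    Returns (best_cost, best_assignment). The all-empty assignment is always
    feasible, so a result always exists.
    """
    d = len(capacities)
    best_cost, best = None, None
    for choice in product(all_subsets(d), repeat=len(sizes)):
        # An item replicated on subset S consumes its size in every bin of S.
        load = [0] * d
        for i, S in enumerate(choice):
            for j in S:
                load[j] += sizes[i]
        if any(load[j] > capacities[j] for j in range(d)):
            continue  # violates a bin capacity
        total = sum(cost(i, S) for i, S in enumerate(choice))
        if best_cost is None or total < best_cost:
            best_cost, best = total, choice
    return best_cost, best


# Hypothetical instance: two bins (0 is the faster bank), three items.
# Leaving an item uncached is the most expensive option; replication is cheapest.
COST = {
    frozenset(): 10,
    frozenset({0}): 2,
    frozenset({1}): 5,
    frozenset({0, 1}): 1,
}
sizes = [2, 3, 1]
capacities = [3, 3]

best_cost, best = brute_force_assignment(sizes, capacities, lambda i, S: COST[S])
print(best_cost)  # 9
print(best)
```

In this instance the cheapest feasible choice places the two small items in the fast bank and the large item in the slower one. Replicating any item would use up capacity in both banks and force another item out of the cache.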



