Periodic Bandits and Wireless Network Selection

Authors Shunhao Oh, Anuja Meetoo Appavoo, Seth Gilbert

Thumbnail PDF


  • Filesize: 2.1 MB
  • 15 pages

Document Identifiers

Author Details

Shunhao Oh
  • Department of Computer Science, National University of Singapore
Anuja Meetoo Appavoo
  • Department of Computer Science, National University of Singapore
Seth Gilbert
  • Department of Computer Science, National University of Singapore


This project was funded by Singapore Ministry of Education Grant MOE2018-T2-1-160 (Beyond Worst-Case Analysis: A Tale of Distributed Algorithms).

Cite AsGet BibTex

Shunhao Oh, Anuja Meetoo Appavoo, and Seth Gilbert. Periodic Bandits and Wireless Network Selection. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 149:1-149:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Bandit-style algorithms have been studied extensively in stochastic and adversarial settings. Such algorithms have been shown to be useful in multiplayer settings, e.g. to solve the wireless network selection problem, which can be formulated as an adversarial bandit problem. A leading bandit algorithm for the adversarial setting is EXP3. However, network behavior is often repetitive, where user density and network behavior follow regular patterns. Bandit algorithms, like EXP3, fail to provide good guarantees for periodic behaviors. A major reason is that these algorithms compete against fixed-action policies, which is ineffective in a periodic setting. In this paper, we define a periodic bandit setting, and periodic regret as a better performance measure for this type of setting. Instead of comparing an algorithm’s performance to fixed-action policies, we aim to be competitive with policies that play arms under some set of possible periodic patterns F (for example, all possible periodic functions with periods 1,2,*s,P). We propose Periodic EXP4, a computationally efficient variant of the EXP4 algorithm for periodic settings. With K arms, T time steps, and where each periodic pattern in F is of length at most P, we show that the periodic regret obtained by Periodic EXP4 is at most O(sqrt{PKT log K + KT log |F|}). We also prove a lower bound of Omega (sqrt{PKT + KT {log |F|}/{log K}}) for the periodic setting, showing that this is optimal within log-factors. As an example, we focus on the wireless network selection problem. Through simulation, we show that Periodic EXP4 learns the periodic pattern over time, adapts to changes in a dynamic environment, and far outperforms EXP3.

Subject Classification

ACM Subject Classification
  • Theory of computation → Online learning algorithms
  • multi-armed bandits
  • wireless network selection
  • periodicity in environment


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Anuja Meetoo Appavoo, Seth Gilbert, and Kian-Lee Tan. Shrewd Selection Speeds Surfing: Use Smart EXP3! In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pages 188-199. IEEE, 2018. Google Scholar
  2. Anuja Meetoo Appavoo, Seth Gilbert, and Kian-Lee Tan. Cooperation Speeds Surfing: Use Co-Bandit! arXiv preprint, 2019. URL:
  3. E. Aryafar, A. Keshavarz-Haddad, C. Joe-Wong, and M. Chiang. Max-Min Fair Resource Allocation in HetNets: Distributed Algorithms and Hybrid Architecture. In ICDCS, 2017, pages 857-869. IEEE, 2017. Google Scholar
  4. E. Aryafar, A. Keshavarz-Haddad, M.l Wang, and M. Chiang. RAT selection games in HetNets. In INFOCOM, pages 998-1006. IEEE, 2013. Google Scholar
  5. Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In COLT, pages 217-226, 2009. Google Scholar
  6. P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002. Google Scholar
  7. O. Avner and S. Mannor. Multi-user lax communications: A multi-armed bandit approach. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pages 1-9, April 2016. URL:
  8. Y. Bejerano, S-J. Han, and L. E. Li. Fairness and load balancing in wireless LANs using association control. In MobiCom, pages 315-329. ACM, 2004. Google Scholar
  9. Omar Besbes, Yonatan Gur, and Assaf Zeevi. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 199-207. Curran Associates, Inc., 2014. URL:
  10. Sébastien Bubeck, Nicolo Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trendsregistered in Machine Learning, 5(1):1-122, 2012. Google Scholar
  11. M. H. Cheung, F. Hou, J. Huang, and R. Southwell. Congestion-Aware Distributed Network Selection for Integrated Cellular and Wi-Fi Networks. arXiv preprint, 2017. URL:
  12. S. Deng, A. Sivaraman, and H. Balakrishnan. All your network are belong to us: A transport framework for mobile network selection. In HotMobile. ACM, 2014. Google Scholar
  13. Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation. In New Frontiers in Dynamic Spectrum, 2010 IEEE Symposium on, pages 1-9. IEEE, 2010. Google Scholar
  14. D. Golovin, M. Faulkner, and A. Krause. Online distributed sensor selection. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, pages 220-231. ACM, 2010. Google Scholar
  15. B. Kauffmann, F. Baccelli, A. Chaintreau, V. Mhatre, K. Papagiannaki, and C. Diot. Measurement-based self organization of interfering 802.11 wireless access networks. In INFOCOM 2007, pages 1451-1459. IEEE, 2007. Google Scholar
  16. R. Kleinberg, G. Piliouras, and E. Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In ACM STOC, pages 533-542. ACM, 2009. Google Scholar
  17. S. Maghsudi and S. Stanczak. Relay selection with no side information: An adversarial bandit approach. In WCNC, pages 715-720. IEEE, 2013. Google Scholar
  18. A. Mishra, V. Brik, S. Banerjee, A. Srinivasan, and W. A. Arbaugh. A Client-Driven Approach for Channel Management in Wireless LANs. In Infocom, 2006. Google Scholar
  19. E Monsef, A. Keshavarz-Haddad, E. Aryafar, J. Saniie, and M. Chiang. Convergence properties of general network selection games. In INFOCOM, pages 1445-1453. IEEE, 2015. Google Scholar
  20. D. Niyato and E. Hossain. Dynamics of network selection in heterogeneous wireless networks: An evolutionary game approach. TVT, 58(4):2008-2017, 2009. Google Scholar
  21. Shunhao Oh, Anuja Meetoo Appavoo, and Seth Gilbert. Periodic Bandits and Wireless Network Selection [full version of paper]. arXiv preprint, 2019. URL:
  22. Allesiardo Robin, Raphaël Feraud, and Odalric-Ambrym Maillard. The Non-stationary Stochastic Multi-armed Bandit Problem. International Journal of Data Science and Analytics, March 2017. URL:
  23. Yevgeny Seldin and Gábor Lugosi. A lower bound for multi-armed bandits with expert advice. In 13th European Workshop on Reinforcement Learning (EWRL), 2016. Google Scholar
  24. SimPy. SimPy - Event discrete simulation for Python, 2016. , accessed 2018-19-12. URL:
  25. K. Sui, M. Zhou, D. Liu, M. Ma, D. Pei, Y. Zhao, Z. Li, and T. Moscibroda. Characterizing and improving wifi latency in large-scale operational networks. In MobiSys, pages 347-360. ACM, 2016. Google Scholar
  26. Vasilis Syrgkanis, Akshay Krishnamurthy, and Robert Schapire. Efficient algorithms for adversarial contextual learning. In International Conference on Machine Learning, pages 2159-2168, 2016. Google Scholar
  27. C. Tekin and M. Liu. Performance and Convergence of Multi-user Online Learning. In GAMENETS, pages 321-336. Springer, 2011. Google Scholar
  28. H. A. Tran, S. Hoceini, A. Mellouk, J. Perez, and S. Zeadally. QoE-based server selection for content distribution networks. IEEE Transactions on Computers, 63(11):2803-2815, 2014. Google Scholar
  29. Q. Wu, Z. Du, P. Yang, Y.-D. Yao, and J. Wang. Traffic-aware online network selection in heterogeneous wireless networks. TVT, 65(1):381-397, 2016. Google Scholar
  30. K. Zhu, D. Niyato, and P. Wang. Network selection in heterogeneous wireless networks: Evolution with incomplete information. In WCNC, pages 1-6. IEEE, 2010. Google Scholar