Do Bugs Propagate? An Empirical Analysis of Temporal Correlations Among Software Bugs

Authors: Xiaodong Gu, Yo-Sub Han, Sunghun Kim, Hongyu Zhang



Author Details

Xiaodong Gu
  • School of Software, Shanghai Jiao Tong University, China
Yo-Sub Han
  • Department of Computer Science, Yonsei University, Seoul, South Korea
Sunghun Kim
  • Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
Hongyu Zhang
  • The University of Newcastle, Australia

Acknowledgements

The authors thank the anonymous reviewers for their insightful comments and constructive suggestions, which greatly improved the quality of this paper.

Cite As

Xiaodong Gu, Yo-Sub Han, Sunghun Kim, and Hongyu Zhang. Do Bugs Propagate? An Empirical Analysis of Temporal Correlations Among Software Bugs. In 35th European Conference on Object-Oriented Programming (ECOOP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 194, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ECOOP.2021.11

Abstract

Bugs do not occur in isolation: they may interact, affect each other, and trigger other latent bugs. Identifying and understanding bug correlations could help developers localize bug origins, predict potential bugs, and design better architectures of software artifacts to prevent bugs from spreading. Many studies in the defect prediction and fault localization literature imply dependence and interactions between multiple bugs, but few explicitly investigate how bugs correlate across time steps and how they affect each other. In this paper, we perform social network analysis on the temporal correlations between bugs across time steps on software artifact ties, i.e., software graphs. Adapting the correlation analysis methodology from social networks, we construct software graphs for three kinds of artifact ties, including function calls and type hierarchy, and then perform longitudinal logistic regressions of time-lag bug correlations on these graphs. Our experiments on four open-source projects suggest that bugs can propagate, as observed on certain artifact tie graphs. Based on our findings, we propose a hybrid artifact tie graph, a synthesis of a few well-known software graphs, that exhibits a higher degree of bug propagation. Our findings shed light on research into better bug prediction and localization models and help developers take maintenance actions to prevent consequential bugs.
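To make the idea of a time-lag bug correlation on a software graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy undirected graph of artifact ties and a per-time-step record of which artifacts were buggy. For each artifact it pairs the bug status at step t with two lagged predictors from step t-1: the artifact's own status and the fraction of its neighbors that were buggy. A plain logistic regression fitted by gradient descent stands in here for the longitudinal regression named in the abstract. All names (`lagged_rows`, `fit_logistic`, the graph, and the histories) are hypothetical.

```python
import math


def lagged_rows(edges, bug_history):
    """Build (own_prev, neighbor_prev_fraction, buggy_now) rows.

    edges: iterable of (u, v) undirected artifact ties (toy input, assumed).
    bug_history: list of dicts, one per time step, mapping artifact -> 0/1.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    rows = []
    for t in range(1, len(bug_history)):
        prev, cur = bug_history[t - 1], bug_history[t]
        for node in cur:
            nbrs = neighbors.get(node, ())
            # Fraction of tied artifacts that were buggy one step earlier.
            frac = sum(prev.get(n, 0) for n in nbrs) / len(nbrs) if nbrs else 0.0
            rows.append((float(prev.get(node, 0)), frac, cur[node]))
    return rows


def fit_logistic(rows, lr=0.5, epochs=2000):
    """Gradient-descent logistic regression.

    Returns [intercept, coef_own_prev, coef_neighbor_prev]. This is a
    simplification: it ignores the within-artifact correlation that a
    longitudinal model would account for.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for own, frac, y in rows:
            x = (1.0, own, frac)
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(3):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Toy call chain a - b - c - d with a bug that "moves" along the chain.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
history = [
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 0, "b": 0, "c": 1, "d": 1},
]

rows = lagged_rows(edges, history)
coef = fit_logistic(rows)
print("intercept, own_prev, neighbor_prev:", coef)
```

In this sketch, a positive coefficient on the neighbor term would be read as a time-lag correlation along the ties. On data this small the estimate carries no statistical weight, and a real analysis would need a model that handles repeated measurements of the same artifact.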

Subject Classification

ACM Subject Classification
  • Software and its engineering → Maintaining software
Keywords
  • empirical software engineering
  • bug propagation
  • software graph
  • bug correlation

