PEDroid: Automatically Extracting Patches from Android App Updates

Authors: Hehao Li, Yizhuo Wang, Yiwei Zhang, Juanru Li, Dawu Gu

Author Details

Hehao Li
  • Shanghai Jiao Tong University, China
Yizhuo Wang
  • Shanghai Jiao Tong University, China
Yiwei Zhang
  • Shanghai Jiao Tong University, China
Juanru Li
  • Shanghai Jiao Tong University, China
Dawu Gu
  • Shanghai Jiao Tong University, China

Acknowledgements

We are grateful to our reviewers for their valuable support and suggestions. This work was supported by the National Key Research and Development Program of China (No. 2020AAA0107803).

Cite As

Hehao Li, Yizhuo Wang, Yiwei Zhang, Juanru Li, and Dawu Gu. PEDroid: Automatically Extracting Patches from Android App Updates. In 36th European Conference on Object-Oriented Programming (ECOOP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 222, pp. 21:1-21:31, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ECOOP.2022.21

Abstract

Identifying and analyzing code patches is a common practice, not only to understand existing bugs but also to help find and fix similar bugs in new projects. Most patch analysis techniques target open-source projects, in which source code differences are easily identified and extra information such as commit logs can be leveraged to help locate patches. The task becomes challenging, however, when both source code and development logs are unavailable. A typical scenario is discovering patches in an updated Android app, which requires bytecode-level analysis. In this paper, we propose an approach that automatically identifies and extracts patches from updated Android apps by comparing the updated versions with their predecessors. Given two versions of an Android app (original and updated), our approach first identifies identical and modified methods through similarity comparison based on code features and app structures. It then compares each modified method with its implementation in the original app and determines whether a patch has been applied by analyzing the differences in internal semantics. We implemented PEDroid, a prototype patch extraction tool for Android apps, and evaluated it on a set of popular open-source apps and a set of real-world apps from different Android vendors. PEDroid identifies 28 of the 36 known patches in the former and successfully analyzes 568 real-world app updates in the latter, 94.37% of which were analyzed within 20 minutes.
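
The two-phase idea in the abstract (first pair up methods across versions, then inspect what changed inside the modified ones) can be illustrated with a rough sketch. This is not PEDroid's implementation: the function names, the instruction strings, and the 0.6 threshold are all hypothetical, and Python's `difflib` sequence similarity stands in for the code-feature and app-structure comparison and the semantic analysis described in the paper.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two instruction sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify_methods(old, new, threshold=0.6):
    """Pair methods of two app versions and label each pair.

    `old` and `new` map a (hypothetical) method name to its list of
    instructions. Names may change between versions, for example after
    obfuscation, so pairing uses instruction similarity, not names.
    The threshold is an arbitrary illustrative value.
    """
    identical, modified = [], []
    unmatched_new = dict(new)
    for old_name, old_body in old.items():
        # Greedily pick the most similar method that is still unmatched.
        best_name, best_score = None, 0.0
        for new_name, new_body in unmatched_new.items():
            score = similarity(old_body, new_body)
            if score > best_score:
                best_name, best_score = new_name, score
        if best_name is None or best_score < threshold:
            continue  # no plausible counterpart: treated as removed
        new_body = unmatched_new.pop(best_name)
        if new_body == old_body:
            identical.append((old_name, best_name))
        else:
            modified.append((old_name, best_name))
    added = list(unmatched_new)  # methods that exist only in the update
    return identical, modified, added


def changed_instructions(old_body, new_body):
    """List the non-equal edit operations between two method bodies."""
    matcher = SequenceMatcher(None, old_body, new_body, autojunk=False)
    return [
        (tag, old_body[i1:i2], new_body[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Toy input: the updated version renames methods and adds a null check.
old_app = {
    "a.a": ["const", "invoke open", "return"],
    "a.b": ["iget", "invoke read", "return"],
}
new_app = {
    "b.a": ["const", "invoke open", "return"],
    "b.c": ["iget", "if-eqz", "invoke read", "return"],
    "b.d": ["new-instance", "invoke init", "return-void"],
}

identical, modified, added = classify_methods(old_app, new_app)
for old_name, new_name in modified:
    print(old_name, "->", new_name,
          changed_instructions(old_app[old_name], new_app[new_name]))
```

In this toy input, the inserted `if-eqz` check is the kind of difference a second phase would then have to judge as a patch or as an unrelated change; that judgment is the part this sketch does not attempt.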

Subject Classification

ACM Subject Classification
  • Software and its engineering → Software evolution
Keywords
  • Diffing
  • Patch Identification
  • Android App Analysis
  • App Evolution
