Static Analysis for AWS Best Practices in Python Code

Authors: Rajdeep Mukherjee, Omer Tripp, Ben Liblit, Michael Wilson



Author Details

Rajdeep Mukherjee
  • Amazon Web Services, San Jose, CA, USA
Omer Tripp
  • Amazon Web Services, San Jose, CA, USA
Ben Liblit
  • Amazon Web Services, Arlington, VA, USA
Michael Wilson
  • Amazon Web Services, Seattle, WA, USA

Cite As

Rajdeep Mukherjee, Omer Tripp, Ben Liblit, and Michael Wilson. Static Analysis for AWS Best Practices in Python Code. In 36th European Conference on Object-Oriented Programming (ECOOP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 222, pp. 14:1-14:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022). https://doi.org/10.4230/LIPIcs.ECOOP.2022.14

Abstract

Amazon Web Services (AWS) is a comprehensive and broadly adopted cloud provider. AWS SDKs provide access to AWS services through API endpoints. However, incorrect use of these APIs can lead to code defects, crashes, performance issues, and other problems. AWS best practices are a set of guidelines for correct and secure use of these APIs to access cloud services, allowing conformant clients to fully reap the benefits of cloud computing.
We present static analyses, developed in the context of a commercial service for detection of code defects and security vulnerabilities, to identify deviations from AWS best practices. We focus on applications that use the AWS SDK for Python, called Boto3. Precise static analysis of Python cloud applications requires robust type inference for inferring the types of cloud service clients. However, Boto3’s "Pythonic" APIs pose unique challenges for type resolution, as does the interprocedural style in which service clients are used. We offer a layered approach that combines multiple type-resolution and tracking strategies in a staged manner: (i) general-purpose type inference augmented by type annotations, (ii) interprocedural dataflow analysis expressed in a domain-specific language, and (iii) name-based resolution as a low-confidence fallback. Across more than 3,000 popular Python GitHub repositories that make use of the AWS SDK, our layered type inference system achieves 85% precision and 100% recall in inferring Boto3 clients in Python client code.
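
To make the staged idea concrete, the following is a minimal sketch of a three-stage resolver built on Python's standard `ast` module. It is an illustration only, not the authors' implementation: the paper's stage (ii) uses a domain-specific language that is not reproduced here, and this toy version ignores scoping and control flow. The names `CLIENT_ANNOTATIONS`, `NAME_HINTS`, `infer_clients`, and the `SAMPLE` program are all assumptions introduced for the example.

```python
import ast

# Hypothetical annotation names treated as Boto3 client types (assumption).
CLIENT_ANNOTATIONS = {"BaseClient", "S3Client"}
# Hypothetical substrings used by the low-confidence name fallback (assumption).
NAME_HINTS = ("client",)


def is_boto3_client_call(node):
    """Return True for expressions of the form boto3.client(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "client"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "boto3"
    )


def annotation_name(node):
    """Return the simple name of an annotation such as X or pkg.X, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def infer_clients(source):
    """Map variable names to the stage that classified them as a client."""
    tree = ast.parse(source)
    found = {}

    # Stage (i): trust explicit type annotations on parameters and variables.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            if annotation_name(node.annotation) in CLIENT_ANNOTATIONS:
                found[node.arg] = "annotation"
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            if annotation_name(node.annotation) in CLIENT_ANNOTATIONS:
                found[node.target.id] = "annotation"

    # Stage (ii): toy dataflow. Track boto3.client(...) through assignments,
    # aliases, and functions that return a client, iterating to a fixpoint.
    factories = set()

    def produces_client(expr):
        if is_boto3_client_call(expr):
            return True
        if isinstance(expr, ast.Name):
            return expr.id in found
        return (
            isinstance(expr, ast.Call)
            and isinstance(expr.func, ast.Name)
            and expr.func.id in factories
        )

    changed = True
    while changed:
        changed = False
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name not in factories:
                returns = (s for s in ast.walk(node) if isinstance(s, ast.Return))
                if any(produces_client(r.value) for r in returns):
                    factories.add(node.name)
                    changed = True
            elif isinstance(node, ast.Assign) and produces_client(node.value):
                for target in node.targets:
                    if isinstance(target, ast.Name) and target.id not in found:
                        found[target.id] = "dataflow"
                        changed = True

    # Stage (iii): low-confidence fallback on parameter names only.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg not in found:
            if any(hint in node.arg.lower() for hint in NAME_HINTS):
                found[node.arg] = "name"

    return found


SAMPLE = '''
import boto3

def make():
    return boto3.client("s3")

def upload(s3_client, data):
    s3_client.put_object(Bucket="b", Key="k", Body=data)

def typed(c: S3Client):
    pass

conn = make()
alias = conn
'''

result = infer_clients(SAMPLE)
```

In this sketch each later stage only fills in names that earlier stages left unresolved, so a caller could weight findings by the stage label and treat `"name"` matches as the least reliable.
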
Additionally, we use real-world developer feedback to assess a representative sample of eight AWS best-practice rules. These rules detect a wide range of issues including pagination, polling, and batch operations. Developers have accepted more than 85% of the recommendations made by five out of eight Python rules, and almost 83% of all recommendations.
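
The abstract names pagination as one category of issue without spelling out the rule. A common instance, assumed here for illustration, is code that reads only the first page of a list response. The sketch below follows continuation tokens using the request and response fields of the S3 `list_objects_v2` operation (`ContinuationToken`, `IsTruncated`, `NextContinuationToken`, `Contents`). The helper `list_all_keys` and the stand-in `FakeS3Client` are hypothetical names introduced so the example runs without AWS credentials.

```python
def list_all_keys(s3_client, bucket):
    """Collect every object key in a bucket by following continuation tokens."""
    keys = []
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        # Stopping here unconditionally would silently drop later pages.
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3Client:
    """Hypothetical in-memory stand-in that serves a fixed number of keys per page."""

    def __init__(self, keys, page_size=2):
        self._keys = keys
        self._page_size = page_size

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        # The token is simply the start offset encoded as a string.
        start = int(ContinuationToken or 0)
        end = start + self._page_size
        page = {
            "Contents": [{"Key": key} for key in self._keys[start:end]],
            "IsTruncated": end < len(self._keys),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


fake = FakeS3Client(["a", "b", "c", "d", "e"])
first_page_only = [o["Key"] for o in fake.list_objects_v2(Bucket="demo")["Contents"]]
all_keys = list_all_keys(fake, "demo")
```

With a real Boto3 client, the same loop is usually written with the built-in paginator, for example `client.get_paginator("list_objects_v2").paginate(Bucket=bucket)`, which handles the token bookkeeping.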

Subject Classification

ACM Subject Classification
  • Theory of computation → Program analysis
  • Computer systems organization → Cloud computing
Keywords
  • Python
  • Type inference
  • AWS
  • Cloud
  • Boto3
  • Best practices
  • Static analysis
