Stack Graphs: Name Resolution at Scale

Authors Douglas A. Creager , Hendrik van Antwerpen



PDF
Thumbnail PDF

File

OASIcs.EVCS.2023.8.pdf
  • Filesize: 0.72 MB
  • 12 pages

Document Identifiers

Author Details

Douglas A. Creager
  • GitHub, Southborough, MA, USA
Hendrik van Antwerpen
  • GitHub, Amsterdam, The Netherlands

Cite As Get BibTex

Douglas A. Creager and Hendrik van Antwerpen. Stack Graphs: Name Resolution at Scale. In Eelco Visser Commemorative Symposium (EVCS 2023). Open Access Series in Informatics (OASIcs), Volume 109, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/OASIcs.EVCS.2023.8

Abstract

We present stack graphs, an extension of Visser et al.’s scope graphs framework. Stack graphs power Precise Code Navigation at GitHub, allowing users to navigate name binding references both within and across repositories. Like scope graphs, stack graphs encode the name binding information about a program in a graph structure, in which paths represent valid name bindings. Resolving a reference to its definition is then implemented with a simple path-finding search.
GitHub hosts millions of repositories, containing petabytes of total code, implemented in hundreds of different programming languages, and receiving thousands of pushes per minute. To support this scale, we ensure that the graph construction and path-finding judgments are file-incremental: for each source file, we create an isolated subgraph without any knowledge of, or visibility into, any other file in the program. This lets us eliminate the storage and compute costs of reanalyzing file versions that we have already seen. Since most commits change a small fraction of the files in a repository, this greatly amortizes the operational costs of indexing large, frequently changed repositories over time. To handle type-directed name lookups (which require "pausing" the current lookup to resolve another name), our name resolution algorithm maintains a stack of the currently paused (but still pending) lookups. Stack graphs can be constructed via a purely syntactic analysis of the program’s source code, using a new declarative graph construction language. This means that we can extract name binding information for every repository without any per-package configuration, and without having to invoke an arbitrary, untrusted, package-specific build process.

Subject Classification

ACM Subject Classification
  • Theory of computation → Program analysis
  • Software and its engineering → Software libraries and repositories
Keywords
  • Scope graphs
  • name binding
  • code navigation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Max Brunsfeld et al. Tree-sitter: An incremental parsing system for programming tools. URL: https://doi.org/10.5281/zenodo.4619183.
  2. Dirk Bäumer et al. Language Server Index Format specification, 0.4.0, 2019. URL: https://github.com/microsoft/language-server-protocol/blob/main/indexFormat/specification.md.
  3. Douglas A. Creager and Hendrik van Antwerpen. stack-graphs library, version 0.10.2, January 2023. URL: https://doi.org/10.5281/zenodo.7520627.
  4. GitHub. Linguist: The language savant. URL: https://github.com/github/linguist/.
  5. Sven Kloppenburg. Incrementalization of Analyses for Next Generation IDEs. PhD thesis, Technische Universität, Darmstadt, November 2009. URL: http://tuprints.ulb.tu-darmstadt.de/1960/.
  6. Quentin Le Dilavrec, Djamel Eddine Khelladi, Arnaud Blouin, and Jean-Marc Jézéquel. HyperAST: Enabling efficient analysis of software histories at scale. In The 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022), October 2022. Google Scholar
  7. Ralph C. Merkle. A digital signature based on a conventional encryption function. In Carl Pomerance, editor, Advances in Cryptology - CRYPTO '87, pages 369-378, Berlin, Heidelberg, 1988. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/3-540-48184-2_32.
  8. Microsoft. Language Server Protocol specification, 3.17, May 2022. URL: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/.
  9. Robert B. Miller. Response time in man-computer conversational transactions. In Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I, AFIPS '68 (Fall, part I), pages 267-277, New York, NY, USA, 1968. Association for Computing Machinery. URL: https://doi.org/10.1145/1476589.1476628.
  10. Pierre Néron, Andrew Tolmach, Eelco Visser, and Guido Wachsmuth. A theory of name resolution. In Jan Vitek, editor, Programming Languages and Systems, pages 205-231, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/978-3-662-46669-8_9.
  11. Jeff Smits, Gabriël D. P. Konat, and Eelco Visser. Constructing hybrid incremental compilers for cross-module extensibility with an internal build system. The Art, Science, and Engineering of Programming, 4(3), 2020. URL: https://doi.org/10.22152/programming-journal.org/2020/4/16.
  12. Hendrik van Antwerpen. tree-sitter-stack-graphs-typescript library. URL: https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-typescript.
  13. Hendrik van Antwerpen, Casper Bach Poulsen, Arjen Rouvoet, and Eelco Visser. Scopes as types. Proc. ACM Program. Lang., 2(OOPSLA), October 2018. URL: https://doi.org/10.1145/3276484.
  14. Hendrik van Antwerpen and Douglas A. Creager. tree-sitter-stack-graphs library, version 0.6.0, January 2023. URL: https://doi.org/10.5281/zenodo.7534898.
  15. Hendrik van Antwerpen, Pierre Néron, Andrew Tolmach, Eelco Visser, and Guido Wachsmuth. A constraint language for static semantic analysis based on scope graphs. In Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '16, pages 49-60, New York, NY, USA, 2016. Association for Computing Machinery. URL: https://doi.org/10.1145/2847538.2847543.
  16. Hendrik van Antwerpen, Rob Rix, Douglas A. Creager, et al. tree-sitter-graph 0.7.0, October 2022. URL: https://doi.org/10.5281/zenodo.7221982.
  17. Aron Zwaan and Hendrik van Antwerpen. Scope graphs: The story so far. In Ralf Lämmel, Peter D. Mosses, and Friedrich Steimann, editors, Eelco Visser Commemorative Symposium (EVCS 2023), pages 13:1-13:13. OpenAccess Series in Informatics, April 2023. URL: https://doi.org/10.4230/OASIcs.EVCS.2023.13.
  18. Aron Zwaan, Hendrik van Antwerpen, and Eelco Visser. Incremental type-checking for free: Using scope graphs to derive incremental type-checkers. Proc. ACM Program. Lang., 6(OOPSLA2), October 2022. URL: https://doi.org/10.1145/3563303.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail