Comparing Bottom-Up with Top-Down Parsing Architectures for the Syntax Definition Formalism from a Disambiguation Standpoint

Author Jurgen J. Vinju



PDF
Thumbnail PDF

File

OASIcs.EVCS.2023.31.pdf
  • Filesize: 0.67 MB
  • 16 pages

Document Identifiers

Author Details

Jurgen J. Vinju
  • NWO-I Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands
  • TU Eindhoven, The Netherlands

Acknowledgements

This essay would not have been possible without the past teamwork with Eelco Visser, Paul Klint, Jan Heering, Jeroen Scheerder, Mark van den Brand, Chris Verhoef, Alex Sellink, Pieter Olivier, Hayco de Jong, Georgios (Rob) Economopoulos, Martin Bravenboer, Tijs van der Storm, Joost Visser, Merijn de Jonge, Jørgen Iversen, Arnold Lankamp, Ali Afroozeh, Anastasia Izmaylova, Bas Basten, Martin Bravenboer, Adrian Johnstone and Elizabeth Scott on the topic of context-free general parsing and disambiguation and supporting technologies. However, any error or inconsistency in the following is all mine.

Cite AsGet BibTex

Jurgen J. Vinju. Comparing Bottom-Up with Top-Down Parsing Architectures for the Syntax Definition Formalism from a Disambiguation Standpoint. In Eelco Visser Commemorative Symposium (EVCS 2023). Open Access Series in Informatics (OASIcs), Volume 109, pp. 31:1-31:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/OASIcs.EVCS.2023.31

Abstract

Context-free general parsing and disambiguation algorithms are threaded throughout the research and engineering career of Eelco Visser. Both our Ph.D. theses featured the study of "disambiguation." Disambiguation is the declarative definition of choices among different parse trees, derived using the same context-free grammar, for the same input sentence. This essay highlights the differences between syntactic disambiguation for context-free general parsing in a top-down architecture and a bottom-up architecture. The differences between top-down and bottom-up are mainly observed as practical aspects of the software architecture and software implementation. Eventually, the concept of data-dependent context-free grammar brings all engineering perspectives of disambiguation back into a conceptual (declarative) framework independent of the parsing architecture. The novelty in this essay is the juxtaposition of three general parsing architectures from a disambiguation point of view: SGLR, SGLL, and DDGLL. It also motivates design decisions in the parsing architectures for SDF{1,2} and Rascal with previously unpublished detail. The essay falls short of a literature review and a tool evaluation since it does not investigate the disambiguation methods of the many other parser generator tools that exist. The fact that only the implementation algorithms are different between the compared parsing architectures, while the syntax definition formalisms have practically the same formal semantics for historical reasons, nicely "isolates the variable" of interest. We hope this essay lives up to the enormous enthusiasm, curiosity, and drive for perfection in syntax definition and parsing that Eelco always radiated. We dearly miss him.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Syntax
Keywords
  • parser generation
  • context-free grammars
  • GLR
  • GLL
  • algorithms
  • disambiguation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Ali Afroozeh and Anastasia Izmaylova. One Parser to Rule Them All. In Proceedings of the 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2015, pages 151-170. ACM, 2015. URL: https://doi.org/10.1145/2814228.2814242.
  2. Ali Afroozeh and Anastasia Izmaylova. Iguana: a practical data-dependent parsing framework. In Ayal Zaks and Manuel V. Hermenegildo, editors, Proceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016, pages 267-268. ACM, 2016. URL: https://doi.org/10.1145/2892208.2892234.
  3. Ali Afroozeh, Mark van den Brand, Adrian Johnstone, Elizabeth Scott, and Jurgen J. Vinju. Safe specification of operator precedence rules. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2013. Google Scholar
  4. Luís Amorim and Eelco Visser. Multi-purpose Syntax Definition with SDF3, pages 1-23. Springer, September 2020. URL: https://doi.org/10.1007/978-3-030-58768-0_1.
  5. Bas Basten and Jurgen Vinju. Parse forest diagnostics with dr. ambiguity. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011. Google Scholar
  6. Bas Basten and Jurgen Vinju. Parse forest diagnostics with Dr. Ambiguity. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011. Google Scholar
  7. Mark G.J. van den Brand, Arie van Deursen, Jan Heering, Hayco A. de Jong, Merijn de Jonge, Tobias Kuipers, Paul. Klint, Leon Moonen, Pieter .A. Olivier, Jeroen Scheerder, Jurgen J. Vinju, Eelco Visser, and Joost Visser. The ASF+SDF Meta-Environment: a Component-Based Language Development Environment. In R. Wilhelm, editor, Compiler Construction (CC '01), volume 2027 of Lecture Notes in Computer Science, pages 365-370. Springer-Verlag, 2001. Google Scholar
  8. Mark G.J. van den Brand, Steven Klusener, Leon Moonen, and Jurgen J. Vinju. Generalized Parsing and Term Rewriting - Semantics Directed Disambiguation. In Barret Bryant and João Saraiva, editors, Third Workshop on Language Descriptions Tools and Applications, Electronic Notes in Theoretical Computer Science. Elsevier, 2003. Google Scholar
  9. Mark van den Brand, Jørgen Iversen, and Peter Mosses. An Action Environment. Electr. Notes Theor. Comput. Sci., 110:149-168, December 2004. Google Scholar
  10. Mark van den Brand and Paul Klint. ATerms for manipulation and exchange of structured data: It’s all about sharing. Information & Software Technology, 49:55-64, January 2007. Google Scholar
  11. Mark van den Brand, Paul Klint, Hayco de Jong, and Pieter Olivier. Efficient annotated terms. Software - Practice & Experience, 30(2), January 2000. Google Scholar
  12. Mark van den Brand, Pierre-Etienne Moreau, and Jurgen Vinju. Environments for term rewriting engines for free! In Proceedings of the 14th International Conference on Rewriting Techniques and Applications, RTA'03, pages 424-435, Berlin, Heidelberg, 2003. Springer-Verlag. Google Scholar
  13. Martin Bravenboer, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Visser. Stratego/XT 0.17. a language and toolset for program transformation. Science of Computer Programming, 72(1):52-70, 2008. Special Issue on Second issue of experimental software and toolkits (EST). URL: https://doi.org/10.1016/j.scico.2007.11.003.
  14. Frank DeRemer. Simple lr(k) grammars. Commun. ACM, 14(7):453-460, 1971. URL: https://doi.org/10.1145/362619.362625.
  15. Arie Van Deursen, Jan Heering, and Paul Klint. Language Prototyping: An Algebraic Specification Approach: Vol. V. World Scientific Publishing Co., Inc., USA, 1996. Google Scholar
  16. Jay Earley. An efficient context-free parsing algorithm. Commun. ACM, 13:94-102, February 1970. URL: https://doi.org/10.1145/362007.362035.
  17. Giorgios R. Economopoulos, Paul Klint, and Jurgen J. Vinju. Faster scannerless GLR parsing. In Oege de Moor and Michael I. Schwartzbach, editors, Compiler Construction (CC), volume 5501 of Lecture Notes in Computer Science, pages 126-141. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-00722-4_10.
  18. J. Heering and P. Klint. A syntax definition formalism, 1986. ESPRIT"86: Results and Achievements, page 619-630. Google Scholar
  19. Jan Heering, Paul R. H. Hendriks, Paul Klint, and Jan Rekers. The syntax definition formalism SDF - reference manual. SIGPLAN Not., 24(11):43-75, November 1989. URL: https://doi.org/10.1145/71605.71607.
  20. Trevor Jim and Yitzhak Mandelbaum. Delayed semantic actions in yakker. In Claus Brabrand and Eric Van Wyk, editors, Language Descriptions, Tools and Applications, LDTA 2011, Saarbrücken, Germany, March 26-27, 2011. Proceeding, page 8. ACM, 2011. URL: https://doi.org/10.1145/1988783.1988791.
  21. Trevor Jim, Yitzhak Mandelbaum, and David Walker. Semantics and algorithms for data-dependent grammars. In Manuel V. Hermenegildo and Jens Palsberg, editors, Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010, pages 417-430. ACM, 2010. URL: https://doi.org/10.1145/1706299.1706347.
  22. Lennart C.L. Kats and Eelco Visser. The Spoofax language workbench: Rules for declarative specification of languages and IDEs. SIGPLAN Not., 45(10):444-463, October 2010. Google Scholar
  23. Paul Klint. From SPRING to SUMMER: design, definition and implementation of programming languages for string manipulation and pattern matching. PhD thesis, Technische Hogeschool Eindhoven, March 1982. Google Scholar
  24. Paul Klint, Tijs van der Storm, and J.J. Vinju. EASY meta-programming with Rascal. In João Fernandes, Ralf Lämmel, Joost Visser, and João Saraiva, editors, Generative and Transformational Techniques in Software Engineering III, volume 6491 of LNCS, pages 222-289. Springer Berlin / Heidelberg, 2011. Google Scholar
  25. Paul Klint, Tijs van der Storm, and Jurgen J. Vinju. Rascal: A domain specific language for source code analysis and manipulation. In Ninth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 168-177. IEEE Computer Society, 2009. URL: https://doi.org/10.1109/SCAM.2009.28.
  26. Paul Klint and Eelco Visser. Using filters for the disambiguation of context-free grammars. In G. Pighizzini and P. San Pietro, editors, Proc. ASMICS Workshop on Parsing Theory, pages 1-20, Milano, Italy, 1994. Tech. Rep. 126-1994, Dipartimento di Scienze dell'Informazione, Università di Milano. Google Scholar
  27. Ralf Lämmel and Chris Verhoef. Semi-automatic Grammar Recovery. Software - Practice & Experience, 31(15):1395-1438, December 2001. Google Scholar
  28. P. J. Landin. The next 700 programming languages. Commun. ACM, 9:157-166, March 1966. Google Scholar
  29. Leon Moonen. Generating robust parsers using island grammars. Proceedings Eighth Working Conference on Reverse Engineering, pages 13-22, 2001. Google Scholar
  30. Rohman Nozohoor-Farshi. Handling of ill-designed grammars in tomita's parsing algorithm. In Proceedings of the First International Workshop on Parsing Technologies, pages 182-192, Pittsburgh, Pennsylvania, USA, August 1989. Carnegy Mellon University. URL: https://aclanthology.org/W89-0219.
  31. J. Rekers. Parser Generation for Interactive Environments. PhD thesis, University of Amsterdam, 1992. Google Scholar
  32. Jan Rekers and Wilco Koorn. Substring parsing for arbitrary context-free grammars. In Proceedings of the Second International Workshop on Parsing Technologies, pages 218-224, Cancun, Mexico, February 13-25 1991. Association for Computational Linguistics. URL: https://aclanthology.org/1991.iwpt-1.25.
  33. D. J. Salomon and G. V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation, PLDI 1989, pages 170-178. ACM, 1989. URL: https://doi.org/10.1145/73141.74833.
  34. Elizabeth Scott and Adrian Johnstone. Right nulled GLR parsers. ACM Trans. Program. Lang. Syst., 28(4):577-618, July 2006. Google Scholar
  35. Elizabeth Scott and Adrian Johnstone. GLL parsing. ENTCS, 253(7):177-189, 2010. Proceedings of the Ninth Workshop on Language Descriptions Tools and Applications (LDTA 2009). Google Scholar
  36. Thomas A. Sudkamp. Languages and Machines: An Introduction to the Theory of Computer Science. Addison-Wesley Longman Publishing Co., Inc., USA, 1997. Google Scholar
  37. M. Tomita. Efficient Parsing for Natural Languages. A Fast Algorithm for Practical Systems. Kluwer Academic Publishers, 1985. Google Scholar
  38. Mark van den Brand, Jan Heering, Paul Klint, and Pieter A. Olivier. Compiling language definitions: The ASF+SDF compiler. CoRR, cs.PL/0007008, 2000. URL: https://arxiv.org/abs/cs/0007008.
  39. Mark van den Brand, Jeroen Scheerder, Jurgen J. Vinju, and Eelco Visser. Disambiguation filters for scannerless generalized LR parsers. In R. Nigel Horspool, editor, Compiler Construction, 11th International Conference, CC 2002, volume 2304 of LNCS, pages 143-158. Springer, 2002. Google Scholar
  40. Mark G. J. van den Brand. PREGMATIC - A generator for incremental programming environments. PhD thesis, Radboud University Nijmegen, 1992. Google Scholar
  41. Mark G. J. van den Brand, Pierre-Etienne Moreau, and Christophe Ringeissen. The ELAN Environment: an Rewriting Logic Environment based on ASF+SDF Technology. In Workshop on Language Descriptions, Tools and Applications - LDTA'02, volume 65/3 of Electronic Notes in Theoretical Computer Science, Grenoble, France, April 2002. Colloque avec actes et comité de lecture. internationale. URL: https://hal.inria.fr/inria-00101028.
  42. J.J. Vinju. Analysis and Transformation of Source Code by Parsing and Rewriting. PhD thesis, Universiteit van Amsterdam, November 2005. Google Scholar
  43. Jurgen J. Vinju. SDF disambiguation medkit for programming languages. Technical Report SEN-1107, Centrum Wiskunde & Informatica, 2011. URL: http://oai.cwi.nl/oai/asset/18080/18080D.pdf.
  44. Eelco Visser. From context-free grammars with priorities to character class grammars. In Mieke Brune Arie van Deursen and Jan Heering, editors, Dat Is Dus Heel Interessant, Liber Amicorum dedicated to Paul Klint. Centrum Wiskunde & Informatica and IVI Universiteit van Amsterdam, 1997. Google Scholar
  45. Eelco Visser. Syntax Definition for Language Prototyping. PhD thesis, Universiteit van Amsterdam, 1997. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail