Parsing Tools Built using Parsing Theory

Many parsing tools, such as Flex & Bison and ANTLR, are built on solid theory. As a result, they were created quickly and cheaply using structured, well-known techniques. Contrast this with parsers developed before the theory existed:

The earliest parsers, back in the 1950s, used utterly ad hoc techniques to analyze the syntax of the source code they were parsing.

Things changed in the 1960s and 70s:

During the 1960s the field got a lot of academic attention, and by the early 1970s parsing was no longer an arcane art. In the 1970s Aho, Ullman, Knuth, and many others put parsing techniques solidly on their theoretical feet.

One of the first techniques they espoused was to separate lexing (a.k.a. scanning or tokenizing) from parsing. Lexing built upon regular expressions, which in turn built upon the theory of finite automata (FA) and nondeterministic finite automata (NFA); regular expressions and finite automata are brilliantly tied together by the famous Kleene theorem, which shows they describe exactly the same class of languages. Parsing built on top of a rich theory of grammars – context-free grammars, context-sensitive grammars, etc. – that Chomsky formulated. Here’s a graphic I created depicting the foundation upon which Flex & Bison, ANTLR, and other parser tools are built:

Theory underpinning parser tools
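
To make that division of labor concrete, here is a minimal sketch of the two halves as Flex and Bison specifications for a toy calculator. The file names (calc.l, calc.y) and the little grammar are my own illustration, not taken from any particular tool or tutorial: the Flex half turns characters into tokens using regular expressions, and the Bison half recognizes structure using a context-free grammar.

    /* calc.l -- the lexer: regular expressions map character sequences to tokens */
    %{
    #include <stdlib.h>
    #include "calc.tab.h"      /* token definitions generated by "bison -d calc.y" */
    %}
    %option noyywrap
    %%
    [0-9]+      { yylval = atoi(yytext); return NUMBER; }   /* a regex recognizes integers      */
    [-+*/()\n]  { return yytext[0]; }                        /* single-character operator tokens */
    [ \t]       { /* skip whitespace */ }
    %%

    /* calc.y -- the parser: a context-free grammar describes expression structure */
    %{
    #include <stdio.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
    %}
    %token NUMBER
    %left '+' '-'
    %left '*' '/'
    %%
    input : /* empty */
          | input line
          ;
    line  : expr '\n'          { printf("= %d\n", $1); }
          ;
    expr  : expr '+' expr      { $$ = $1 + $3; }
          | expr '-' expr      { $$ = $1 - $3; }
          | expr '*' expr      { $$ = $1 * $3; }
          | expr '/' expr      { $$ = $1 / $3; }
          | '(' expr ')'       { $$ = $2; }
          | NUMBER
          ;
    %%
    int main(void) { return yyparse(); }

Building the sketch is the usual two-step dance: "bison -d calc.y" then "flex calc.l", and compile calc.tab.c together with lex.yy.c. The point is the split the theory dictates: regular expressions live on the Flex side, a context-free grammar lives on the Bison side.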

Sometimes theory can be pie-in-the-sky and without practical value, but in this case the theory is enormously practical - it tells developers how to implement lexers and parsers in a systematic fashion. As a result, even second-year CompSci students can develop lexers and parsers quickly and cheaply, and the resulting tools are fast and largely bug-free. Without the theory everyone would still be creating ad hoc tools as they did in the 1950s.

Above I stated that parsing theory was developed in the 1960s and 1970s. Have the structures to be parsed - programming language files and data format files - evolved in complexity to the point where that parsing theory is no longer sufficient? Do we need more advanced parsing theory? No! The parsing theory developed in the 1960s and 1970s is still light-years ahead of what modern structures require. I created a graphic to depict this:

Parsing capability versus parsing needs

A key property of a great software tool

Recently I have been learning to build parsers using the tools Flex & Bison. To create a parser one specifies a list of rules and actions (i.e., a grammar). My parser was failing, and despite my best debugging efforts I couldn't figure out what I had done wrong. Then I discovered that Bison can show the steps (the shift/reduce steps) it takes to parse the input using my grammar. The output it displayed was fantastic - simple, easy to interpret, and exactly the information I needed. With that output I was able to quickly find the problem in my grammar and fix it.
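
For readers who want to see that kind of trace themselves, here is roughly how it is switched on. The %define parse.trace directive and the yydebug variable are standard Bison features; the reference to calc.y is just the sketch from earlier, and the surrounding main() is mine.

    /* In the grammar file (e.g., calc.y from the sketch above), ask Bison
       to compile in tracing support: */
    %define parse.trace        /* older equivalent: %debug */

    /* Then turn the trace on at run time, before calling the parser: */
    extern int yydebug;        /* provided by the Bison-generated parser when tracing is compiled in */
    int main(void) {
        yydebug = 1;           /* print each shift and reduce step to stderr while parsing */
        return yyparse();
    }

Running bison with the -v (or --verbose) option is also worth knowing about: it writes a .output report listing every state of the underlying automaton, which is handy when chasing shift/reduce conflicts in a grammar.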

Since that experience I have convinced myself that a key property of a great tool is that it provides a way to display the processing it performs on the input, driven by my specification (e.g., my grammar). In one of my projects I am writing a software tool, and during the course of developing it I have learned that this property is very difficult to attain: it is hard to find just the right set of information - not too much and not too little - that enables quick, effective debugging. So, to whoever developed Flex & Bison -- my hat's off to you, I am in awe, you rock!

Last Updated: September 18, 2021. Author: Roger Costello