Packrat parser

Type of parser


title: "Packrat parser" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["parsing-algorithms", "dynamic-programming"] description: "Type of parser" topic_path: "technology/algorithms" source: "https://en.wikipedia.org/wiki/Packrat_parser" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Type of parser ::

::data[format=table title="Infobox algorithm"]

FieldValue
namePackrat parser
classParsing grammars that are PEG
dataString
timeO(n) or O(n^2) without special handling of iterative combinator
best-time{{plainlist
average-timeO(n)
spaceO(n)
::

|name=Packrat parser |class=Parsing grammars that are PEG |data=String |time=O(n) or O(n^2) without special handling of iterative combinator |best-time={{plainlist|

  • O(n) |average-time=O(n) |space=O(n)

The Packrat parser is a type of parser that shares similarities with the recursive descent parser in its construction. However, it differs because it takes parsing expression grammars (PEGs) as input rather than LL grammars.

In 1970, Alexander Birman laid the groundwork for packrat parsing by introducing the "TMG recognition scheme" (TS), and "generalized TS" (gTS). TS was based upon Robert M. McClure's TMG compiler-compiler, and gTS was based upon Dewey Val Schorre's META compiler-compiler. Birman's work was later refined by Aho and Ullman; and renamed as Top-Down Parsing Language (TDPL), and Generalized TDPL (GTDPL), respectively. These algorithms were the first of their kind to employ deterministic top-down parsing with backtracking.

Bryan Ford developed PEGs as an expansion of GTDPL and TS. Unlike CFGs, PEGs are unambiguous and can match well with machine-oriented languages. PEGs, similar to GTDPL and TS, can also express all LL(k) and LR(k). Ford also introduced Packrat as a parser that uses memoization techniques on top of a simple PEG parser. This was done because PEGs have an unlimited lookahead capability resulting in a parser with exponential time performance in the worst case.

Packrat keeps track of the intermediate results for all mutually recursive parsing functions. Each parsing function is only called once at a specific input position. In some instances of packrat implementation, if there is insufficient memory, certain parsing functions may need to be called multiple times at the same input position, causing the parser to take longer than linear time.

Syntax

The packrat parser takes in input the same syntax as PEGs: a simple PEG is composed of terminal and nonterminal symbols, possibly interleaved with operators that compose one or several derivation rules.

Symbols

  • Nonterminal symbols are indicated with capital letters (e.g., {S, E, F, D})
  • Terminal symbols are indicated with lowercase (e.g., {a,b,z,e,g })
  • Expressions are indicated with lower-case Greek letter (e.g., {\alpha,\beta,\gamma,\omega,\tau})
    • Expressions can be a mix of terminal symbols, nonterminal symbols and operators

Operators

::data[format=table title="Syntax Rules"]

OperatorSemantics
SequenceSuccess: If \alpha and \beta are recognized
Ordered choiceSuccess: If any of {\alpha,\beta,\gamma} is recognized starting from the left
And predicateSuccess: If \alpha is recognized
Not predicateSuccess: If \alpha is not recognized
One or moreSuccess: Try to recognize \alpha one or multiple time
Zero or moreSuccess: Try to recognize \alpha zero or multiple time
Zero or oneSuccess: Try to recognize \alpha zero or once
Terminal rangeSuccess: Recognize any terminal c that are inside the range [a-b]. In the case of [\textbf{'} h \textbf{'} - \textbf{'} z \textbf{'}] , c can be any letter from h to z
Any characterSuccess: Recognize any character in the input
::

Rules

A derivation rule is composed by a nonterminal symbol and an expression S \rightarrow \alpha.

A special expression \alpha_s is the starting point of the grammar. In case no \alpha_s is specified, the first expression of the first rule is used.

An input string is considered accepted by the parser if the \alpha_s is recognized. As a side-effect, a string x can be recognized by the parser even if it was not fully consumed.

An extreme case of this rule is that the grammar S \rightarrow x* matches any string.

This can be avoided by rewriting the grammar as S \rightarrow x*!.

Example

\begin{cases} S \rightarrow A/B/D \ A \rightarrow \texttt{'a'}\ S \ \texttt{'a'} \ B \rightarrow \texttt{'b'}\ S \ \texttt{'b'} \ D \rightarrow (\texttt{'0'}-\texttt{'9'})? \end{cases}

This grammar recognizes a palindrome over the alphabet { a,b } , with an optional digit in the middle.

Example strings accepted by the grammar include: \texttt{'aa'} and \texttt{'aba3aba'} .

Left recursion

Left recursion happens when a grammar production refers to itself as its left-most element, either directly or indirectly. Since Packrat is a recursive descent parser, it cannot handle left recursion directly. During the early stages of development, it was found that a production that is left-recursive can be transformed into a right-recursive production. This modification significantly simplifies the task of a Packrat parser. Nonetheless, if there is an indirect left recursion involved, the process of rewriting can be quite complex and challenging. If the time complexity requirements are loosened from linear to superlinear, it is possible to modify the memoization table of a Packrat parser to permit left recursion, without altering the input grammar.

Iterative combinator

The iterative combinators \alpha + and \alpha * need special attention when used in a Packrat parser: these combinators introduce a secret recursion that does not record intermediate results in the outcome matrix, which can lead to the parser operating with a superlinear behaviour. This problem can be resolved by applying the following transformation: ::data[format=table title=""]

OriginalTranslated
S \rightarrow \alpha +S \rightarrow \alpha S / \alpha
S \rightarrow \alpha *S \rightarrow \alpha S / \epsilon
::

With this transformation, the intermediate results can be properly memoized.

Memoization technique

Memoization is an optimization technique in computing that aims to speed up programs by storing the results of expensive function calls. This technique essentially works by caching the results so that when the same inputs occur again, the cached result is simply returned, thus avoiding the time-consuming process of re-computing. When using packrat parsing and memoization, it's noteworthy that the parsing function for each nonterminal is solely based on the input string. It does not depend on any information gathered during the parsing process. Essentially, memoization table entries do not affect or rely on the parser's specific state at any given time.

Packrat parsing stores results in a matrix or similar data structure that allows for quick look-ups and insertions. When a production is encountered, the matrix is checked to see if it has already occurred. If it has, the result is retrieved from the matrix. If not, the production is evaluated, the result is inserted into the matrix, and then returned. When evaluating the entire m*n matrix in a tabular approach, it would require \Theta(mn) space. Here, m represents the number of nonterminals, and n represents the input string size.

In a naïve implementation, the entire table can be derived from the input string starting from the end of the string.

The Packrat parser can be improved to update only the necessary cells in the matrix through a depth-first visit of each subexpression tree. Consequently, using a matrix with dimensions of m*n is often wasteful, as most entries would remain empty. These cells are linked to the input string, not to the nonterminals of the grammar. This means that increasing the input string size would always increase memory consumption, while the number of parsing rules changes only the worst space complexity.

Cut operator

Another operator called cut has been introduced to Packrat to reduce its average space complexity even further. This operator utilizes the formal structures of many programming languages to eliminate impossible derivations. For instance, control statements parsing in a standard programming language is mutually exclusive from the first recognized token, e.g.,{\mathtt{if, do, while, switch}}. ::data[format=table]

OperatorSemantics
Cutif \alpha
::

When a Packrat parser uses cut operators, it effectively clears its backtracking stack. This is because a cut operator reduces the number of possible alternatives in an ordered choice. By adding cut operators in the right places in a grammar's definition, the resulting Packrat parser only needs a nearly constant amount of space for memoization. However, the fundamental problem that packrat parsers require O(n) space has not been resolved.

The algorithm

Sketch of an implementation of a Packrat algorithm in a Lua-like pseudocode. ::code[lang=lua] INPUT(n) -- return the character at position n

RULE(R : Rule, P : Position ) entry = GET_MEMO(R,P) -- return the number of elements previously matched in rule R at position P if entry == nil then return EVAL(R, P); end return entry;

EVAL(R : Rule, P : Position ) start = P;
for choice in R.choices -- Return a list of choice acc=0; for symbol in choice then -- Return each element of a rule, terminal and nonterminal if symbol.is_terminal then if INPUT(start+acc) == symbol.terminal then acc = acc + 1; --Found correct terminal skip pass it else break;
end else res = RULE(symbol.nonterminal , start+acc ); -- try to recognize a nonterminal in position start+acc SET_MEMO(symbol.nonterminal , start+acc, res ); -- we memoize also the failure with special value fail if res == fail then
break; end acc = acc + res; end if symbol == choice.last -- check if we have matched the last symbol in a choice if so return return acc; end end return fail; --if no choice match return fail ::

Example

Given the following context, a free grammar that recognizes simple arithmetic expressions composed of single digits interleaved by sum, multiplication, and parenthesis.

\begin{cases} S \rightarrow A \ A \rightarrow M\ \texttt{'+'}\ A \ / \ M \ M \rightarrow P\ \texttt{'*'}\ M \ / \ P \ P \rightarrow \texttt{'('}\ A\ \texttt{')'}\ / \ D \ D \rightarrow (\texttt{'0'}-\texttt{'9'}) \end{cases}

Denoted with ⊣ the line terminator we can apply the packrat algorithm

::data[format=table]

Derivation RulesInput shiftedNotesInput left
\begin{array}{l}ɛ
Input doesn't match the first element in the derivation.
::

| ::data[format=table title=""]

Index1234567S
A
M
P
D
2*(3+4)
::

No update because no terminal was recognized |- |[[File:Second_step_in_Parsing_a_CFG_with_packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
Shift input by one after deriving terminal
::

| ::data[format=table]

Index1234567S
A
M
P1
D1
2*(3+4)
::

Update:

D(1) = 1;

P(1) = 1; |- |[[File:Third_step_of_recognizing_CFG_with_packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
M \rightarrow P\ \texttt{'*'}\ M
Shift input by two terminal
::

| ::data[format=table]

Index1234567S
A
M
P1
D1
2*(3+4)
::

No update because no nonterminal was fully recognized |- |[[File:Fourth_step_in_recognizing_CFG_grammar_with_Packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
A \rightarrow M\ \texttt{'+'}\ A
M \rightarrow P\ \texttt{'*'}\ M
P \rightarrow \texttt{'('}\ A\ \texttt{')'}
Input doesn't match the first element in the derivation.
::

| ::data[format=table]

Index1234567S
A
M
P1
D1
2*(3+4)
::

No update because no terminal was recognized |- |[[File:5th step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
P \rightarrow D
D \rightarrow 3
Shift input by one after deriving terminal
::

| ::data[format=table]

Index1234567S
A
M
P11
D11
2*(3+4)
::

Update:

D(4) = 1;

P(4) = 1; |- |[[File:Sixth step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
M \rightarrow P
Roll Back to
::

| ::data[format=table]

Index1234567S
A
M1
P11
D11
2*(3+4)
::

Hit on P(4)

Update M(4) = 1 as M was recognized |- |[[File:Seventh step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
A \rightarrow M\ \texttt{'+'}\ A
M \rightarrow P\ \texttt{'*'}\ M
P \rightarrow \texttt{'('}\ A\ \texttt{')'}
Input doesn't match the first element in the derivation.
::

| ::data[format=table]

Index1234567S
A
M1
P11
D11
2*(3+4)
::

No update because no terminal was recognized |- |[[File:Eighth step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
P \rightarrow D
D \rightarrow 4
Shift input by one after deriving terminal
::

| ::data[format=table]

Index1234567S
A
M1
P111
D111
2*(3+4)
::

Update:

D(6) = 1;

P(6) = 1; |- |[[File:Ninth step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
M \rightarrow P
Roll Back to
::

| ::data[format=table]

Index1234567S
A
M11
P111
D111
2*(3+4)
::

Hit on P(6)

Update M(6) = 1 as M was recognized |- |[[File:Tenth step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
A \rightarrow M
Roll Back to A \rightarrow M\ \texttt{'+'}\ A \ / \ \underline{M}
::

| ::data[format=table]

Index1234567S
A3
M11
P1511
D111
2*(3+4)
::

Hit on M(6)

Update A(4) = 3 as A was recognized

Update P(3)=5 as P was recognized |- |[[File:Eleventh step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
Roll Back to
::

| ::data[format=table]

Index1234567S
A3
M11
P1511
D111
2*(3+4)
::

No update because no terminal was recognized |- |[[File:Twelfth step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
M \rightarrow P
we don't expand it as we have a hit in the memoization table P(3) ≠ 0, so shift the input by P(3).
::

| ::data[format=table]

Index1234567S
A3
M711
P1511
D111
2*(3+4)
::

Hit on P(3)

Update M(1)=7 as M was recognized |- |[[File:13th step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
Roll Back to A \rightarrow M\ \texttt{'+'}\ A \ / \ \underline{M} as terminal + \neq \dashv
::

| ::data[format=table]

Index1234567S
A3
M711
P1511
D111
2*(3+4)
::

No update because no terminal was recognized |- |[[File:14th step of recognizing CFG with packrat.svg|frameless|325x325px]] | ::data[format=table]

Derivation RulesInput shiftedNotesInput left
A \rightarrow M
We don't expand it as we have a hit in the memoization table M(1) ≠ 0, so shift the input by M(1).
::

| ::data[format=table]

Index1234567S7
A73
M711
P1511
D111
2*(3+4)
::

Hit on M(1)

Update A(1)=7 as A was recognized

Update S(1)=7 as S was recognized |}

Implementation

::data[format=table]

NameParsing algorithmOutput languagesGrammar, codeDevelopment platformLicense
AustenXPackrat (modified)Java, BSD
AurochsPackratC, OCaml, Java, GNU GPL
CanopyPackratJava, JavaScript, Python, Ruby, GNU GPL
CL-pegPackratCommon Lisp, MIT
Drat!PackratD, GNU GPL
FrisbyPackratHaskell, BSD
grammar::pegPackratTcl, BSD
IronMetaPackratC#, BSD
PEGParserPackrat (supporting left-recursion and grammar ambiguity)C++Identical, BSD
NarwhalPackratC, BSD
neotomaPackratErlang, MIT
OMetaPackrat (modified, partial memoization)JavaScript, Squeak, Python, MIT
PackCCPackrat (modified, left-recursion support)C, MIT
PackratPackratScheme, MIT
PappyPackratHaskell, BSD
ParsnipPackratC++, GNU GPL
PEG.jsPackrat (partial memoization)JavaScript, MIT
PeggyPackrat (partial memoization)JavaScript, MIT
PegasusRecursive descent, Packrat (selectively)C#, MIT
PetitParserPackratSmalltalk, Java, Dart, MIT
PyPy rlibPackratPython, MIT
Rats!PackratJava, GNU LGPL
go-packratPackratGoIdenticalAll, GPLv3
::

References

References

  1. Ford, Bryan. (2006). "Packrat Parsing: Simple, Powerful, Lazy, Linear Time".
  2. Ford, Bryan. (2004-01-01). "Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages". Association for Computing Machinery.
  3. Flodin, Daniel. "A Comparison Between Packrat Parsing and Conventional Shift-Reduce Parsing on Real-World Grammars and Inputs".
  4. (2010-05-06). "Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering". ACM.
  5. (2008-01-07). "Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation". Association for Computing Machinery.
  6. (2007). "Compilers: principles, techniques, & tools". Pearson Addison-Wesley.
  7. Norvig, Peter. (1991-03-01). "Techniques for automatic memoization with applications to context-free parsing". Computational Linguistics.
  8. (2017-10-23). "Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering". Association for Computing Machinery.
  9. "A Survey of Packrat Parser". A Survey of Packrat Parser.
  10. (2010-05-06). "Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering". Association for Computing Machinery.
  11. Maintained fork of PEG.js

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::

parsing-algorithmsdynamic-programming