Sardinas–Patterson algorithm


title: "Sardinas–Patterson algorithm" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["algorithms", "coding-theory", "data-compression"] topic_path: "technology/algorithms" source: "https://en.wikipedia.org/wiki/Sardinas–Patterson_algorithm" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

In coding theory, the Sardinas–Patterson algorithm is a classical algorithm for determining in polynomial time whether a given variable-length code is uniquely decodable, named after August Albert Sardinas and George W. Patterson, who published it in 1953. The algorithm carries out a systematic search for a string which admits two different decompositions into codewords. As Knuth reports, the algorithm was rediscovered about ten years later in 1963 by Floyd, despite the fact that it was at the time already well known in coding theory.

Idea of the algorithm

Consider the code \{\, \texttt{a} \mapsto \texttt{1}, \texttt{b} \mapsto \texttt{011}, \texttt{c} \mapsto \texttt{01110}, \texttt{d} \mapsto \texttt{1110}, \texttt{e} \mapsto \texttt{10011} \,\}. This code, which is based on an example by Berstel, is not uniquely decodable, since the string

:011101110011

can be interpreted as the sequence of codewords

:01110 - 1110 - 011,

but also as the sequence of codewords

:011 - 1 - 011 - 10011.

Two possible decodings of this encoded string are thus given by cdb and babe.
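The ambiguity can be checked mechanically. The following is a minimal Python sketch, not taken from the cited sources, that enumerates every split of a bit string into codewords by brute-force recursion. The names `CODE` and `decodings` are illustrative assumptions. Run on the example string, it reports four decodings in total.

```python
from typing import Dict, List

# The example code (symbol -> codeword).
CODE: Dict[str, str] = {
    "a": "1", "b": "011", "c": "01110", "d": "1110", "e": "10011",
}


def decodings(bits: str, code: Dict[str, str] = CODE) -> List[str]:
    """Return every way to split `bits` into codewords, as symbol strings."""
    if not bits:
        return [""]  # the empty string has exactly one (empty) decoding
    results: List[str] = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Strip one codeword and decode the remainder recursively.
            for rest in decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


print(sorted(decodings("011101110011")))
# → ['babe', 'bacb', 'caae', 'cdb']
```

This exhaustive search takes exponential time in the worst case and only tests one given string. The algorithm described in this article instead decides the question for all strings at once.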

In general, such a string can be found by the following idea: In the first round, we choose two codewords x_1 and y_1 such that x_1 is a prefix of y_1, that is, x_1w = y_1 for some "dangling suffix" w. If one first tries x_1=\texttt{011} and y_1=\texttt{01110}, the dangling suffix is w = \texttt{10}. If we manage to find two sequences x_2,\ldots,x_p and y_2,\ldots,y_q of codewords such that x_2\cdots x_p = wy_2\cdots y_q, then we are finished: the string x = x_1x_2\cdots x_p can alternatively be decomposed as y_1y_2\cdots y_q, and we have found the desired string having at least two different decompositions into codewords.

In the second round, we try two different approaches. The first is to look for a codeword that has w as a prefix. Then we obtain a new dangling suffix w', with which we can continue our search. If we eventually encounter a dangling suffix that is itself a codeword (or the empty word), the search terminates, as we then know that there exists a string with two decompositions. The second is to look for a codeword that is itself a prefix of w. In our example, we have w = \texttt{10}, and the sequence \texttt{1} is a codeword. We can thus also continue with w' = \texttt{0} as the new dangling suffix.

Precise description of the algorithm

::data[format=table]
| Ambiguity from result: 1110011 | | | |
|---|---|---|---|
| d | b | | |
| a | F | b | |
| a | a | H | b |
| a | a | e | |
::

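::data[format=table]
| Example run (C, S_1, S_2) | Value |
|---|---|
| F = a^{-1}d | 110 |
| G = a^{-1}e | 0011 |
| H = b^{-1}c | 10 |
| H = a^{-1}F | 10 |
| I = a^{-1}H | 0 |
| J = I^{-1}b | 11 |
| K = I^{-1}c | 1110 |
::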

The algorithm is described most conveniently using quotients of formal languages. In general, for two sets of strings D and N, the (left) quotient N^{-1}D is defined as the set of residual words obtained from D by removing some prefix in N. Formally, N^{-1}D = \{\, y \mid xy \in D \text{ and } x \in N \,\}. Now let C denote the (finite) set of codewords in the given code.
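As an illustration of the quotient, here is a short Python sketch that computes N^{-1}D for finite sets of strings. The function name `left_quotient` and the variable `C` are assumptions made for this example.

```python
from typing import Set


def left_quotient(n: Set[str], d: Set[str]) -> Set[str]:
    """N^{-1}D: all residuals y such that x + y is in D for some x in N."""
    return {word[len(x):] for x in n for word in d if word.startswith(x)}


# Codewords of the example code from the previous section.
C = {"1", "011", "01110", "1110", "10011"}

# Every codeword is a prefix of itself, so the empty word is always
# a member of C^{-1}C.
print(sorted(left_quotient(C, C)))
# → ['', '0011', '10', '110']
```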

The algorithm proceeds in rounds, where we maintain in each round not only one dangling suffix as described above, but the (finite) set of all potential dangling suffixes. Starting with round i=1, the set of potential dangling suffixes will be denoted by S_i. The sets S_i are defined inductively as follows:

S_1 = C^{-1}C \setminus \{\varepsilon\}. Here, the symbol \varepsilon denotes the empty word.

S_{i+1} = C^{-1}S_i\cup S_i^{-1}C, for all i\ge 1.

The algorithm computes the sets S_i in increasing order of i. As soon as one of the S_i contains a word from C or the empty word, the algorithm terminates and answers that the given code is not uniquely decodable. Otherwise, once a set S_i equals a previously encountered set S_j with j < i, the algorithm would in principle enter an endless loop. Instead of continuing endlessly, it answers that the given code is uniquely decodable.
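The rounds described above can be sketched directly in Python. This is a straightforward set-based version under stated assumptions, not the suffix-tree or pattern-matching implementations from the literature. The names `left_quotient` and `is_uniquely_decodable` are illustrative, and the input is assumed to consist of distinct, nonempty codewords.

```python
from typing import FrozenSet, Iterable, Set


def left_quotient(n: Iterable[str], d: Iterable[str]) -> Set[str]:
    """N^{-1}D: all residuals y such that x + y is in D for some x in N."""
    return {word[len(x):] for x in n for word in d if word.startswith(x)}


def is_uniquely_decodable(codewords: Iterable[str]) -> bool:
    """Sardinas-Patterson test (assumes distinct, nonempty codewords)."""
    c = set(codewords)
    # S_1 = C^{-1}C without the empty word.
    s: FrozenSet[str] = frozenset(left_quotient(c, c) - {""})
    seen: Set[FrozenSet[str]] = set()
    while s not in seen:
        # A dangling suffix that is a codeword (or empty) yields an ambiguity.
        if "" in s or s & c:
            return False
        seen.add(s)
        # S_{i+1} = C^{-1}S_i  union  S_i^{-1}C.
        s = frozenset(left_quotient(c, s) | left_quotient(s, c))
    # A set of dangling suffixes repeated, so no ambiguity can ever arise.
    return True


print(is_uniquely_decodable({"1", "011", "01110", "1110", "10011"}))  # → False
print(is_uniquely_decodable({"0", "10", "110"}))                      # → True
```

Storing each S_i as a `frozenset` makes it hashable, so detecting a repeated set is a single membership test in `seen`.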

See the "Example run" table for a run of the algorithm on the given code; lower and upper case letters denote codewords and "dangling suffix" strings, respectively. During the construction of S_2, the codeword b = H^{-1}e = 011 is encountered, and the algorithm stops. The "Ambiguity from result" table indicates how the example string 1110011 can be shown to have multiple decodings (db and aae), using the equations that were collected during the algorithm run.
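The chain of equalities behind this ambiguity can be written out explicitly. The derivation below is a worked sketch that uses F = a^{-1}d, H = a^{-1}F and the relation b = H^{-1}e (that is, e = Hb with H = 10), each of which can be checked directly against the codewords.

```latex
\begin{align*}
d\,b &= (a\,F)\,b   && \text{since } F = a^{-1}d,\ \text{i.e. } \texttt{1110} = \texttt{1}\cdot\texttt{110} \\
     &= a\,(a\,H)\,b && \text{since } H = a^{-1}F,\ \text{i.e. } \texttt{110} = \texttt{1}\cdot\texttt{10} \\
     &= a\,a\,(H\,b) \\
     &= a\,a\,e     && \text{since } b = H^{-1}e,\ \text{i.e. } \texttt{10011} = \texttt{10}\cdot\texttt{011}
\end{align*}
```

Both ends of the chain spell the same bit string 1110011, once as the codeword sequence d, b and once as a, a, e.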

Termination and correctness of the algorithm

Since all sets S_i are sets of suffixes of a finite set of codewords, there are only finitely many different candidates for S_i. Since visiting one of the sets for the second time causes the algorithm to stop, the algorithm cannot continue endlessly and thus must always terminate. More precisely, the total number of dangling suffixes that the algorithm considers is at most the total length of the codewords in the input, so the algorithm runs in polynomial time as a function of this input length. By using a suffix tree to speed up the comparison between each dangling suffix and the codewords, the time for the algorithm can be bounded by O(nk), where n is the total length of the codewords and k is the number of codewords. The algorithm can also be implemented using a pattern matching machine. Finally, the algorithm can be implemented to run on a nondeterministic Turing machine that uses only logarithmic space; the problem of testing unique decipherability is NL-complete, so this space bound is optimal.

A proof that the algorithm is correct, i.e. that it always gives the correct answer, is found in the textbooks by Salomaa and by Berstel et al.

References

  • Rodeh, M. (1982). "A fast test for unique decipherability based on suffix trees (Corresp.)". IEEE Transactions on Information Theory. 28 (4): 648–651. doi:10.1109/TIT.1982.1056535.
  • Apostolico, A.; Giancarlo, R. (1984). "Pattern matching machine implementation of a fast test for unique decipherability". Information Processing Letters. 18 (3): 155–158. doi:10.1016/0020-0190(84)90020-6.
  • Rytter, Wojciech (1986). "The space complexity of the unique decipherability problem". Information Processing Letters. 23 (1): 1–3. doi:10.1016/0020-0190(86)90121-3. MR 853618.

Notes

  1. Knuth (2003), p. 2
  2. Berstel et al. (2009), Example 2.3.1 p. 63
  3. Rytter (1986) proves that the complementary problem, of testing for the existence of a string with two decodings, is NL-complete, and therefore that unique decipherability is co-NL-complete. The equivalence of NL-completeness and co-NL-completeness follows from the Immerman–Szelepcsényi theorem.
  4. Salomaa (1981)
  5. Berstel et al. (2009), Chapter 2.3

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::
