Truncated binary encoding

Quality 0.50 · 6 views · Updated 2 months ago

title: "Truncated binary encoding" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["entropy-coding", "lossless-compression-algorithms"] topic_path: "technology/algorithms" source: "https://en.wikipedia.org/wiki/Truncated_binary_encoding" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

Truncated binary encoding is an entropy encoding typically used for uniform probability distributions with a finite alphabet. It is parameterized by an alphabet with total size of number n. It is a slightly more general form of binary encoding when n is not a power of two.

If n is a power of two, then the coded value for 0 ≤ x 2(n). Otherwise let k = floor(log2(n)), such that 2k k+1and let u = 2k+1 − n.

Truncated binary encoding assigns the first u symbols codewords of length k and then assigns the remaining n − u symbols the last n − u codewords of length k + 1. Because all the codewords of length k + 1 consist of an unassigned codeword of length k with a "0" or "1" appended, the resulting code is a prefix code.

History

Used since at least 1984, phase-in codes, also known as economy codes, are also known as truncated binary encoding.

Example with ''n'' = 5

For example, for the alphabet {0, 1, 2, 3, 4}, n = 5 and 22 ≤ n 3, hence k = 2 and u = 23 − 5 = 3. Truncated binary encoding assigns the first u symbols the codewords 00, 01, and 10, all of length 2, then assigns the last n − u symbols the codewords 110 and 111, the last two codewords of length 3.

For example, if n is 5, plain binary encoding and truncated binary encoding allocates the following codewords. Digits shown struck are not transmitted in truncated binary. ::data[format=table] | Truncated binary | Encoding | Standard binary | |---|---|---| | 0 | 0 | 0 | | 1 | 0 | 0 | | 2 | 0 | 1 | | UNUSED | 0 | 1 | | UNUSED | 1 | 0 | | UNUSED | 1 | 0 | | 3 | 1 | 1 | | 4 | 1 | 1 | ::

It takes 3 bits to encode n using straightforward binary encoding, hence 23 − n = 8 − 5 = 3 are unused.

In numerical terms, to send a value x, where 0 ≤ x k ≤ n k+1 symbols, there are u = 2k+1 − n unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number x in truncated binary is: if x is less than u, encode it in k binary bits; if x is greater than or equal to u, encode the value x + u in k + 1 binary bits.

Example with ''n'' = 10

Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 24 − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)

::data[format=table] | Input value | Offset | Offset value | Standard binary || Truncated binary | |---|---|---|---| | 0 | 0 | 0 | 0000 | | 1 | 0 | 1 | 0001 | | 2 | 0 | 2 | 0010 | | 3 | 0 | 3 | 0011 | | 4 | 0 | 4 | 0100 | | 5 | 0 | 5 | 0101 | | | | | | | 6 | 6 | 12 | 0110 | | 7 | 6 | 13 | 0111 | | 8 | 6 | 14 | 1000 | | 9 | 6 | 15 | 1001 | ::

To decode, read the first k bits. If they encode a value less than u, decoding is complete. Otherwise, read an additional bit and subtract u from the result.

Example with ''n'' = 7

Here is a more extreme case: with n = 7 the next power of 2 is 8, so k = 2 and u = 23 − 7 = 1: ::data[format=table] | Input value | Offset | Offset value | Standard binary || Truncated binary | |---|---|---|---| | 0 | 0 | 0 | 000 | | | | | | | 1 | 1 | 2 | 001 | | 2 | 1 | 3 | 010 | | 3 | 1 | 4 | 011 | | 4 | 1 | 5 | 100 | | 5 | 1 | 6 | 101 | | 6 | 1 | 7 | 110 | ::

This last example demonstrates that a leading zero bit does not always indicate a short code; if u k, some long codes will begin with a zero bit.

Simple algorithm

Generate the truncated binary encoding for a value x, 0 ≤ x 0 is the size of the alphabet containing x. n need not be a power of two. ::code[lang=C] string TruncatedBinary (int x, int n) { // Set k = floor(log2(n)), i.e., k such that 2^k <= n < 2^(k+1). int k = 0, t = n; while (t > 1) { k++; t >>= 1; }

// Set u to the number of unused codewords = 2^(k+1) - n.
int u = (1 << k + 1) - n;

if (x < u)
    return Binary(x, k); 
else
    return Binary(x + u, k + 1));

} ::

The routine Binary is expository; usually just the rightmost len bits of the variable x are desired. Here we simply output the binary code for x using len bits, padding with high-order 0s if necessary. ::code[lang=C] string Binary (int x, int len) { string s = ""; while (x != 0) { if (even(x)) s = '0' + s; else s = '1' + s;

	x >>= 1;
}
while (s.Length < len)
    s = '0' + s;
return s;

} ::

On efficiency

If n is not a power of two, and k-bit symbols are observed with probability p, then (k + 1)-bit symbols are observed with probability 1 − p. We can calculate the expected number of bits per symbol b_e as

: b_e = p k + (1 - p) (k + 1).

Raw encoding of the symbol has b_u = k + 1 bits. Then relative space saving s (see Data compression ratio) of the encoding can be defined as

: s = 1 - \frac{b_e}{b_u} = 1 - \frac{p k + (1 - p) (k + 1)}{k + 1}.

When simplified, this expression leads to

: s = \frac{p}{k + 1} = \frac{p}{b_u}.

This indicates that relative efficiency of truncated binary encoding increases as probability p of k-bit symbols increases, and the raw-encoding symbol bit-length b_u decreases.

References

Eastman, Willard L, ''et al.'' (Aug. 1984) [https://patentimages.storage.googleapis.com/86/7d/56/dd49314023387d/US4464650.pdf Apparatus and Method for Compressing Data Signals and Restoring the Compressed Data Signals], US Patent 4,464,650.
Acharya, Tinku et Já Já, Joseph F. (oct. 1996), [https://www.sciencedirect.com/science/article/abs/pii/0020025596000898 An on-line variable-length binary encoding of text], Information Sciences, vol 94 no 1-4, p. 1-22.
Job van der Zwan. [https://observablehq.com/@jobleonard/phase-in-codes "Phase-in Codes"].

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::