Blahut–Arimoto algorithm

Class of algorithms in information theory


title: "Blahut–Arimoto algorithm" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["coding-theory"] description: "Class of algorithms in information theory" topic_path: "general/coding-theory" source: "https://en.wikipedia.org/wiki/Blahut–Arimoto_algorithm" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Class of algorithms in information theory ::

The term Blahut–Arimoto algorithm is often used to refer to a class of algorithms for computing numerically either the information theoretic capacity of a channel, the rate-distortion function of a source or a source encoding (i.e. compression to remove the redundancy). They are iterative algorithms that eventually converge to one of the maxima of the optimization problem that is associated with these information theoretic concepts.

History and application

For the case of channel capacity, the algorithm was independently invented by Suguru Arimoto and Richard Blahut. In addition, Blahut's treatment gives algorithms for computing rate distortion and generalized capacity with input contraints (i.e. the capacity-cost function, analogous to rate-distortion). These algorithms are most applicable to the case of arbitrary finite alphabet sources. Much work has been done to extend it to more general problem instances.{{cite arXiv | eprint = 1012.5071v2 | author1 = Iddo Naiss | author2 = Haim Permuter | title = Extension of the Blahut-Arimoto algorithm for maximizing directed information | class = cs.IT | year = 2010}} Recently, a version of the algorithm that accounts for continuous and multivariate outputs was proposed with applications in cellular signaling. There exists also a version of Blahut–Arimoto algorithm for directed information.

Algorithm for Channel Capacity

A discrete memoryless channel (DMC) can be specified using two random variables X, Y with alphabet \mathcal X, \mathcal Y, and a channel law as a conditional probability distribution p(y|x). The channel capacity, defined as C := \sup_{p \sim X}I(X;Y), indicates the maximum efficiency that a channel can communicate, in the unit of bit per use. Now if we denote the cardinality |\mathcal X| = n, |\mathcal Y| = m, then p_{Y|X} is a n \times m matrix, which we denote the i^{th} row, j^{th} column entry by w_{ij}. For the case of channel capacity, the algorithm was independently invented by Suguru Arimoto and Richard Blahut. They both found the following expression for the capacity of a DMC with channel law:

C = \max_{\mathbf p} \max_{Q} \sum_{i=1}^n \sum_{j=1}^m p i w{ij}\log\left(\dfrac {Q_{ji}}{p_i}\right)

where \mathbf p and Q are maximized over the following requirements:

  • \mathbf p is a probability distribution on X, That is, if we write \mathbf p as (p_1, p_2 ... , p_n), \sum_{i= 1}^np_i = 1
  • Q = (q_{ji}) is a m \times n matrix that behaves like a transition matrix from Y to X with respect to the channel law. That is, For all 1 \leq i \leq n, 1\leq j \leq m:
    • q_{ji} \geq 0, q_{ji} = 0 \Leftrightarrow w_{ij} = 0
    • Every row sums up to 1, i.e. \sum_{i=1}^n q_{ji} = 1.

Then upon picking a random probability distribution \mathbf p^0 := (p^0_1, p^0_2, ... p^0_n) on X, we can generate a sequence (\mathbf p^0, Q^0, \mathbf p^1, Q^1 ...) iteratively as follows:

(q^t_{ji}):= \dfrac{p^t_iw_{ij}}{\sum_{k=1}^n p^t_kw_{kj}}

p^{t+1}k := \dfrac {\prod{j = 1}^m (q^t_{jk})^{w_{kj}}}{\sum_{i= 1}^n\prod_{j = 1}^m(q^t_{ji})^{w_{ij}}}

For t = 0, 1, 2 ....

Then, using the theory of optimization, specifically coordinate descent, Yeung showed that the sequence indeed converges to the required maximum. That is,

\lim_{t\to \infty} \sum_{i=1}^n \sum_{j=1}^m p^t_i w_{ij}\log\left(\dfrac {Q^t_{ji}}{p^t_i}\right) = C.

So given a channel law p(y|x), the capacity can be numerically estimated up to arbitrary precision.

Algorithm for Rate-Distortion

Suppose we have a source X with probability p(x) of any given symbol. We wish to find an encoding p(\hat{x}|x) that generates a compressed signal \hat{X} from the original signal while minimizing the expected distortion \langle d(x,\hat{x}) \rangle, where the expectation is taken over the joint probability of X and \hat{X}. We can find an encoding that minimizes the rate-distortion functional locally by repeating the following iteration until convergence:

:p_{t+1}(\hat{x}|x) = \frac{p_t(\hat{x}) \exp(-\beta d(x,\hat{x}))}{\sum_{\hat{x}} p_t(\hat{x}) \exp(-\beta d(x,\hat{x}))}

:p_{t+1}(\hat{x}) = \sum_{x} p(x)p_{t+1}(\hat{x}|x)

where \beta is a parameter related to the slope in the rate-distortion curve that we are targeting and thus is related to how much we favor compression versus distortion (higher \beta means less compression).

References

References

  1. Arimoto, Suguru. (1972). "An algorithm for computing the capacity of arbitrary discrete memoryless channels". [[IEEE Transactions on Information Theory]].
  2. Blahut, Richard. (1972). "Computation of channel capacity and rate-distortion functions". [[IEEE Transactions on Information Theory]].
  3. Vontobel, Pascal O.. (2003). "A Generalized Blahut–Arimoto Algorithm".
  4. (2019). "Information-theoretic analysis of multivariate single-cell signaling responses". [[PLOS Computational Biology]].
  5. (January 2013). "Extension of the Blahut–Arimoto Algorithm for Maximizing Directed Information". IEEE Transactions on Information Theory.
  6. Cover, T. M.. (2006). "Elements of information theory". Wiley-Interscience.
  7. Arimoto, Suguru. (1972). "An algorithm for computing the capacity of arbitrary discrete memoryless channels". [[IEEE Transactions on Information Theory]].
  8. Blahut, Richard. (1972). "Computation of channel capacity and rate-distortion functions". [[IEEE Transactions on Information Theory]].
  9. Yeung, Raymond W.. (2008). "Information theory and network coding". Springer.

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::

coding-theory