ProbCons
Protein multiple-sequence alignment program
title: "ProbCons" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["computational-phylogenetics"] description: "Protein multiple-sequence alignment program" topic_path: "science/biology" source: "https://en.wikipedia.org/wiki/ProbCons" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0
::summary Protein multiple-sequence alignment program ::
In bioinformatics and proteomics, ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.
Algorithm
The following describes the basic outline of the ProbCons algorithm.
Step 1: Reliability of an alignment edge
For every pair of sequences x and y, compute the probability that letters x_i and y_j are paired in a^*, an alignment that is generated by the model.
\begin{align} P(x_i \sim y_j|x,y) \ \overset{\mathrm{def}}{=}& \ \Pr[x_i \sim y_j \text{ in some } a|x,y] \\[8pt] =& \ \sum_{\text{alignment } a \atop \text{with } x_i \sim y_j} \Pr[a|x,y] \\[2pt] =& \ \sum_{\text{alignment } a} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a|x,y] \end{align}
(Here \mathbf{1}\{x_i \sim y_j \in a\} is equal to 1 if x_i and y_j are paired in the alignment a and 0 otherwise.)
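The sum over alignments can be made concrete for very short sequences by brute-force enumeration. The following Python sketch is illustrative only: the column weights (match, mismatch, gap) are hypothetical stand-ins for the pair hidden Markov model that ProbCons uses, and practical implementations obtain the same kind of posterior with dynamic programming rather than enumeration.

```python
def enumerate_alignments(n, m):
    """Yield every alignment of two sequences of lengths n and m.

    An alignment is a list of columns: ('M', i, j) pairs x_i with y_j,
    ('X', i) places x_i against a gap, ('Y', j) places y_j against a gap.
    """
    def rec(i, j, cols):
        if i == n and j == m:
            yield list(cols)
            return
        if i < n and j < m:
            cols.append(('M', i, j))
            yield from rec(i + 1, j + 1, cols)
            cols.pop()
        if i < n:
            cols.append(('X', i))
            yield from rec(i + 1, j, cols)
            cols.pop()
        if j < m:
            cols.append(('Y', j))
            yield from rec(i, j + 1, cols)
            cols.pop()

    yield from rec(0, 0, [])


def posterior_matrix(x, y, match=4.0, mismatch=1.0, gap=0.5):
    """Return P[i][j] = sum of Pr[a|x,y] over alignments a that pair x_i with y_j.

    Assumption: Pr[a|x,y] is proportional to a product of per-column weights
    (a toy model, not the ProbCons pair-HMM parameters).
    """
    n, m = len(x), len(y)
    total = 0.0
    post = [[0.0] * m for _ in range(n)]
    for a in enumerate_alignments(n, m):
        weight = 1.0
        for col in a:
            if col[0] == 'M':
                weight *= match if x[col[1]] == y[col[2]] else mismatch
            else:
                weight *= gap
        total += weight
        # The indicator 1{x_i ~ y_j in a}: add the weight only to pairs present in a.
        for col in a:
            if col[0] == 'M':
                post[col[1]][col[2]] += weight
    return [[p / total for p in row] for row in post]
```

Enumeration grows exponentially with sequence length, so this is only usable for toy inputs such as `posterior_matrix("AC", "AC")`.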
Step 2: Maximum expected accuracy
The accuracy of an alignment a^* with respect to another alignment a is defined as the number of common aligned pairs divided by the length of the shorter sequence.
Calculate the expected accuracy of a candidate alignment a^* under the posterior distribution over alignments a:
\begin{align} E_{\Pr[a|x,y]}(\operatorname{acc}(a^*,a)) & = \sum_{a}\Pr[a|x,y] \operatorname{acc}(a^*,a) \\ & = \frac{1}{\min(|x|,|y|)} \cdot \sum_{a} \Pr[a|x,y] \sum_{x_i \sim y_j \in a^*} \mathbf{1}\{x_i \sim y_j \in a\} \\ & = \frac{1}{\min(|x|,|y|)} \cdot \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j|x,y) \end{align}
This yields a maximum expected accuracy (MEA) alignment:
E(x,y) = \arg\max_{a^*} \; E_{\Pr[a|x,y]}(\operatorname{acc}(a^*,a))
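Because the factor 1/min(|x|,|y|) is constant, maximizing the expected accuracy amounts to choosing the set of matched pairs with the largest total posterior probability. A minimal Python sketch of that search is given here, assuming a Needleman-Wunsch style dynamic program with no gap penalty; the function name `mea_alignment` and the input layout (`post[i][j]` holding P(x_i ~ y_j | x, y)) are illustrative choices, not part of ProbCons itself.

```python
def mea_alignment(post):
    """Return (pairs, expected_accuracy) for a posterior matrix post[i][j].

    pairs is the list of matched index pairs (i, j), 0-based, in order.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    # score[i][j]: best total posterior over alignments of x[:i] and y[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1],  # pair x_i with y_j
                score[i - 1][j],                           # x_i against a gap
                score[i][j - 1],                           # y_j against a gap
            )
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    shorter = min(n, m)
    expected_accuracy = score[n][m] / shorter if shorter else 0.0
    return pairs, expected_accuracy
```

For example, `mea_alignment([[0.9, 0.05], [0.05, 0.8]])` pairs the first letters and the second letters of the two sequences, with an expected accuracy of (0.9 + 0.8) / 2.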
Step 3: Probabilistic Consistency Transformation
The match probabilities for every pair of sequences x,y from the set of all sequences \mathcal{S} are now re-estimated using every sequence z in \mathcal{S} as an intermediate:
P'(x_i \sim y_j|x,y) = \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{1 \leq k \leq |z|} P(x_i \sim z_k|x,z) \cdot P(z_k \sim y_j|z,y)
This step can be iterated.
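Written with matrices, the transformation averages the products of the posterior matrix for x and z with the posterior matrix for z and y over all z. The Python sketch below shows one such pass in plain lists. It assumes that the posterior matrix of a sequence with itself is the identity (so the terms z = x and z = y contribute the original matrix), and the dictionary layout `post[(s, t)]` and the name `consistency_transform` are hypothetical conventions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols_b] for row in a]


def consistency_transform(post, lengths):
    """Apply one pass of the consistency transformation.

    post[(s, t)] is the |s| x |t| posterior matrix for sequence indices s < t;
    lengths[s] is the length of sequence s.
    """
    n_seqs = len(lengths)

    def get(s, t):
        if s == t:
            # Assumption: a sequence aligns to itself with the identity matrix.
            size = lengths[s]
            return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
        if (s, t) in post:
            return post[(s, t)]
        # Only one orientation is stored; transpose to get the other.
        return [list(col) for col in zip(*post[(t, s)])]

    new_post = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n_seqs):
            prod = matmul(get(x, z), get(z, y))  # sums over positions k of z
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j] / n_seqs
        new_post[(x, y)] = acc
    return new_post
```

Iterating the step corresponds to feeding the returned dictionary back into the same function.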
Step 4: Computation of guide tree
Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. The similarity between clusters is defined as a weighted average of the pairwise sequence similarities.
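The clustering can be sketched in Python as a greedy agglomerative procedure that repeatedly merges the two most similar clusters. This is a sketch under stated assumptions: the weighting used here is a cluster-size-weighted average (UPGMA style), which may differ from the exact weighting in ProbCons, and `build_guide_tree` and the nested-tuple tree representation are hypothetical names and conventions.

```python
def build_guide_tree(sim):
    """Build a guide tree from a symmetric similarity matrix sim[i][j].

    Leaves are sequence indices; internal nodes are (left, right) tuples.
    """
    n = len(sim)
    clusters = {i: (i, 1) for i in range(n)}  # cluster id -> (subtree, size)
    pair_sim = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the most similar pair of clusters; keys always satisfy a < b.
        a, b = max(pair_sim, key=pair_sim.get)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del pair_sim[(a, b)]
        # Similarity of the merged cluster to each remaining cluster c:
        # average of the two old similarities, weighted by cluster size.
        for c in clusters:
            s_ac = pair_sim.pop((min(a, c), max(a, c)))
            s_bc = pair_sim.pop((min(b, c), max(b, c)))
            pair_sim[(c, next_id)] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

With similarities `[[1, 0.9, 0.2], [0.9, 1, 0.3], [0.2, 0.3, 1]]`, sequences 0 and 1 are merged first and sequence 2 joins last.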
Step 5: Compute MSA
Finally, compute the MSA by progressive alignment along the guide tree, or by iterative alignment.
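In progressive alignment, the guide tree fixes the order in which groups of sequences are merged: each internal node aligns the group under its left child with the group under its right child. The short Python sketch below only derives that merge order and does not perform the group-to-group alignment itself. It assumes a tree written as nested tuples with sequence indices at the leaves, and `progressive_order` is a hypothetical helper name.

```python
def progressive_order(tree):
    """Return the merges implied by a guide tree, in the order they are performed.

    Each merge is a pair (left_group, right_group) of lists of sequence indices.
    """
    steps = []

    def visit(node):
        if not isinstance(node, tuple):
            return [node]  # a leaf: a single sequence
        left = visit(node[0])
        right = visit(node[1])
        # Both subtrees are already aligned, so the two groups can be merged now.
        steps.append((left, right))
        return left + right

    visit(tree)
    return steps
```

For the tree `((0, 1), 2)`, sequences 0 and 1 are aligned first, and the resulting group is then aligned with sequence 2.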
References
- (2005). "PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment". Genome Research.
- Roshan, Usman. (2014-01-01). "Multiple Sequence Alignment Methods". Humana Press.
- Lecture "Bioinformatics II" at University of Freiburg. http://www.bioinf.uni-freiburg.de//Lehre/Courses/2011_WS/V_BioinfoII/slides_probcons.pdf
::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::