From Surf Wiki (app.surf) — the open knowledge base

Behrens–Fisher distribution

Probability distribution

In statistics, the Behrens–Fisher distribution, named after Ronald Fisher and Walter Behrens, is a parameterized family of probability distributions arising from the solution of the Behrens–Fisher problem, first proposed by Behrens and several years later by Fisher. The Behrens–Fisher problem is that of statistical inference concerning the difference between the means of two normally distributed populations when the ratio of their variances is not known (and in particular, it is not known that their variances are equal).

Definition

The Behrens–Fisher distribution is the distribution of a random variable of the form

: T_2 \cos\theta - T_1\sin\theta ,

where T₁ and T₂ are independent random variables, each with a Student's t-distribution, with respective degrees of freedom *ν*₁ = n₁ − 1 and *ν*₂ = n₂ − 1 (n₁ and n₂ being the two sample sizes), and θ is a constant. Thus the family of Behrens–Fisher distributions is parametrized by *ν*₁, *ν*₂, and θ.
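Because the definition is constructive, the distribution is easy to simulate. The following is a minimal Monte Carlo sketch using only the Python standard library; the function names `student_t` and `behrens_fisher_sample` are illustrative and not taken from any library. It assumes the textbook construction of a t variate as Z/√(V/ν), where Z is standard normal and V is chi-squared with ν degrees of freedom (drawn here as a gamma variate with shape ν/2 and scale 2).

```python
import math
import random


def student_t(nu: float, rng: random.Random) -> float:
    """Draw one Student's t variate with nu degrees of freedom.

    Uses Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi-squared(nu),
    where chi-squared(nu) is Gamma(shape=nu/2, scale=2).
    """
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(v / nu)


def behrens_fisher_sample(nu1: float, nu2: float, theta: float,
                          size: int, seed: int = 0) -> list[float]:
    """Draw `size` independent values of T2*cos(theta) - T1*sin(theta)."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    out = []
    for _ in range(size):
        t1 = student_t(nu1, rng)  # T1 has nu1 degrees of freedom
        t2 = student_t(nu2, rng)  # T2 has nu2 degrees of freedom
        out.append(t2 * math.cos(theta) - t1 * math.sin(theta))
    return out
```

Setting θ = 0 recovers a t-distribution with *ν*₂ degrees of freedom, and θ = π/2 gives −T₁, which has a t-distribution with *ν*₁ degrees of freedom; intermediate angles mix the two. When *ν*₁, *ν*₂ > 2 the variance is cos²θ · *ν*₂/(*ν*₂ − 2) + sin²θ · *ν*₁/(*ν*₁ − 2), which gives a quick check on the simulated values.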

Derivation

Suppose it were known that the two population variances are equal, and samples of sizes n₁ and n₂ are taken from the two populations:

: \begin{align} X_{1,1},\ldots,X_{1,n_1} & \sim \operatorname{i.i.d.} N(\mu_1,\sigma^2), \\[6pt] X_{2,1},\ldots,X_{2,n_2} & \sim \operatorname{i.i.d.} N(\mu_2,\sigma^2). \end{align}

where "i.i.d." means independent and identically distributed, and N denotes the normal distribution. The two sample means are

: \begin{align} \bar{X}_1 & = (X_{1,1}+\cdots+X_{1,n_1})/n_1 \\[6pt] \bar{X}_2 & = (X_{2,1}+\cdots+X_{2,n_2})/n_2 \end{align}

The usual "pooled" unbiased estimate of the common variance *σ*² is then

: S_\mathrm{pooled}^2 = \frac{\sum_{k=1}^{n_1}(X_{1,k}-\bar X_1)^2 + \sum_{k=1}^{n_2}(X_{2,k}-\bar X_2)^2}{n_1+n_2-2} = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}

where S₁² and S₂² are the usual unbiased (Bessel-corrected) estimates of the two population variances.

Under these assumptions, the pivotal quantity

: \frac{(\mu_2-\mu_1)-(\bar X_2 - \bar X_1)}{\displaystyle\sqrt{\frac{S^2_\mathrm{pooled}}{n_1} + \frac{S^2_\mathrm{pooled}}{n_2} }}

has a t-distribution with n₁ + n₂ − 2 degrees of freedom. Accordingly, one can find a confidence interval for *μ*₂ − *μ*₁ whose endpoints are

: \bar{X}_2 - \bar{X}_1 \pm A \cdot S_\mathrm{pooled} \sqrt{\frac{1}{n_1} +\frac{1}{n_2}},

where A is an appropriate quantile of the t-distribution with n₁ + n₂ − 2 degrees of freedom; for a 100(1 − α)% interval, A is its upper α/2 point.
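The pooled interval can be computed directly from the formulas above. This Python sketch uses the standard `statistics` module; `pooled_t_interval` is a hypothetical helper name, and the quantile A is passed in by the caller rather than computed, because the standard library has no t-distribution quantile function.

```python
import math
from statistics import mean, variance


def pooled_t_interval(x1: list[float], x2: list[float],
                      a: float) -> tuple[float, float]:
    """Interval for mu2 - mu1 assuming equal population variances.

    `a` is the t quantile with n1 + n2 - 2 degrees of freedom,
    supplied by the caller.
    """
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 (Bessel-corrected) denominator,
    # so these are S1^2 and S2^2 from the text.
    s2_pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    half_width = a * math.sqrt(s2_pooled) * math.sqrt(1 / n1 + 1 / n2)
    centre = mean(x2) - mean(x1)
    return centre - half_width, centre + half_width
```

For example, with x1 = [1, 2, 3] and x2 = [4, 5, 6] each sample variance is 1, so S²pooled = 1 and the interval is 3 ± A·√(2/3); with 4 degrees of freedom, the 95% value of A is about 2.776.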

However, in the Behrens–Fisher problem, the two population variances are not known to be equal, nor is their ratio known. Fisher considered the pivotal quantity

: \frac{(\mu_2-\mu_1)-(\bar X_2 - \bar X_1)}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }}.

This can be written as

: T_2\cos\theta - T_1\sin\theta,

where

: T_i = \frac{\mu_i - \bar{X}_i}{S_i/\sqrt{n_i}}\text{ for }i=1,2 ,

are the usual one-sample t-statistics (with the sign reversed, which does not change their distribution), and

: \tan\theta = \frac{S_1/\sqrt{n_1}}{S_2/\sqrt{n_2}}

and one takes θ to be in the first quadrant. The algebraic details are as follows:

: \begin{align} \frac{(\mu_2-\mu_1)-(\bar X_2 - \bar X_1)}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }} & = \frac{\mu_2-\bar{X}_2}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }} - \frac{\mu_1-\bar{X}_1}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }} \\[10pt] & = \underbrace{\frac{\mu_2-\bar{X}_2}{S_2/\sqrt{n_2}}}_{\text{This is }T_2} \cdot \underbrace{\left( \frac{S_2/\sqrt{n_2}}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }} \right)}_{\text{This is }\cos\theta} - \underbrace{\frac{\mu_1-\bar{X}_1}{S_1/\sqrt{n_1}}}_{\text{This is }T_1}\cdot\underbrace{\left( \frac{S_1/\sqrt{n_1}}{\displaystyle\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2} }} \right)}_{\text{This is }\sin\theta}.\qquad\qquad\qquad (1) \end{align}

Because the two expressions in parentheses above are positive and the sum of their squares is 1, they are the cosine and sine of some angle θ in the first quadrant, consistent with the definition of tan θ.
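The identity (1) can be checked numerically on any data set. The Python sketch below uses a hypothetical helper, `fisher_pivot_parts`, which computes Fisher's pivotal quantity directly and also returns T₁, T₂ and θ so the two sides can be compared. `math.atan2` returns the first-quadrant angle here because both standard errors are positive.

```python
import math
from statistics import mean, stdev


def fisher_pivot_parts(x1, x2, mu1, mu2):
    """Return (pivot, T1, T2, theta) for trial values of mu1 and mu2."""
    n1, n2 = len(x1), len(x2)
    se1 = stdev(x1) / math.sqrt(n1)   # S1 / sqrt(n1), Bessel-corrected S1
    se2 = stdev(x2) / math.sqrt(n2)   # S2 / sqrt(n2)
    denom = math.hypot(se1, se2)      # sqrt(S1^2/n1 + S2^2/n2)
    pivot = ((mu2 - mu1) - (mean(x2) - mean(x1))) / denom
    t1 = (mu1 - mean(x1)) / se1       # T1 as defined in the text
    t2 = (mu2 - mean(x2)) / se2       # T2 as defined in the text
    theta = math.atan2(se1, se2)      # tan(theta) = se1 / se2, in (0, pi/2)
    return pivot, t1, t2, theta
```

For any samples and trial means, `pivot` should agree with `t2 * math.cos(theta) - t1 * math.sin(theta)` up to floating-point rounding.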

The Behrens–Fisher distribution is actually the conditional distribution of the quantity (1) above, given the values of the quantities labeled cos θ and sin θ. In effect, Fisher conditions on ancillary information.

Fisher then found the "fiducial interval" whose endpoints are

: \bar{X}_2-\bar{X}_1 \pm A \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} }

where A is the appropriate percentage point of the Behrens–Fisher distribution. Fisher claimed that the probability that *μ*₂ − *μ*₁ is in this interval, given the data (ultimately the Xs), is the probability that a Behrens–Fisher-distributed random variable is between −A and A.
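Fisher's interval requires the percentage point A of the Behrens–Fisher distribution at the observed θ. Where tables are unavailable, a Monte Carlo estimate is one simple substitute; the following Python sketch illustrates that assumption and is not Fisher's own procedure. The names `bf_quantile` and `fiducial_interval`, the default number of draws and the seeds are arbitrary choices. Because the distribution is symmetric about 0, taking A as its (1 + level)/2 quantile makes the probability of falling between −A and A equal to `level`.

```python
import math
import random
from statistics import mean, stdev


def bf_quantile(nu1, nu2, theta, prob, draws=200_000, seed=1):
    """Monte Carlo estimate of the `prob` quantile of T2*cos(theta) - T1*sin(theta)."""
    rng = random.Random(seed)

    def t(nu):
        # Student's t variate: N(0, 1) / sqrt(chi-squared(nu) / nu)
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)

    sims = sorted(t(nu2) * math.cos(theta) - t(nu1) * math.sin(theta)
                  for _ in range(draws))
    # Empirical quantile: the order statistic at position prob * draws
    return sims[min(int(prob * draws), draws - 1)]


def fiducial_interval(x1, x2, level=0.95, **mc):
    """Behrens-Fisher interval for mu2 - mu1; `mc` is passed to bf_quantile."""
    n1, n2 = len(x1), len(x2)
    se1 = stdev(x1) / math.sqrt(n1)   # S1 / sqrt(n1)
    se2 = stdev(x2) / math.sqrt(n2)   # S2 / sqrt(n2)
    theta = math.atan2(se1, se2)      # observed theta, first quadrant
    a = bf_quantile(n1 - 1, n2 - 1, theta, 0.5 + level / 2, **mc)
    centre = mean(x2) - mean(x1)
    half = a * math.hypot(se1, se2)   # A * sqrt(S1^2/n1 + S2^2/n2)
    return centre - half, centre + half
```

With θ = 0, `bf_quantile(nu1, nu2, 0.0, 0.975)` should come out close to the 97.5% point of the t-distribution with *ν*₂ degrees of freedom (about 2.228 for *ν*₂ = 10), which is a convenient sanity check. Increasing `draws` reduces the Monte Carlo error at the cost of run time.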

Fiducial intervals versus confidence intervals

Bartlett showed that this "fiducial interval" is not a confidence interval, because its coverage probability is not constant: it depends on the unknown ratio of the population variances. Fisher did not consider that a cogent objection to the use of the fiducial interval.

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
