3.9 Connections between Binomial and Hypergeometric
The Binomial and Hypergeometric distributions are connected in two important ways. As we will see in this section, we can get from the Binomial to the Hypergeometric by conditioning, and we can get from the Hypergeometric to the Binomial by taking a limit. We’ll start with a motivating example.
Example 3.9.1 (Fisher exact test). A scientist wishes to study whether women or men are more likely to have a certain disease, or whether they are equally likely. A random sample of $n$ women and $m$ men is gathered, and each person is tested for the disease (assume for this problem that the test is completely accurate). The numbers of women and men in the sample who have the disease are $X$ and $Y$ respectively, with $X \sim \operatorname{Bin}\left(n, p_{1}\right)$ and $Y \sim \operatorname{Bin}\left(m, p_{2}\right)$, independently. Here $p_{1}$ and $p_{2}$ are unknown, and we are interested in testing whether $p_{1}=p_{2}$ (this is known as a null hypothesis in statistics).

Consider a $2 \times 2$ table with rows corresponding to disease status and columns corresponding to gender. Each entry is the count of how many people have that disease status and gender, so $n+m$ is the sum of all 4 entries. Suppose that it is observed that $X+Y=r$.

The Fisher exact test conditions on both the row and column sums, so $n, m, r$ are all treated as fixed, and then checks whether the observed value of $X$ is “extreme” relative to this conditional distribution. Assuming the null hypothesis, find the conditional PMF of $X$ given $X+Y=r$.
Solution:
First we’ll build the $2 \times 2$ table, treating $n, m$, and $r$ as fixed.
\begin{tabular}{lcc|c}
\hline & Women & Men & Total \\
\hline Disease & $x$ & $r-x$ & $r$ \\
No disease & $n-x$ & $m-r+x$ & $n+m-r$ \\
\hline Total & $n$ & $m$ & $n+m$ \\
\hline
\end{tabular}
Next, let’s compute the conditional PMF $P(X=x \mid X+Y=r)$. By Bayes’ rule,
$$
\begin{aligned}
P(X=x \mid X+Y=r) &=\frac{P(X+Y=r \mid X=x) P(X=x)}{P(X+Y=r)} \\
&=\frac{P(Y=r-x) P(X=x)}{P(X+Y=r)} .
\end{aligned}
$$
The step $P(X+Y=r \mid X=x)=P(Y=r-x)$ is justified by the independence of $X$ and $Y$. Assuming the null hypothesis and letting $p=p_{1}=p_{2}$, we have $X \sim \operatorname{Bin}(n, p)$ and $Y \sim \operatorname{Bin}(m, p)$, independently, so $X+Y \sim \operatorname{Bin}(n+m, p)$. Thus,
$$
\begin{aligned}
P(X=x \mid X+Y=r) &=\frac{\binom{m}{r-x} p^{r-x}(1-p)^{m-r+x}\binom{n}{x} p^{x}(1-p)^{n-x}}{\binom{n+m}{r} p^{r}(1-p)^{n+m-r}} \\
&=\frac{\binom{n}{x}\binom{m}{r-x}}{\binom{n+m}{r}} .
\end{aligned}
$$
So the conditional distribution of $X$ is Hypergeometric with parameters $n, m, r$.
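As a numerical sanity check on this result, the sketch below evaluates the Bayes' rule expression directly and compares it with the Hypergeometric PMF for several values of $p$. The function names and the values $n=6$, $m=4$, $r=5$ are illustrative assumptions, not part of the example; the point is only that the powers of $p$ and $1-p$ cancel, so the answer does not depend on $p$.

```python
from math import comb


def binom_pmf(k, n, p):
    """PMF of Bin(n, p) at k; returns 0 outside the support."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(x, n, m, r, p):
    """P(X = x | X + Y = r) via Bayes' rule, with X ~ Bin(n, p), Y ~ Bin(m, p)."""
    return binom_pmf(r - x, m, p) * binom_pmf(x, n, p) / binom_pmf(r, n + m, p)


def hgeom_pmf(x, n, m, r):
    """PMF of HGeom(n, m, r) at x; returns 0 outside the support."""
    if x < 0 or x > n or r - x < 0 or r - x > m:
        return 0.0
    return comb(n, x) * comb(m, r - x) / comb(n + m, r)


# Illustrative values (assumed for this sketch, not taken from the text).
n, m, r = 6, 4, 5
for p in (0.2, 0.5, 0.9):
    for x in range(r + 1):
        # The two expressions agree for every p, up to floating-point error.
        assert abs(conditional_pmf(x, n, m, r, p) - hgeom_pmf(x, n, m, r)) < 1e-12
```

The check passes for each $p$ tried, which is consistent with the algebra above: $p^{r-x} p^{x}=p^{r}$ and $(1-p)^{m-r+x}(1-p)^{n-x}=(1-p)^{n+m-r}$, so both factors cancel against the denominator.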
To understand why the Hypergeometric appeared, seemingly out of nowhere, let’s connect this problem to the elk story for the Hypergeometric. In the elk story, we are interested in the distribution of the number of tagged elk in the recaptured sample. By analogy, think of women as tagged elk and men as untagged elk. Instead of recapturing $r$ elk at random from the forest, we infect $X+Y=r$ people with the disease; under the null hypothesis, the set of diseased people is equally likely to be any set of $r$ people. Thus, conditional on $X+Y=r$, $X$ represents the number of tagged elk in the recaptured sample, which is distributed $\operatorname{HGeom}(n, m, r)$.
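The story can also be checked by simulation. The sketch below is a minimal illustration under assumed values ($n=6$, $m=4$, $r=5$, and a hypothetical helper name): it picks a uniformly random set of $r$ people out of $n+m$, counts how many of them are women, and compares the empirical frequencies with $\binom{n}{x}\binom{m}{r-x} / \binom{n+m}{r}$.

```python
import random
from math import comb


def simulate_tagged_counts(n, m, r, trials, seed=0):
    """Empirical distribution of the number of women among r randomly chosen people.

    People 0..n-1 play the role of women (tagged elk); people n..n+m-1 are men.
    """
    rng = random.Random(seed)
    counts = [0] * (r + 1)
    for _ in range(trials):
        diseased = rng.sample(range(n + m), r)  # uniformly random set of r people
        x = sum(1 for person in diseased if person < n)
        counts[x] += 1
    return [c / trials for c in counts]


# Illustrative values (assumed for this sketch, not taken from the text).
n, m, r = 6, 4, 5
freqs = simulate_tagged_counts(n, m, r, trials=50_000)
for x, freq in enumerate(freqs):
    exact = comb(n, x) * comb(m, r - x) / comb(n + m, r) if r - x <= m else 0.0
    print(f"x={x}: simulated {freq:.4f}, HGeom PMF {exact:.4f}")
```

With enough trials the two columns should agree to within sampling noise, without any reference to $p$: once the total $r$ is fixed, only the choice of which $r$ people are diseased matters.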
