# Hierarchical models

## Examples of hierarchical models

Many important statistical models have a hierarchical structure: not all random variables in the model are defined simultaneously, but instead there are several ‘levels’ of randomness and the distribution of the random variables in the later levels depends on the values of random variables in earlier levels. This structure is, for example, present in the following situations.

• In Bayesian models (discussed in Section 4.3) the distribution of the data depends on the value of one or more random parameters.
• In mixture models the distribution of samples depends on the random choice of mixture component.
• In Markov chains (discussed in Section 2.3) the distribution of the value at time $t$ depends on the value of the Markov chain at time $t-1$.

Simulating hierarchical models is often easy: the simulation proceeds in steps that closely follow the levels of the model. We illustrate this approach with examples.
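For the Markov chain case from the list above, step-by-step simulation amounts to generating each value from a distribution that depends on the value just produced. The sketch below assumes a hypothetical Gaussian random-walk transition $X_t \sim \mathcal{N}(X_{t-1}, s^2)$; the function name `simulate_random_walk` and the step size `s` are illustrative choices, not taken from Section 2.3.

```python
import random


def simulate_random_walk(x0, n_steps, s=1.0, rng=None):
    """Simulate X_0 = x0 and X_t ~ N(X_{t-1}, s^2) for t = 1, ..., n_steps.

    The Gaussian random-walk transition is a hypothetical example chain.
    """
    rng = rng or random.Random()
    path = [x0]
    for _ in range(n_steps):
        # The next value depends only on the previous value of the chain.
        path.append(rng.gauss(path[-1], s))
    return path


path = simulate_random_walk(0.0, n_steps=10, rng=random.Random(1))
print(len(path))  # 11 values: X_0, ..., X_10
```

Any other transition rule can be substituted by replacing the single `rng.gauss` call; the loop structure mirrors the dependence of time $t$ on time $t-1$.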

## Example: a hierarchical Bayesian model

Consider the Bayesian model in which the data $X_{1}, \ldots, X_{n}$ are, conditionally on $\mu$ and $\sigma^{2}$, i.i.d. samples from $\mathcal{N}\left(\mu, \sigma^{2}\right)$, and where the mean $\mu$ and the variance $\sigma^{2}$ are themselves random with distributions $\sigma^{2} \sim \operatorname{Exp}(\lambda)$ and $\mu \sim \mathcal{N}\left(\mu_{0}, \alpha \sigma^{2}\right)$. Since the variance $\sigma^{2}$ occurs in the distribution of $\mu$, and both $\mu$ and $\sigma^{2}$ occur in the distribution of the $X_{i}$, the model has the following dependence structure:
$$\sigma^{2} \longrightarrow \mu \longrightarrow X_{1}, \ldots, X_{n}$$
To generate samples from this model, we use steps corresponding to the levels in the model:

1. Generate $\sigma^{2} \sim \operatorname{Exp}(\lambda)$.
2. Generate $\mu \sim \mathcal{N}\left(\mu_{0}, \alpha \sigma^{2}\right)$.
3. For $i=1, \ldots, n$, generate $X_{i} \sim \mathcal{N}\left(\mu, \sigma^{2}\right)$.

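The three steps above translate directly into code, one statement per level. This is a minimal sketch, assuming that $\operatorname{Exp}(\lambda)$ is parameterised by its rate (so that $\mathbb{E}[\sigma^2] = 1/\lambda$); the function name `sample_hierarchical_normal` and its argument names are hypothetical.

```python
import math
import random


def sample_hierarchical_normal(n, lam, mu0, alpha, rng=None):
    """Return (sigma2, mu, [X_1, ..., X_n]) drawn from the three-level model.

    Assumes Exp(lam) uses the rate parameterisation, E[sigma2] = 1 / lam.
    """
    rng = rng or random.Random()
    # Level 1: the variance.
    sigma2 = rng.expovariate(lam)
    # Level 2: the mean, whose spread depends on sigma2.
    # random.gauss expects a standard deviation, hence the square root.
    mu = rng.gauss(mu0, math.sqrt(alpha * sigma2))
    # Level 3: the data, i.i.d. given (mu, sigma2).
    xs = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
    return sigma2, mu, xs


sigma2, mu, xs = sample_hierarchical_normal(
    n=5, lam=1.0, mu0=0.0, alpha=2.0, rng=random.Random(42)
)
```

Each call produces a fresh $\sigma^2$ and $\mu$, so repeated calls sample the joint distribution of all three levels rather than data for one fixed parameter value.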
Sometimes the hierarchical structure of a model is not immediately apparent. This is the case, for example, for mixture distributions as given in the following definition; however, we will see that introducing a hierarchical structure is beneficial for generating samples from a mixture distribution.
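The idea can be sketched for an assumed finite mixture of one-dimensional normal components: first draw the index of the mixture component, then draw the sample from the chosen component. The function name `sample_normal_mixture` and the parameters `weights`, `means` and `sds` are hypothetical, and the notation of the formal definition may differ.

```python
import random


def sample_normal_mixture(weights, means, sds, rng=None):
    """Draw one sample from sum_k weights[k] * N(means[k], sds[k]**2).

    Two-level scheme: first the component index K, then X given K.
    """
    rng = rng or random.Random()
    # Level 1: choose component k with probability proportional to weights[k].
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Level 2: sample from the chosen normal component (sds are std. deviations).
    return rng.gauss(means[k], sds[k])


rng = random.Random(0)
samples = [
    sample_normal_mixture([0.3, 0.7], [-2.0, 3.0], [0.5, 1.0], rng)
    for _ in range(10_000)
]
```

With these illustrative parameters the mixture mean is $0.3 \cdot (-2) + 0.7 \cdot 3 = 1.5$, which the sample average should approximate.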

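## The multivariate normal distribution

The density of the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ on $\mathbb{R}^{d}$ is
$$f(x)=\frac{1}{(2 \pi)^{d / 2}|\operatorname{det} \Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{\top} \Sigma^{-1}(x-\mu)\right) \tag{2.1}$$
for all $x \in \mathbb{R}^{d}$.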
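The multivariate normal density can be evaluated numerically without forming $\Sigma^{-1}$ explicitly. The sketch below is one possible approach, not the method of these notes: it assumes $\Sigma$ is symmetric positive definite and uses a Cholesky factorisation $\Sigma = LL^{\top}$, so that $|\operatorname{det} \Sigma|^{1/2} = \prod_{i} L_{ii}$ and $(x-\mu)^{\top}\Sigma^{-1}(x-\mu) = \|z\|^2$ where $Lz = x-\mu$. The helper names `cholesky` and `mvn_density` are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T == a (a symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_density(x, mu, sigma):
    """Evaluate the N(mu, sigma) density at x via sigma = L L^T.

    |det sigma|^(1/2) = prod(L_ii), and the quadratic form equals |z|^2,
    where z solves L z = x - mu (forward substitution).
    """
    d = len(x)
    L = cholesky(sigma)
    diff = [x[i] - mu[i] for i in range(d)]
    z = [0.0] * d
    for i in range(d):
        z[i] = (diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    quad = sum(zi * zi for zi in z)
    sqrt_det = math.prod(L[i][i] for i in range(d))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * sqrt_det)
```

For $d = 1$ and $\Sigma = (1)$ this reduces to the standard normal density, e.g. `mvn_density([0.0], [0.0], [[1.0]])` equals $1/\sqrt{2\pi}$ up to rounding.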

Here the quadratic form in the exponent can be written out as
$$(x-\mu)^{\top} \Sigma^{-1}(x-\mu)=\sum_{i, j=1}^{d}\left(x_{i}-\mu_{i}\right)\left(\Sigma^{-1}\right)_{i j}\left(x_{j}-\mu_{j}\right).$$
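The double sum can be checked on a small case. The sketch below evaluates it term by term for a given matrix `sigma_inv` standing in for $\Sigma^{-1}$; the function name `quadratic_form` and the example matrix are illustrative only.

```python
def quadratic_form(x, mu, sigma_inv):
    """Compute (x - mu)^T sigma_inv (x - mu) as the double sum over i and j."""
    d = len(x)
    return sum(
        (x[i] - mu[i]) * sigma_inv[i][j] * (x[j] - mu[j])
        for i in range(d)
        for j in range(d)
    )


# d = 2, x - mu = (1, -1): the four terms are 2, -1, -1 and 2.
print(quadratic_form([1.0, -1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))  # 2.0
```

When `sigma_inv` is diagonal, only the terms with $i = j$ survive and the sum reduces to $\sum_{i}(x_i-\mu_i)^2 (\Sigma^{-1})_{ii}$.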
The multivariate normal distribution from Definition 2.1 is a generalisation of the one-dimensional normal distribution: if $\Sigma$ is a diagonal matrix, say
$$\Sigma=\left(\begin{array}{cccc} \sigma_{1}^{2} & 0 & \ldots & 0 \\ 0 & \sigma_{2}^{2} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma_{d}^{2} \end{array}\right)$$
then $|\operatorname{det} \Sigma|=\prod_{i=1}^{d} \sigma_{i}^{2}$ and
$$\Sigma^{-1}=\left(\begin{array}{cccc} 1 / \sigma_{1}^{2} & 0 & \cdots & 0 \\ 0 & 1 / \sigma_{2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 / \sigma_{d}^{2} \end{array}\right)$$
and thus the density $f$ from (2.1) can be written as
$$\begin{aligned} f(x) &=\frac{1}{(2 \pi)^{d / 2}\left|\prod_{i=1}^{d} \sigma_{i}^{2}\right|^{1 / 2}} \exp \left(-\frac{1}{2} \sum_{i=1}^{d}\left(x_{i}-\mu_{i}\right) \frac{1}{\sigma_{i}^{2}}\left(x_{i}-\mu_{i}\right)\right) \\ &=\prod_{i=1}^{d} \frac{1}{\left(2 \pi \sigma_{i}^{2}\right)^{1 / 2}} \exp \left(-\frac{\left(x_{i}-\mu_{i}\right)^{2}}{2 \sigma_{i}^{2}}\right) \\ &=\prod_{i=1}^{d} f_{i}\left(x_{i}\right) \end{aligned}$$
where the function $f_{i}$, given by
$$f_{i}(x)=\frac{1}{\left(2 \pi \sigma_{i}^{2}\right)^{1 / 2}} \exp \left(-\frac{\left(x-\mu_{i}\right)^{2}}{2 \sigma_{i}^{2}}\right)$$