
Statistics Assignment Help | STAT501 Regression Methods

MY-ASSIGNMENTEXPERT™ can provide you with homework, exam, and tutoring services for stat.psu STAT501 Regression Methods (linear regression)!

This is a successful case of coursework assistance for the linear regression course at the Eberly College of Science.


STAT501 Course Overview

STAT 501 is an applied linear regression course that emphasizes data analysis and interpretation. Generally, statistical regression is a collection of methods for determining and using models that explain how a response variable (dependent variable) relates to one or more explanatory variables (predictor variables).

Course Author(s)
Dr. Laura Simon and Dr. Robert Heckard were the primary authors of the materials for this course. Dr. Andrew Wiesner and Dr. Derek Young have also authored course materials for STAT 501. Most recently, Dr. Iain Pardoe has been instrumental in updating the course materials.

Course Topics

This graduate-level course covers the following topics:

Understanding the context for simple linear regression
How to evaluate simple linear regression models
How a simple linear regression model is used to estimate and predict likely values
Understanding the assumptions that need to be met for a simple linear regression model to be valid
How multiple predictors can be included in a regression model
Understanding the assumptions that need to be met when multiple predictors are included in the regression model for the model to be valid
How a multiple linear regression model is used to estimate and predict likely values
Understanding how categorical predictors can be included in a regression model
How to transform data in order to deal with problems identified in the regression model
Strategies for building regression models
Distinguishing between outliers and influential data points, and how to deal with them
Handling problems typically encountered in regression contexts
Alternative methods for estimating a regression line besides using ordinary least squares
Understanding regression models in time dependent contexts
Understanding regression models in non-linear contexts

STAT501 Regression Methods HELP (EXAM HELP, ONLINE TUTOR)

Problem 1.

Non-Uniformly Weighted Data (7 pts)
Consider a data set in which each data point $t_n$ is associated with a weighting factor $r_n>0$, so that the sum-of-squares error function becomes
$$
E_D(\boldsymbol{w})=\frac{1}{2} \sum_{n=1}^N r_n\left\{t_n-\boldsymbol{w}^{\top} \boldsymbol{\phi}\left(\boldsymbol{x}_n\right)\right\}^2 .
$$
Find an expression for the solution $w^*$ that minimizes this error function.
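
A sketch of one standard way to approach this (not part of the original assignment): collect the basis function evaluations into the design matrix $\boldsymbol{\Phi}$ with rows $\boldsymbol{\phi}(\boldsymbol{x}_n)^{\top}$ and let $\boldsymbol{R}=\operatorname{diag}(r_1, \ldots, r_N)$. Setting the gradient of $E_D$ to zero,
$$
\nabla_{\boldsymbol{w}} E_D(\boldsymbol{w})=-\sum_{n=1}^N r_n\left\{t_n-\boldsymbol{w}^{\top} \boldsymbol{\phi}\left(\boldsymbol{x}_n\right)\right\} \boldsymbol{\phi}\left(\boldsymbol{x}_n\right)=\boldsymbol{\Phi}^{\top} \boldsymbol{R} \boldsymbol{\Phi} \boldsymbol{w}-\boldsymbol{\Phi}^{\top} \boldsymbol{R} \boldsymbol{t}=\mathbf{0},
$$
gives the weighted least squares solution
$$
\boldsymbol{w}^*=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{R} \boldsymbol{\Phi}\right)^{-1} \boldsymbol{\Phi}^{\top} \boldsymbol{R} \boldsymbol{t},
$$
assuming $\boldsymbol{\Phi}^{\top} \boldsymbol{R} \boldsymbol{\Phi}$ is invertible.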

Problem 2.

Priors and Regularization (7 pts)
Consider the Bayesian linear regression model given in Bishop 3.3.1. The prior is
$$
p(\boldsymbol{w} \mid \alpha)=\mathcal{N}\left(\boldsymbol{w} \mid \mathbf{0}, \alpha^{-1} \boldsymbol{I}\right)
$$
where $\alpha$ is the precision parameter that controls the variance of the Gaussian prior. The likelihood can be written as
$$
p(\boldsymbol{t} \mid \boldsymbol{w})=\prod_{n=1}^N \mathcal{N}\left(t_n \mid \boldsymbol{w}^{\top} \boldsymbol{\phi}\left(\boldsymbol{x}_n\right), \beta^{-1}\right) .
$$
Using the fact that the posterior is the product of the prior and the likelihood (up to a normalization constant), show that maximizing the log posterior (i.e., $\ln p(\boldsymbol{w} \mid \boldsymbol{t})=\ln p(\boldsymbol{w} \mid \alpha)+\ln p(\boldsymbol{t} \mid \boldsymbol{w})$) is equivalent to minimizing the regularized error term given by $E_D(\boldsymbol{w})+\lambda E_W(\boldsymbol{w})$ with
$$
\begin{aligned}
E_D(\boldsymbol{w}) & =\frac{1}{2} \sum_{n=1}^N\left(t_n-\boldsymbol{w}^{\top} \boldsymbol{\phi}\left(\boldsymbol{x}_n\right)\right)^2 \\
E_W(\boldsymbol{w}) & =\frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}
\end{aligned}
$$
Do this by writing $\ln p(\boldsymbol{w} \mid \boldsymbol{t})$ as a function of $E_D(\boldsymbol{w})$ and $E_W(\boldsymbol{w})$, dropping constant terms if necessary. Conclude that maximizing this posterior is equivalent to minimizing the regularized error term given by $E_D(\boldsymbol{w})+\lambda E_W(\boldsymbol{w})$. Hint: take $\lambda=\alpha / \beta$.
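
A hedged outline of the derivation (a sketch using the notation above, not an official solution): the log-likelihood and log-prior are
$$
\ln p(\boldsymbol{t} \mid \boldsymbol{w})=-\beta E_D(\boldsymbol{w})+\frac{N}{2} \ln \beta-\frac{N}{2} \ln 2 \pi, \qquad
\ln p(\boldsymbol{w} \mid \alpha)=-\alpha E_W(\boldsymbol{w})+\frac{M}{2} \ln \alpha-\frac{M}{2} \ln 2 \pi,
$$
where $M$ is the dimension of $\boldsymbol{w}$. Hence $\ln p(\boldsymbol{w} \mid \boldsymbol{t})=-\beta E_D(\boldsymbol{w})-\alpha E_W(\boldsymbol{w})+\text{const}$, and maximizing it over $\boldsymbol{w}$ is the same as minimizing $\beta E_D(\boldsymbol{w})+\alpha E_W(\boldsymbol{w})$, or equivalently $E_D(\boldsymbol{w})+\lambda E_W(\boldsymbol{w})$ with $\lambda=\alpha / \beta$.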

Problem 3.

Linear Regression (Leman, 20 points)
Assume that there are $n$ given training examples $\left(X_1, Y_1\right),\left(X_2, Y_2\right), \ldots,\left(X_n, Y_n\right)$, where each input data point $X_i$ has $m$ real-valued features. The goal of regression is to learn to predict $Y$ from $X$.

The linear regression model assumes that the output $Y$ is a linear combination of the input features $X$, with weights given by $\beta$, plus a noise term $\epsilon$ from a given distribution.

We can write this in matrix form by stacking the data points as the rows of a matrix $X$, so that $X_{i j}$ is the $j$-th feature of the $i$-th data point. Then, writing $Y$, $\beta$, and $\epsilon$ as column vectors, we can write the matrix form of the linear regression model as:
$$
Y=X \beta+\epsilon
$$
where
$$
\mathbf{Y}=\left[\begin{array}{c}
Y_1 \\
\vdots \\
Y_n
\end{array}\right], \quad
\epsilon=\left[\begin{array}{c}
\epsilon_1 \\
\vdots \\
\epsilon_n
\end{array}\right], \quad
\beta=\left[\begin{array}{c}
\beta_1 \\
\vdots \\
\beta_m
\end{array}\right], \quad \text { and } \quad
X=\left[\begin{array}{cccc}
X_{11} & X_{12} & \ldots & X_{1 m} \\
X_{21} & X_{22} & \ldots & X_{2 m} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n 1} & X_{n 2} & \ldots & X_{n m}
\end{array}\right]
$$
Linear regression seeks to find the parameter vector $\beta$ that provides the best fit of the above regression model. One criterion for measuring fit is to find the $\beta$ that minimizes a given loss function $J(\beta)$.

In class, we have shown that if we take the loss function to be the squared error, i.e.,
$$
J(\beta)=\sum_i\left(Y_i-X_i^T \beta\right)^2=(X \beta-Y)^T(X \beta-Y)
$$
Then
$$
\hat{\beta}=\left(X^T X\right)^{-1} X^T Y
$$
Moreover, we have also shown that if we assume that $\epsilon_1, \ldots, \epsilon_n$ are IID samples from the same zero-mean Gaussian, that is, $\epsilon_i \sim \mathcal{N}\left(0, \sigma^2\right)$, then the least squares estimate is also the MLE for $P(Y \mid X ; \beta)$.
Now, let $\hat{Y}$ denote the vector of predictions obtained using $\hat{\beta}$. If we plug in the original training set $X$:
$$
\hat{Y}=X \hat{\beta}=X\left(X^T X\right)^{-1} X^T Y
$$
As mentioned above, $\hat{\beta}$ also minimizes the sum of squared errors:
$$
\mathrm{SSE}=\sum_{i=1}^n\left(Y_i-\hat{Y}_i\right)^2
$$
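
As a quick numerical sanity check on the closed-form expressions above, here is a minimal NumPy sketch (not part of the original problem; the synthetic data, seed, and variable names are illustrative assumptions). Solving the normal equations with np.linalg.solve is used instead of forming an explicit inverse, which is numerically less stable.

```python
import numpy as np

# Synthetic data: n data points with m features, Y = X beta + eps.
rng = np.random.default_rng(0)
n, m = 100, 3
X = rng.normal(size=(n, m))                      # design matrix (rows = data points)
beta_true = np.array([2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form least squares estimate: beta_hat = (X^T X)^{-1} X^T Y,
# computed by solving the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Fitted values and sum of squared errors.
Y_hat = X @ beta_hat
SSE = np.sum((Y - Y_hat) ** 2)

print("beta_hat:", beta_hat)
print("SSE:", SSE)
```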

Problem 4.

Robust Linear Regression (7 points)

When we perform least squares linear regression, we make certain idealized assumptions about the vector of errors $\epsilon$, namely that its components are distributed $\mathcal{N}\left(0, \sigma^2\right)$. In practice, departures from these assumptions occur. In particular, when the error distribution is heavier-tailed than the Normal distribution, i.e., has more probability in the tails than the Normal, the least squares loss is sensitive to outliers, and hence robust regression methods are of interest.

The problem with the least squares loss in the presence of outliers, i.e., when the noise term $\epsilon_i$ can be arbitrarily large, is that it weights each observation equally when estimating the parameters. Robust methods, on the other hand, allow the observations to be weighted unequally. More specifically, observations that produce large residuals are down-weighted by a robust estimation method.

In this problem, you will assume that $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed according to a Laplace distribution (rather than according to $\mathcal{N}\left(0, \sigma^2\right)$). That is, each $\epsilon_i \sim \operatorname{Laplace}(0, b)$, with density $\frac{1}{2 b} \exp \left(-\frac{\left|\epsilon_i\right|}{b}\right)$.
(a) (4 points) Provide the loss function $J_{\text {Laplace }}(\beta)$ whose minimization is equivalent to finding the MLE of $\beta$ under the above noise model.
(b) (3 points) Why do you think the above model provides a more robust fit to the data than the standard model, which assumes a Gaussian distribution for the noise terms?
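
A hedged sketch of part (a) (one standard way to set it up, not the graded solution): under the Laplace noise model, the log-likelihood of $\beta$ is
$$
\ln \prod_{i=1}^n \frac{1}{2 b} \exp \left(-\frac{\left|Y_i-X_i^T \beta\right|}{b}\right)=-n \ln (2 b)-\frac{1}{b} \sum_{i=1}^n\left|Y_i-X_i^T \beta\right|,
$$
so maximizing it over $\beta$ is equivalent to minimizing the least absolute deviations loss
$$
J_{\text {Laplace }}(\beta)=\sum_{i=1}^n\left|Y_i-X_i^T \beta\right| .
$$
For part (b), the usual intuition is that the absolute loss grows only linearly in the residual, whereas the squared loss grows quadratically, so a single large outlier has far less influence on the fitted $\beta$ under the Laplacian model.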
