# Matrix Methods In Data Analysis, Signal Processing, And Machine Learning | MS-E1150 Derivative of a function

## Derivative of a function

Suppose we have a function $y=f(x)$, where $x$ and $y$ are real numbers.

The derivative of the function is denoted $f^{\prime}(x)$ or $d y / d x$.

The derivative $f^{\prime}(x)$ gives the slope of $f(x)$ at the point $x$.

It specifies how to scale a small change in input to obtain a corresponding change in the output:
$$f(x+\varepsilon) \approx f(x)+\varepsilon f^{\prime}(x)$$

It tells you how to make a small change in the input to obtain a small improvement in $y$.

Recall the derivatives of the following functions:
$$f(x)=x^2$$
$$f(x)=e^x$$
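The first-order approximation and the two derivatives above can be checked numerically. The following is a minimal sketch, assuming the standard results $\frac{d}{dx} x^2=2x$ and $\frac{d}{dx} e^x=e^x$; the helper name `central_difference`, the step size `h`, and the evaluation point are illustrative choices, not part of the course material.

```python
import math

def central_difference(f, x, h=1e-5):
    """Estimate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5

# d/dx x^2 = 2x
d_square = central_difference(lambda t: t ** 2, x)
# d/dx e^x = e^x
d_exp = central_difference(math.exp, x)

print(d_square, 2 * x)
print(d_exp, math.exp(x))

# First-order approximation: f(x + eps) ≈ f(x) + eps * f'(x), here with f = exp
eps = 1e-3
exact = math.exp(x + eps)
approx = math.exp(x) + eps * math.exp(x)
print(abs(exact - approx))  # small: the error is of order eps**2
```

The finite-difference estimates should agree closely with the analytic derivatives, and the approximation error should be far smaller than `eps` itself.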

Calculus in Optimization

Suppose we have a function $y=f(x)$, where $x$ and $y$ are real numbers.

Sign function:
$$\operatorname{sign}(x)=\left\{\begin{array}{cl} -1 & \text { if } x<0 \\ 0 & \text { if } x=0 \\ 1 & \text { if } x>0 \end{array}\right.$$

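We know that
$$f\left(x-\varepsilon \operatorname{sign}\left(f^{\prime}(x)\right)\right)<f(x)$$
for small enough $\varepsilon$, provided $f^{\prime}(x) \neq 0$.

Therefore, we can reduce $f(x)$ by moving $x$ in small steps in the direction opposite to the sign of the derivative. This technique is called gradient descent (Cauchy).

Why opposite? By the first-order approximation, $f\left(x-\varepsilon \operatorname{sign}\left(f^{\prime}(x)\right)\right) \approx f(x)-\varepsilon\left|f^{\prime}(x)\right|$, which is smaller than $f(x)$ whenever $f^{\prime}(x) \neq 0$.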
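The idea can be sketched in a few lines. This is a minimal illustration, assuming the example function $f(x)=(x-3)^2$, a fixed step size, and the hypothetical helper names `sign` and `sign_descent`. It moves $x$ by $\varepsilon$ against the sign of the derivative, as in the inequality above, rather than using the more common update $x \leftarrow x-\varepsilon f^{\prime}(x)$.

```python
def sign(v):
    """Sign function: -1, 0, or 1."""
    return (v > 0) - (v < 0)

def sign_descent(df, x0, eps=0.01, steps=1000):
    """Repeatedly move x by eps in the direction opposite to sign(f'(x))."""
    x = x0
    for _ in range(steps):
        s = sign(df(x))
        if s == 0:  # derivative is zero: critical point reached
            break
        x -= eps * s
    return x

f = lambda x: (x - 3) ** 2
df = lambda x: 2 * (x - 3)

x_min = sign_descent(df, x0=0.0)
print(x_min, f(x_min))
```

Because the step size is fixed, the iterate ends up oscillating within about `eps` of the minimizer at $x=3$ instead of converging exactly.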

## Gradient

Minimizing with multidimensional inputs

We often minimize functions with multidimensional inputs:
$$f: \mathbb{R}^n \rightarrow \mathbb{R}$$

For minimization to make sense, there must still be only one (scalar) output.

Functions with multiple inputs

The partial derivative
$$\frac{\partial}{\partial x_i} f(x)$$
measures how $f$ changes as only the variable $x_i$ increases at the point $\boldsymbol{x}$.

The gradient generalizes the notion of the derivative to the case where the derivative is taken with respect to a vector.

The gradient is the vector containing all of the partial derivatives, denoted
$$\nabla_x f(x)=\left(\frac{\partial}{\partial x_1} f(x), \ldots, \frac{\partial}{\partial x_n} f(x)\right)$$

Element $i$ of the gradient is the partial derivative of $f$ with respect to $x_i$.
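To make the definition concrete, the sketch below approximates each partial derivative with a finite difference and collects the results into a gradient. The example function $f(x_1, x_2)=x_1^2+3 x_1 x_2$ and the helper name `numerical_gradient` are illustrative assumptions, not part of the course material.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x, one partial derivative per coordinate."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # perturb only coordinate i
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

# Analytic gradient for comparison: (2*x1 + 3*x2, 3*x1)
point = [1.0, 2.0]
g = numerical_gradient(f, point)
print(g)  # close to [8.0, 3.0]
```

Each entry of `g` is obtained by changing a single coordinate while holding the others fixed, which mirrors the definition of the partial derivative.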

Critical points are where every element of the gradient is equal to zero
$$\nabla_x f(x)=0 \equiv\left\{\begin{array}{l} \frac{\partial}{\partial x_1} f(x)=0 \\ \cdots \\ \frac{\partial}{\partial x_n} f(x)=0 \end{array}\right.$$
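As a small illustration of this condition, the sketch below tests whether a point is a critical point by checking every element of the gradient. The example function $f(x_1, x_2)=(x_1-1)^2+(x_2+2)^2$, the tolerance `tol`, and the helper name `is_critical_point` are assumptions made for the example.

```python
def grad_f(x):
    """Analytic gradient of f(x1, x2) = (x1 - 1)**2 + (x2 + 2)**2."""
    return [2 * (x[0] - 1), 2 * (x[1] + 2)]

def is_critical_point(grad, x, tol=1e-8):
    """True if every element of the gradient at x is (numerically) zero."""
    return all(abs(g) <= tol for g in grad(x))

print(is_critical_point(grad_f, [1.0, -2.0]))  # True: both partial derivatives vanish
print(is_critical_point(grad_f, [0.0, 0.0]))   # False: gradient is [-2.0, 4.0]
```

A tolerance is used instead of an exact comparison with zero because gradients computed in floating point are rarely exactly zero.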

