
# Computer Science Assignment Help | NEUR30006 Neural Networks

## NEUR30006 Course Overview

Availability: Semester 2
The analysis of real neural networks and the construction of artificial neural networks afford mutually synergistic technologies with broad application within and beyond neuroscience. Artificial neural networks, and other machine learning methods, have found numerous applications in analysis and modelling, and have produced insights into numerous complex phenomena (and generated huge economic value). Such technologies can also be used to gain insights into the biological systems that inspired their creation: we will explore how learning is instantiated in artificial and biological neural networks.

## Prerequisites

The subject aims to provide foundation skills for those who may wish to pursue neuroscience, or any research or work environment that involves the creation or capture, and analysis, of complex data. Students will gain experience with digital signals and digital signal processing (whether those signals are related to images, molecular data, connectomes, or electrophysiological recordings), and will learn how to conceptualise and implement approaches to modelling data by constructing an artificial neural network using the Python programming language.

## NEUR30006 Neural Networks HELP (EXAM HELP, ONLINE TUTOR)

Vector Calculus Review
Let $\mathbf{x}, \mathbf{c} \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$. For the following parts, before taking any derivatives, identify what the derivative looks like (is it a scalar, vector, or matrix?) and how we calculate each term in the derivative. Then carefully solve for an arbitrary entry of the derivative, then stack/arrange all of them to get the final result. Note that the convention we will use going forward is that vector derivatives of a scalar (with respect to a column vector) are expressed as a row vector, i.e. $\frac{\partial f}{\partial \mathbf{x}}=\left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]$ since a row acting on a column gives a scalar. You may have seen alternative conventions before, but the important thing is that you need to understand the types of objects and how they map to the shapes of the multidimensional arrays we use to represent those types.
(a) Show $\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^T \mathbf{c}\right)=\mathbf{c}^T$
(b) Show $\frac{\partial}{\partial \mathbf{x}}\|\mathbf{x}\|_2^2=2 \mathbf{x}^T$
(c) Show $\frac{\partial}{\partial \mathbf{x}}(A \mathbf{x})=A$
(d) Show $\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^T A \mathbf{x}\right)=\mathbf{x}^T\left(A+A^T\right)$
(e) Under what condition is the previous derivative equal to $2 \mathbf{x}^T A$?
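
The identities in parts (a) through (d) can be sanity-checked numerically before or after deriving them by hand. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_jacobian`, the dimension `n = 4`, and the random test values are illustrative choices, not part of the problem statement. It follows the convention above: the derivative of a scalar with respect to a column vector is a row, so each result has shape `(outputs, n)`.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    x = np.asarray(x, dtype=float)
    n_out = np.atleast_1d(f(x)).size
    J = np.zeros((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * eps)
    return J

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 4                            # hypothetical dimension
x = rng.normal(size=n)
c = rng.normal(size=n)
A = rng.normal(size=(n, n))

# Each entry pairs a function of x with the closed form claimed in the exercise.
checks = {
    "a": (lambda v: v @ c,     c[None, :]),                # c^T, shape (1, n)
    "b": (lambda v: v @ v,     2 * x[None, :]),            # 2 x^T, shape (1, n)
    "c": (lambda v: A @ v,     A),                         # A, shape (n, n)
    "d": (lambda v: v @ A @ v, (x @ (A + A.T))[None, :]),  # x^T (A + A^T)
}

results = {
    name: bool(np.allclose(numerical_jacobian(f, x), expected, atol=1e-5))
    for name, (f, expected) in checks.items()
}
print(results)
```

A numerical agreement like this is only a check on the shapes and entries, not a substitute for the entry-by-entry derivation the question asks for.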

ReLU Elbow Update under SGD
In this question we will explore the behavior of the ReLU nonlinearity with Stochastic Gradient Descent (SGD) updates. The hope is that this problem should help you build a more intuitive understanding for how SGD works and how it iteratively adjusts the learned function.

We want to model a 1D function $y=f(x)$ using a 1-hidden layer network with ReLU activations and no biases in the linear output layer. Mathematically, our network is
$$\hat{f}(x)=\mathbf{W}^{(2)} \Phi\left(\mathbf{W}^{(1)} x+\mathbf{b}\right)$$
where $x, y \in \mathbb{R}, \mathbf{b} \in \mathbb{R}^d, \mathbf{W}^{(1)} \in \mathbb{R}^{d \times 1}$, and $\mathbf{W}^{(2)} \in \mathbb{R}^{1 \times d}$. We define our loss function to be the squared error,
$$\ell\left(x, y, \mathbf{W}^{(1)}, \mathbf{b}, \mathbf{W}^{(2)}\right)=\frac{1}{2}\|\hat{f}(x)-y\|_2^2 .$$
For the purposes of this problem, we define the gradient of a ReLU at 0 to be 0.
(a) Let’s start by examining the behavior of a single ReLU with a linear function of $x$ as the input,
$$\phi(x)=\begin{cases} w x+b, & w x+b>0 \\ 0, & \text{else} \end{cases}$$
Notice that the slope of $\phi(x)$ is $w$ in the non-zero domain. We define a loss function $\ell(x, y, \phi)=\frac{1}{2}\|\phi(x)-y\|_2^2$. Find the following:
(i) The location of the ‘elbow’ $e$ of the function, where it transitions from 0 to something else.
(ii) The derivative of the loss w.r.t. $\phi(x)$, namely $\frac{d \ell}{d \phi}$
(iii) The partial derivative of the loss w.r.t. $w$, namely $\frac{\partial \ell}{\partial w}$
(iv) The partial derivative of the loss w.r.t. $b$, namely $\frac{\partial \ell}{\partial b}$
(b) Now suppose we have some training point $(x, y)$ such that $\phi(x)-y=1$. In other words, the prediction $\phi(x)$ is 1 unit above the target $y$: we are too high and are trying to pull the function downward. Describe what happens to the slope and elbow of $\phi(x)$ when we perform gradient descent in the following cases:
(i) $\phi(x)=0$.
(ii) $w>0, x>0$, and $\phi(x)>0$. It is fine to check the behavior of the elbow numerically in this case.
(iii) $w>0, x<0$, and $\phi(x)>0$.
(iv) $w<0, x>0$, and $\phi(x)>0$. It is fine to check the behavior of the elbow numerically in this case.

Additionally, draw and label $\phi(x)$, the elbow, and the qualitative changes to the slope and elbow after a gradient update to $w$ and $b$. You should label the elbow location and a candidate $(x, y)$ pair. Remember that the update for some parameter vector $\mathbf{p}$ and loss $\ell$ under SGD is
$$\mathbf{p}^{\prime}=\mathbf{p}-\lambda \nabla_{\mathbf{p}}(\ell), \lambda>0 .$$
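
The numerical check suggested in part (b) could be sketched as follows. This is an illustrative sketch only: the specific values of $x$, $w$, $b$ and the learning rate `lr = 0.1` are hypothetical, the helper names are made up for this example, and the elbow is taken to be the point where $w x+b=0$. The update applies the chain rule to the loss above with the ReLU gradient at 0 set to 0.

```python
def relu_unit(x, w, b):
    """phi(x) = max(w*x + b, 0) for scalar inputs."""
    return max(w * x + b, 0.0)

def sgd_step(x, y, w, b, lr):
    """One gradient step on 0.5 * (phi(x) - y)**2 for a single training point."""
    active = 1.0 if w * x + b > 0 else 0.0   # ReLU gradient at 0 is defined as 0
    err = relu_unit(x, w, b) - y             # d(loss)/d(phi)
    return w - lr * err * active * x, b - lr * err * active

def elbow(w, b):
    """Input location where w*x + b crosses zero (requires w != 0)."""
    return -b / w

lr = 0.1  # hypothetical learning rate
cases = {  # hypothetical values, one per case in part (b)
    "i":   dict(x=1.0,  w=1.0,  b=-2.0),   # phi(x) = 0
    "ii":  dict(x=2.0,  w=1.0,  b=1.0),    # w > 0, x > 0, phi(x) > 0
    "iii": dict(x=-1.0, w=1.0,  b=2.0),    # w > 0, x < 0, phi(x) > 0
    "iv":  dict(x=1.0,  w=-1.0, b=2.0),    # w < 0, x > 0, phi(x) > 0
}

results = {}
for name, p in cases.items():
    y = relu_unit(**p) - 1.0               # target sits 1 unit below the prediction
    w_new, b_new = sgd_step(p["x"], y, p["w"], p["b"], lr)
    results[name] = (p["w"], w_new, elbow(p["w"], p["b"]), elbow(w_new, b_new))
    print(name, "slope:", p["w"], "->", w_new,
          "| elbow:", results[name][2], "->", results[name][3])
```

With these particular numbers, case (i) leaves both parameters untouched, case (ii) lowers the slope, case (iii) raises it, and case (iv) makes it more negative. The elbow movement depends on the chosen values, so it is worth rerunning with other points before drawing a general conclusion.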
(c) Now we return to the full network function $\hat{f}(x)$. Derive the location $e_i$ of the elbow of the $i$-th elementwise ReLU activation.
(d) Derive the new elbow location $e_i^{\prime}$ of the $i$-th elementwise ReLU activation after one stochastic gradient update with learning rate $\lambda$.
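
A derivation for parts (c) and (d) can be cross-checked against a direct simulation of one SGD step on the full network. The sketch below assumes NumPy, a hypothetical width $d=3$, and made-up parameter values. It takes the elbow of unit $i$ to be where $W^{(1)}_i x+b_i=0$, and compares the simulated elbows with one candidate closed form obtained from the chain rule. Treat that closed form as something to verify, not as the reference solution.

```python
import numpy as np

def forward(x, W1, b, W2):
    """Return pre-activations z, hidden activations h, and the scalar output."""
    z = W1[:, 0] * x + b              # shape (d,)
    h = np.maximum(z, 0.0)            # elementwise ReLU
    return z, h, float(W2[0] @ h)

def sgd_step(x, y, W1, b, W2, lr):
    """One SGD step on 0.5 * (f_hat(x) - y)**2; ReLU gradient at 0 is 0."""
    z, h, f_hat = forward(x, W1, b, W2)
    err = f_hat - y
    active = (z > 0).astype(float)
    grad_b = err * W2[0] * active     # shape (d,)
    grad_W1 = (grad_b * x)[:, None]   # shape (d, 1)
    grad_W2 = err * h[None, :]        # shape (1, d)
    return W1 - lr * grad_W1, b - lr * grad_b, W2 - lr * grad_W2

def elbows(W1, b):
    """Elbow of each hidden unit: the x where W1_i * x + b_i = 0."""
    return -b / W1[:, 0]

# Hypothetical parameters and training point.
W1 = np.array([[1.0], [-1.0], [2.0]])
b = np.array([1.0, 3.0, -6.0])
W2 = np.array([[1.0, 0.5, 1.0]])
x, y, lr = 2.0, 1.0, 0.1

e_before = elbows(W1, b)
W1_new, b_new, W2_new = sgd_step(x, y, W1, b, W2, lr)
e_after = elbows(W1_new, b_new)

# Candidate closed form: e_i' = -(b_i - delta_i) / (W1_i - delta_i * x),
# with delta_i = lr * (f_hat - y) * W2_i * 1[z_i > 0].
z, _, f_hat = forward(x, W1, b, W2)
delta = lr * (f_hat - y) * W2[0] * (z > 0)
e_closed = -(b - delta) / (W1[:, 0] - delta * x)

print("before:", e_before)
print("after: ", e_after)
print("closed:", e_closed)
```

In this example the third unit is inactive at the training point, so its elbow does not move, while the two active units shift.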

The learning module example showed a neural network to classify elements into FCC, BCC, or HCP crystal structures. Let’s say we now wish to classify elements into FCC, BCC, HCP, or Simple Cubic crystal structures. To train this network we need to map these classes to one-hot vectors. Write down a one-hot encoding for each of these categories, and write down the one-hot encoding for copper. (Review slide 13 for one-hot encoding. Copper has an FCC crystal structure.)
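
One way to build such an encoding is sketched below, assuming NumPy. The class ordering in `classes` is an arbitrary assumption: any fixed ordering works, as long as the same one is used for training and prediction.

```python
import numpy as np

# Assumed (arbitrary) class ordering; position i gets the i-th unit vector.
classes = ["FCC", "BCC", "HCP", "Simple Cubic"]
identity = np.eye(len(classes), dtype=int)
one_hot = {name: identity[i] for i, name in enumerate(classes)}

for name, vec in one_hot.items():
    print(f"{name:>12}: {vec}")

copper = one_hot["FCC"]   # copper has an FCC crystal structure
print("copper:", copper)
```

Each vector has exactly one entry equal to 1 and the rest equal to 0, so the four classes are mutually exclusive and none is numerically "closer" to another.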

What activation function is used in the last layer of the network in the learning module? Write down the equation of this activation function, and the maximum and minimum values this function can take. Continuing with our previous example, let’s say the output of our network for copper is $[0.2,0.3,0.4,0.1]$. What is the predicted crystal structure for copper?
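
The learning module itself is not reproduced here, so the sketch below assumes the last layer uses softmax, a common choice for multi-class outputs, and reuses the hypothetical class ordering FCC, BCC, HCP, Simple Cubic. Under those assumptions, the predicted class is the one with the largest output probability.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), computed in a numerically stable way."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())           # subtracting the max does not change the result
    return e / e.sum()

# Assumed class ordering (must match the one-hot encoding used in training).
classes = ["FCC", "BCC", "HCP", "Simple Cubic"]

# Softmax outputs lie strictly between 0 and 1 and sum to 1.
probs = softmax([2.0, 1.0, 0.1, -1.0])   # hypothetical pre-activation values
print(probs, probs.sum())

# Network output for copper, taken from the question.
output = np.array([0.2, 0.3, 0.4, 0.1])
predicted = classes[int(np.argmax(output))]
print("predicted:", predicted)
```

With this ordering the largest entry, 0.4, sits in the third position, so the prediction differs from copper's true FCC structure. A different class ordering would change which label that position maps to.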
