MY-ASSIGNMENTEXPERT™ can provide assignment writing, exam help, and tutoring services for the University of Sydney COMP5318 Machine Learning course!
This is a successful assignment-help case from the University of Sydney's machine learning course.
COMP5318 Course Introduction
Machine learning is the process of automatically building mathematical models that explain and generalise datasets. It integrates elements of statistics and algorithm development into the same discipline. Data mining is a discipline within knowledge discovery that seeks to facilitate the exploration and analysis of large quantities of data, by automatic and semi-automatic means. This subject provides a practical and technical introduction to machine learning and data mining. Topics to be covered include discovering patterns in data, classification, regression, feature extraction and data visualisation. Also covered are the analysis, comparison and usage of various machine learning and statistical techniques.
Learning outcomes
At the completion of this unit, you should be able to:
- LO1. understand the basic principles, strengths, weaknesses and applicability of machine learning algorithms for solving classification, regression, clustering and reinforcement learning tasks.
- LO2. have obtained practical experience in designing, implementing and evaluating machine learning algorithms
- LO3. have gained practical experience in using machine learning software and libraries
- LO4. present and interpret data and information in verbal and written form
COMP5318 Machine Learning HELP (EXAM HELP, ONLINE TUTOR)
(Regression) Given a set of training examples $\left(x_1, y_1\right), \ldots,\left(x_N, y_N\right) \in \mathbf{R}^n \times \mathbf{R}$, let $\mathbf{X}$ be a $N \times n$ matrix with row $i$ corresponding to example $x_i$, and let $\mathbf{y}=\left[y_1, \ldots, y_N\right]^T$ (column vector containing the $N$ training labels). Consider the problem of finding $\mathbf{w} \in \mathbf{R}^n$ minimizing
$$
|\mathbf{X} \cdot \mathbf{w}-\mathbf{y}|^2+|\mathbf{w}|^2
$$
where $|\cdot|$ is the Euclidean norm.
Does the regularization term $|\mathbf{w}|^2$ force the solution vector $\mathbf{w}$ to have a small number of non-zero entries? Explain why or why not.
Solution: The answer is no, due to the nature of the quadratic penalty. When there are several correlated features with a significant effect on $y$, ridge regression tends to “share” the coefficient value among them (spreading the weight results in a smaller $L_2$ penalty than concentrating a large value on a small subset of them). If we instead use the $L_1$ penalty $|\mathbf{w}|_1=\sum_{i=1}^n\left|w_i\right|$, there is a tendency to zero out all but one (or a few) of the correlated features, resulting in a sparse coefficient vector $\mathbf{w}$.
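To make the contrast concrete, here is a minimal sketch (not part of the assignment) that fits scikit-learn's Ridge and Lasso estimators on a synthetic design matrix containing three nearly identical columns; the data, penalty strengths, and variable names are all invented for illustration.

```python
# Illustrative sketch: compare the dense coefficients produced by an L2 (ridge)
# penalty with the sparse coefficients produced by an L1 (lasso) penalty when
# several features are highly correlated. Data and penalty strengths are made up.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
N, n = 200, 10

# Three nearly identical (correlated) informative columns plus 7 pure-noise columns.
base = rng.normal(size=(N, 1))
X = np.hstack([base + 0.01 * rng.normal(size=(N, 1)) for _ in range(3)]
              + [rng.normal(size=(N, n - 3))])
y = 3.0 * base[:, 0] + 0.1 * rng.normal(size=N)

ridge = Ridge(alpha=1.0).fit(X, y)   # minimizes ||Xw - y||^2 + alpha * ||w||^2
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty instead of L2

# Ridge tends to spread weight across the three correlated copies; lasso typically
# keeps one of them and drives the rest (and the noise columns) exactly to zero.
print("ridge coefficients:", np.round(ridge.coef_, 3))
print("lasso coefficients:", np.round(lasso.coef_, 3))
print("nonzero entries -> ridge:", int(np.sum(np.abs(ridge.coef_) > 1e-6)),
      "lasso:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```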
Describe a concept class $C$ where the worst-case mistake bound of the halving algorithm $(\log |C|)$ is not tight. Explain your answer.
Solution: Example 1 in the paper below.
Bonus: You will receive bonus points for examples significantly different from the ones appearing in the following paper:
Nick Littlestone, Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, Machine Learning, 2(4): 285-318, 1988.
Copying any text from the paper will deterministically lead to a quiz. You can use examples from the paper, but you have to explain them yourself.
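For context, here is a minimal sketch of the halving algorithm whose worst-case $\log_2|C|$ bound the question refers to, run over an explicitly enumerated finite class; the choice of monotone disjunctions over three variables and the example stream are purely illustrative assumptions.

```python
# Sketch of the halving algorithm over an explicitly enumerated finite class C:
# predict the majority vote of the concepts still consistent with the data; on a
# mistake at least half of them are eliminated, so mistakes <= log2 |C|.
from itertools import combinations

def monotone_disjunctions(n):
    """All monotone disjunctions over n boolean variables, as sets of variable indices."""
    return [frozenset(s) for r in range(n + 1) for s in combinations(range(n), r)]

def evaluate(concept, x):
    """A monotone disjunction is true iff any of its variables is 1 in x."""
    return int(any(x[i] for i in concept))

def halving_run(concepts, examples):
    """Predict by majority vote of consistent concepts; return the mistake count."""
    version_space = list(concepts)
    mistakes = 0
    for x, label in examples:
        votes = sum(evaluate(c, x) for c in version_space)
        prediction = int(2 * votes >= len(version_space))  # majority vote (ties -> 1)
        if prediction != label:
            mistakes += 1
        # Keep only concepts consistent with the revealed label;
        # on a mistake this removes at least half of the version space.
        version_space = [c for c in version_space if evaluate(c, x) == label]
    return mistakes

# Illustrative target: x1 OR x3 over n = 3 variables, so |C| = 2^3 and log2|C| = 3.
target = frozenset({0, 2})
xs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (0, 1, 1)]
stream = [(x, evaluate(target, x)) for x in xs]
print("halving mistakes:", halving_run(monotone_disjunctions(3), stream), "<= 3")
```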
Describe a concept class $C$ where no randomized algorithm can do better than $(\log |C|) / 2$ mistakes in expectation (over the randomness in the algorithm). Explain your answer.
Solution: Let $C$ be the class of monotone disjunctions on $n$ boolean variables and consider the following sequence of $n$ examples: example $x_t$ has its $t$-th bit set to 1 and all other bits set to 0, for $t$ from 1 to $n$. Consider any randomized algorithm making binary predictions on this sequence. Let $p_t$ be the probability that the algorithm outputs 1 in trial $t$. Imagine that the true labeling of these examples is given by $\mathbf{1}\left(p_t \leq 1 / 2\right)$, so the label of $x_t$ is 1 if $p_t \leq 1 / 2$ and 0 otherwise; this labeling is certainly consistent with some monotone disjunction. The expected number of mistakes that the algorithm makes on the sequence is $\sum_{t=1}^n \max \left\{p_t, 1-p_t\right\} \geq n / 2$. Since $|C|=2^n$, this is exactly $(\log |C|) / 2$.
This lower bound is matched by an algorithm that predicts according to a monotone disjunction drawn at random, including each variable independently with probability $1/2$. On this sequence each $p_t=1/2$, so the expected number of mistakes is exactly $n / 2$.
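A quick Monte Carlo sketch of the argument above (purely illustrative; the target disjunction and all parameters are arbitrary): on the sequence $e_1,\ldots,e_n$, predicting with a random monotone disjunction makes about $n/2$ mistakes on average, whatever consistent labeling is chosen.

```python
# Monte Carlo check: on the sequence e_1, ..., e_n (the t-th bit set, all others 0),
# a predictor built from a random monotone disjunction (each variable included
# independently with probability 1/2) makes n/2 mistakes in expectation,
# regardless of which consistent labeling the adversary fixes.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 10_000

# Adversary's labeling: any fixed monotone disjunction (chosen arbitrarily here);
# the label of e_t is 1 iff variable t belongs to the target disjunction.
target = np.zeros(n, dtype=bool)
target[: n // 3] = True
labels = target.astype(int)

mistake_counts = []
for _ in range(trials):
    included = rng.random(n) < 0.5       # random OR: include each variable w.p. 1/2
    predictions = included.astype(int)   # prediction on e_t is 1 iff variable t is included
    mistake_counts.append(int(np.sum(predictions != labels)))

print("average mistakes:", np.mean(mistake_counts), " n/2 =", n / 2)
```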
An online learning algorithm is lazy if it changes its state only when it makes a mistake. Show that for any deterministic online learning algorithm $A$ achieving a mistake bound $M$ with respect to a concept class $C$, there exists a lazy algorithm $A^{\prime}$ that also achieves $M$ with respect to $C$.
Recall the definition: Algorithm $A$ has a mistake bound $M$ with respect to a learning class $C$ if $A$ makes at most $M$ mistakes on any sequence that is consistent with a function in $C$.
Solution: Let $A^{\prime}$ be the lazy version of $A$ (it has the same prediction and update rules as $A$, but applies the update only on mistakes and keeps its state unchanged on correctly labeled examples). We will show that $A^{\prime}$ has a mistake bound $M$ with respect to $C$. Assume, for contradiction, that there exists a sequence of examples on which $A^{\prime}$ makes $M^{\prime}>M$ mistakes. Cross out all examples on which $A^{\prime}$ does not make a mistake, and let $s$ denote the resulting sequence. Since $A^{\prime}$ ignores the crossed-out examples, its state before each example of $s$ equals the state $A$ reaches when run on the preceding examples of $s$ alone; hence $A$ and $A^{\prime}$ behave identically on $s$, and $A$ makes $M^{\prime}$ mistakes on $s$. But $A$ makes at most $M$ mistakes on any sequence consistent with a concept in $C$, and if the original sequence is consistent with a concept in $C$, so is $s$. This gives the desired contradiction.
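To make the construction explicit, here is a minimal sketch of the lazy wrapper $A^{\prime}$; the predict/update interface of the wrapped learner is an assumption made for illustration, not something fixed by the assignment.

```python
# Sketch of the lazy wrapper A' described above: it uses the wrapped learner's
# prediction rule unchanged and invokes its update rule only when a mistake occurs.
# The predict/update interface of the wrapped learner is assumed for illustration.
class LazyWrapper:
    def __init__(self, learner):
        self.learner = learner              # the deterministic online learner A

    def predict(self, x):
        return self.learner.predict(x)      # same prediction rule, current (lazy) state

    def observe(self, x, label):
        """Process one labeled example; return True iff a mistake was made."""
        if self.predict(x) != label:
            self.learner.update(x, label)   # apply A's update rule only on mistakes
            return True
        return False                        # correct prediction: state left unchanged
```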