## ECOC Weighted Decoding

The ECOC method consists of repeatedly partitioning the full set of $N$ classes $\Omega=\left\{\omega_i \mid i=1, \ldots, N\right\}$ into $L$ super-class pairs. The choice of partitions is represented by an $N \times L$ binary code matrix $\mathbf{Z}$. The rows $\mathbf{Z}_i$ are unique codewords that are associated with the individual target classes $\omega_i$, and the columns $\mathbf{Z}^j$ represent the different super-class partitions. Denoting the $j$th super-class pair by $\mathrm{S}^j$ and $\overline{\mathrm{S}^j}$, element $Z_{ij}$ of the code matrix is set to 1 or 0 depending on whether class $\omega_i$ has been put into $\mathrm{S}^j$ or its complement. A separate base classifier is trained to solve each of these 2-class problems.
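As an illustration of how a code matrix turns one $N$-class problem into $L$ two-class problems, the sketch below (plain Python, assuming classes are indexed $0, \ldots, N-1$) relabels training samples for the $j$th base classifier. The matrix `Z` is a hypothetical example, and `check_code_matrix` also rejects a column that repeats an earlier partition or its complement, an extra sanity check assumed here rather than stated in the text.

```python
from typing import List, Sequence

# Hypothetical 4-class, 5-column code matrix (not taken from the text).
# Row i is the codeword of class i; a 1 in column j means class i is in S^j.
Z: List[List[int]] = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]


def check_code_matrix(Z: Sequence[Sequence[int]]) -> None:
    """Raise ValueError unless Z has unique codewords and L distinct 2-class splits."""
    if len({tuple(row) for row in Z}) != len(Z):
        raise ValueError("codewords (rows of Z) must be unique")
    seen = set()
    for j in range(len(Z[0])):
        column = tuple(row[j] for row in Z)
        if set(column) != {0, 1}:
            raise ValueError(f"column {j} puts every class on the same side")
        # Assumed extra check: a column equal to an earlier one, or to its
        # complement, would define the same super-class pair again.
        complement = tuple(1 - v for v in column)
        if column in seen or complement in seen:
            raise ValueError(f"column {j} repeats an earlier super-class pair")
        seen.add(column)


def superclass_targets(labels: Sequence[int], Z: Sequence[Sequence[int]], j: int) -> List[int]:
    """Binary targets for base classifier j: 1 if a sample's class is in S^j, else 0."""
    return [Z[c][j] for c in labels]


# Usage: relabel a toy training set (class indices 0..3) for the first base classifier.
check_code_matrix(Z)
print(superclass_targets([0, 2, 3, 1], Z, 0))  # [1, 0, 0, 1]
```

Each of the $L$ relabelled target vectors would then be used to train one base classifier.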

Given an input pattern vector $\mathbf{x}$ whose true class $c(\mathbf{x}) \in \Omega$ is unknown, let the soft output from the $j$th base classifier be $s_j(\mathbf{x}) \in [0,1]$. The outputs from all the classifiers can be assembled into a vector $\mathbf{s}(\mathbf{x})=\left[s_1(\mathbf{x}), \ldots, s_L(\mathbf{x})\right]^{\mathrm{T}} \in [0,1]^L$ called the output code for $\mathbf{x}$. Instead of working with the soft base classifier outputs, we may also first harden them by rounding to 0 or 1, to obtain the binary vector $\mathbf{h}(\mathbf{x})=\left[h_1(\mathbf{x}), \ldots, h_L(\mathbf{x})\right]^{\mathrm{T}} \in \{0,1\}^L$. The principle of the ECOC technique is to obtain an estimate $\hat{c}(\mathbf{x}) \in \Omega$ of the class label for $\mathbf{x}$ from knowledge of the output code $\mathbf{s}(\mathbf{x})$ or $\mathbf{h}(\mathbf{x})$.
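Assembling the output code and its hardened version can be sketched as below. Base classifiers are modelled, as an assumption, as Python callables that return a score in $[0,1]$; the stand-in classifiers in the usage lines are hypothetical, and sending a tie at exactly 0.5 to 1 is a choice the text does not specify.

```python
from typing import Callable, List, Sequence

# Assumption: a base classifier is any function returning a soft score in [0, 1].
BaseClassifier = Callable[[Sequence[float]], float]


def output_code(x: Sequence[float], classifiers: Sequence[BaseClassifier]) -> List[float]:
    """Assemble s(x) = [s_1(x), ..., s_L(x)] from the L base classifiers."""
    s = [clf(x) for clf in classifiers]
    if any(not 0.0 <= v <= 1.0 for v in s):
        raise ValueError("soft outputs must lie in [0, 1]")
    return s


def harden(s: Sequence[float]) -> List[int]:
    """h(x): round each soft output to 0 or 1 (a tie at 0.5 goes to 1, an assumption)."""
    return [1 if v >= 0.5 else 0 for v in s]


# Usage with hypothetical stand-in classifiers that ignore their input.
toy_classifiers = [lambda x: 0.9, lambda x: 0.2, lambda x: 0.5]
s = output_code([0.0, 1.0], toy_classifiers)
print(s, harden(s))  # [0.9, 0.2, 0.5] [1, 0, 1]
```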

In its general form, a weighted decoding procedure makes use of an $N \times L$ weights matrix $\mathbf{W}$ that assigns a different weight to each combination of target class and base classifier. For each class $\omega_i$ we may use the $L_1$ metric to compute a class score $F_i(\mathbf{x}) \in [0,1]$ as follows:
$$F_i(\mathbf{x})=1-\sum_{j=1}^{L} W_{ij}\left|s_j(\mathbf{x})-Z_{ij}\right| \tag{1.1}$$
where it is assumed that the entries of $\mathbf{W}$ are non-negative and its rows are normalized so that $\sum_{j=1}^L W_{ij}=1$ for $i=1, \ldots, N$. Patterns may then be assigned to the target class $\hat{c}(\mathbf{x})=\arg\max_{\omega_i} F_i(\mathbf{x})$. If the base classifier outputs $s_j(\mathbf{x})$ in Eq. 1.1 are replaced by hardened values $h_j(\mathbf{x})$, this describes the weighted Hamming decoding procedure.
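A minimal sketch of Eq. 1.1 and the arg max decision rule, assuming 0-indexed classes and plain Python lists. `uniform_weights` is a hypothetical helper giving the unweighted special case $W_{ij}=1/L$; passing hardened outputs instead of soft ones gives weighted Hamming decoding, as the text notes.

```python
from typing import List, Sequence


def class_scores(s: Sequence[float], Z: Sequence[Sequence[int]],
                 W: Sequence[Sequence[float]]) -> List[float]:
    """F_i(x) = 1 - sum_j W_ij * |s_j(x) - Z_ij| for every class i (Eq. 1.1)."""
    scores = []
    for z_row, w_row in zip(Z, W):
        if any(w < 0 for w in w_row) or abs(sum(w_row) - 1.0) > 1e-9:
            raise ValueError("each row of W must be non-negative and sum to 1")
        distance = sum(w * abs(sj - zj) for w, sj, zj in zip(w_row, s, z_row))
        scores.append(1.0 - distance)
    return scores


def decode(s: Sequence[float], Z: Sequence[Sequence[int]],
           W: Sequence[Sequence[float]]) -> int:
    """Index of the class with the largest F_i(x), i.e. the arg max over classes."""
    F = class_scores(s, Z, W)
    return max(range(len(F)), key=F.__getitem__)


def uniform_weights(n_classes: int, n_columns: int) -> List[List[float]]:
    """Hypothetical helper: W_ij = 1/L, the unweighted special case."""
    return [[1.0 / n_columns] * n_columns for _ in range(n_classes)]


# Usage: a hypothetical 3-class, 3-column code matrix.
Z = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
W = uniform_weights(3, 3)
s = [0.9, 0.8, 0.1]               # soft output code
print(decode(s, Z, W))            # 0
h = [1, 1, 0]                     # hardened output code: weighted Hamming decoding
print(class_scores(h, Z, W)[0])   # 1.0
```

With uniform weights, $F_i$ is one minus the mean absolute difference between the output code and codeword $i$, so the decision reduces to nearest-codeword decoding; non-uniform rows of $\mathbf{W}$ let each class trust some base classifiers more than others.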

In the context of this chapter, $\Omega$ is the set of known AU groups, and we are also interested in combining the class scores to obtain values that measure the likelihood that individual AUs are present. This is done by summing $F_i(\mathbf{x})$ over all $\omega_i$ that contain the given AU and dividing by $N$. That is, the score $G_k \in [0,1]$ for $\mathrm{AU}_k$ is given by:
$$G_k(\mathbf{x})=\frac{1}{N} \sum_{i:\,\mathrm{AU}_k \in \omega_i} F_i(\mathbf{x}) \tag{1.2}$$
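Eq. 1.2 can be sketched in the same style. Each AU group $\omega_i$ is represented, as an assumption, by a `frozenset` of AU labels aligned position by position with the class scores $F_i(\mathbf{x})$; the groups and scores in the usage lines are made up for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Sequence


def au_scores(F: Sequence[float],
              groups: Sequence[FrozenSet[Hashable]]) -> Dict[Hashable, float]:
    """G_k(x) = (1/N) * sum of F_i(x) over the groups omega_i containing AU_k (Eq. 1.2)."""
    N = len(F)
    all_aus = set().union(*groups)
    return {au: sum(f for f, g in zip(F, groups) if au in g) / N for au in all_aus}


# Usage: three hypothetical AU groups (AUs labelled by integers) and their scores F_i(x).
groups = [frozenset({1, 2}), frozenset({4}), frozenset({1, 2, 5})]
F = [0.8, 0.4, 0.6]
G = au_scores(F, groups)
print({au: round(v, 3) for au, v in sorted(G.items())})  # {1: 0.467, 2: 0.467, 4: 0.133, 5: 0.2}
```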

## Platt Scaling

In pattern recognition applications it often happens that we would like to obtain a probability estimate for class membership, but the soft values output by the chosen classification algorithm are only loosely related to probability. Here, this applies to the scores $G_k(\mathbf{x})$ obtained by applying Eq. 1.2 to detect individual AUs in an image. Ideally, the scores would be balanced, so that a value $>0.5$ could be taken to indicate that $\mathrm{AU}_k$ is present. In practice, however, this is often not the case, particularly when $\mathrm{AU}_k$ belongs to more or fewer than half of the AU groups.
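The imbalance can be made concrete with a one-line bound. Write $m_k$ (a symbol introduced here for illustration, not used in the original) for the number of AU groups $\omega_i$ that contain $\mathrm{AU}_k$. Because every $F_i(\mathbf{x})$ lies in $[0,1]$, Eq. 1.2 gives

$$0 \le G_k(\mathbf{x}) = \frac{1}{N}\sum_{i:\,\mathrm{AU}_k \in \omega_i} F_i(\mathbf{x}) \le \frac{m_k}{N}.$$

For example, if $\mathrm{AU}_k$ appears in only 3 of $N = 10$ groups (hypothetical numbers), then $G_k(\mathbf{x}) \le 0.3$ for every input, so a fixed threshold of 0.5 could never detect it.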

To correct for this problem, Platt scaling [15] is used to remap the training-set output scores $G_k(\mathbf{x})$ to values that satisfy this requirement. The same calibration curve is then used to remap the test-set scores. An alternative approach would have been to find a separate threshold for each AU, but the chosen method has the added advantage that the probability information represented by the remapped scores could be useful in some applications. Another consideration is that a wide range of thresholds can be found that give low training error, so some means of regularisation must be applied in the decision process.

Platt scaling, which can be applied to any 2-class problem, is based on the regularisation assumption that the correct form of the calibration curve that maps classifier scores $G_k(\mathbf{x})$ to probabilities $p_k(\mathbf{x})$, for an input pattern $\mathbf{x}$, is a sigmoid curve described by the equation:
$$p_k(\mathbf{x})=\frac{1}{1+\exp \left(A G_k(\mathbf{x})+B\right)},$$
where the parameters $A$ and $B$ together determine the slope of the curve and its lateral displacement. The values of $A$ and $B$ that best fit a given training set are obtained using an expectation maximisation algorithm on the positive and negative examples. A separate calibration curve is computed for each value of $k$.
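The fitting step can be sketched as follows. The text states that $A$ and $B$ are found with an expectation maximisation algorithm; this sketch instead uses a swapped-in technique, Newton's method with a backtracking line search, to maximise the likelihood of the sigmoid above directly, together with Platt's smoothed targets $(N_+ + 1)/(N_+ + 2)$ for positives and $1/(N_- + 2)$ for negatives. The function names and toy data are hypothetical; one call to `fit_platt` would be made per AU, on the training-set scores $G_k(\mathbf{x})$ and binary labels for the presence of $\mathrm{AU}_k$.

```python
import math
from typing import Sequence, Tuple


def platt_prob(A: float, B: float, score: float) -> float:
    """p = 1 / (1 + exp(A * score + B)), evaluated without overflow."""
    f = A * score + B
    if f >= 0:
        e = math.exp(-f)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(f))


def _neg_log_likelihood(A: float, B: float, scores: Sequence[float],
                        targets: Sequence[float]) -> float:
    """Cross-entropy between the sigmoid outputs and the smoothed targets."""
    total = 0.0
    for g, t in zip(scores, targets):
        f = A * g + B
        if f >= 0:
            total += t * f + math.log1p(math.exp(-f))
        else:
            total += (t - 1.0) * f + math.log1p(math.exp(f))
    return total


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              max_iter: int = 100) -> Tuple[float, float]:
    """Maximum-likelihood fit of A, B (Newton's method here, not the EM fit named in the text)."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative training examples")
    # Platt's smoothed targets replace hard 0/1 labels to reduce overfitting.
    t_pos, t_neg = (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0)
    targets = [t_pos if y else t_neg for y in labels]
    A, B = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))  # start at the class prior
    fval = _neg_log_likelihood(A, B, scores, targets)
    for _ in range(max_iter):
        # Gradient and Hessian of the negative log-likelihood with respect to (A, B).
        gA = gB = h_ab = 0.0
        h_aa = h_bb = 1e-12  # tiny ridge keeps the Hessian invertible
        for g, t in zip(scores, targets):
            p = platt_prob(A, B, g)
            d1, d2 = t - p, p * (1.0 - p)
            gA += g * d1
            gB += d1
            h_aa += g * g * d2
            h_ab += g * d2
            h_bb += d2
        if abs(gA) < 1e-5 and abs(gB) < 1e-5:
            break
        det = h_aa * h_bb - h_ab * h_ab
        dA = -(h_bb * gA - h_ab * gB) / det
        dB = -(h_aa * gB - h_ab * gA) / det
        slope = gA * dA + gB * dB  # negative for a descent direction
        step = 1.0
        while step >= 1e-10:  # backtracking (Armijo) line search
            new_A, new_B = A + step * dA, B + step * dB
            new_f = _neg_log_likelihood(new_A, new_B, scores, targets)
            if new_f < fval + 1e-4 * step * slope:
                A, B, fval = new_A, new_B, new_f
                break
            step /= 2.0
        else:
            break  # no sufficient decrease found; keep the current estimate
    return A, B


# Usage with hypothetical training scores G_k(x) and 0/1 labels for one AU.
train_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
train_labels = [0, 0, 0, 1, 1, 1]
A, B = fit_platt(train_scores, train_labels)
print(A < 0, platt_prob(A, B, 0.75) > 0.5)  # True True
```

The fitted pair $(A, B)$ for each AU would then be applied unchanged to the test-set scores, as the text describes. A negative $A$ makes the curve increase with $G_k(\mathbf{x})$, and the smoothed targets keep $A$ finite even when the training scores separate the two classes perfectly, which is the regularising role the text assigns to the sigmoid assumption.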


