
# Reinforcement Learning | COMP4702 Technical Remarks


## Technical Remarks

Remark 3.1 (Non-parametric distributional Monte Carlo algorithm). In Section 3.4, we saw that the (unprojected) finite-horizon categorical Monte Carlo algorithm can in theory learn finite-horizon return-distribution functions when there are only a small number of possible returns. It is possible to extend these ideas to obtain a straightforward, general-purpose algorithm that can sometimes be used to learn an accurate approximation to the return distribution.
Like the sample-mean Monte Carlo method, the non-parametric distributional Monte Carlo algorithm takes as input $K$ finite-length trajectories with a common source state $x_0$. After computing the sample returns $\left(g_k\right)_{k=1}^K$ from these trajectories, it constructs the estimate $$\hat{\eta}^\pi\left(x_0\right)=\frac{1}{K} \sum_{k=1}^K \delta_{g_k} \tag{3.22}$$
of the return distribution $\eta^\pi\left(x_0\right)$. Here, non-parametric refers to the fact that the approximating distribution in Equation $3.22$ is not described by a finite collection of parameters; in fact, the memory required to represent this object may grow linearly with $K$. Although this is not an issue when $K$ is relatively small, this can be undesirable when working with large amounts of data, and moreover precludes the use of function approximation (see Chapters 9 and 10).
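
The construction in this remark can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It assumes each trajectory is given simply as its sequence of rewards, and that sample returns are discounted sums with a discount factor `gamma`. The function names `sample_return` and `nonparametric_mc` are hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import Dict, Sequence


def sample_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Discounted return of one finite-length reward sequence (assumed form)."""
    g = 0.0
    # Accumulate backwards: g_t = r_t + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def nonparametric_mc(
    trajectories: Sequence[Sequence[float]], gamma: float = 1.0
) -> Dict[float, float]:
    """Empirical return distribution placing mass 1/K on each sample return."""
    returns = [sample_return(rewards, gamma) for rewards in trajectories]
    K = len(returns)
    # Equal returns share one atom; the result is still (1/K) * sum of Diracs.
    counts = Counter(returns)
    return {g: c / K for g, c in counts.items()}


# Four hypothetical trajectories from a common source state x_0.
trajectories = [[1.0, 0.0, 1.0], [0.0, 0.0], [1.0, 0.0, 1.0], [1.0]]
eta_hat = nonparametric_mc(trajectories, gamma=0.5)
print(eta_hat)
```

The dictionary maps each distinct sample return to its probability mass. In the worst case every return is distinct and the dictionary holds $K$ entries, which is the linear memory growth noted above.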

## Bibliographical Remarks

The development of a distributional algorithm in this chapter follows our own development of the distributional perspective, beginning with our work on using compression algorithms in reinforcement learning [Veness et al., 2015].
3.1. The first-visit Monte Carlo estimate is studied by Singh and Sutton [1996], where it is used to characterise the properties of replacing eligibility traces [see also Sutton and Barto, 2018]. Statistical properties of model-based estimates (which solve for the Markov decision process’s parameters as an intermediate step) are analysed by Mannor et al. [2007]. Grünewälder and Obermayer [2011] argue that model-based methods must incur statistical bias, an argument that also extends to temporal-difference algorithms. Their work also introduces a refined sample-mean Monte Carlo method that yields a minimum-variance unbiased estimator (MVUE) of the value function. See Browne et al. [2012] for a survey of Monte Carlo tree search methods, and Liu [2001], Robert and Casella [2004], Owen [2013] for further background on Monte Carlo methods more generally.
3.2. Incremental algorithms are a staple of reinforcement learning and have roots in stochastic approximation [Robbins and Monro, 1951, Widrow and Hoff, 1960, Kushner and Yin, 2003] and psychology [Rescorla and Wagner, 1972]. In the control setting, these are also called optimistic policy iteration methods, and exhibit fairly complex behaviour [Sutton, 1999, Tsitsiklis, 2002].

