Overview

This monograph introduces value-based approaches to the policy evaluation problem in online reinforcement learning (RL): learning the value function associated with a specific policy under a single Markov decision process (MDP). The approaches differ depending on whether they are implemented in an on-policy or off-policy manner. In the on-policy setting, the policy is evaluated using data generated by the same policy that is being assessed, and classical techniques such as TD(0), TD(λ), and their extensions with function approximation or variance reduction are employed. For off-policy evaluation, where samples are collected under a different behavior policy, the monograph introduces gradient-based two-timescale algorithms such as GTD2, TDC, and variance-reduced TDC. These algorithms are designed to minimize the mean-squared projected Bellman error (MSPBE) as the objective function. The monograph also discusses their finite-sample convergence upper bounds and sample complexity.

Full Product Details

Author: Yi Zhou, Shaocong Ma
Publisher: now publishers Inc
Imprint: now publishers Inc
Weight: 0.099kg
ISBN: 9781638283706
ISBN 10: 1638283702
Pages: 60
Publication Date: 15 August 2024
Audience: Professional and scholarly, Professional & Vocational
Format: Paperback
Publisher's Status: Active
Availability: In Print
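
To give a feel for the on-policy TD(0) method named in the overview, here is a minimal tabular sketch. It is not taken from the monograph: the five-state random-walk environment, the function name `td0_random_walk`, the constant step size, and the episode count are all assumptions chosen for illustration. The policy being evaluated simply moves left or right with equal probability, and the data is generated by that same policy.

```python
import random


def td0_random_walk(num_episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Tabular TD(0) policy evaluation on a hypothetical 5-state random walk.

    Assumed environment (illustrative only): states 0..4, each episode starts
    in the middle state, the policy steps left or right with probability 0.5,
    stepping off the left end terminates with reward 0, and stepping off the
    right end terminates with reward 1.
    """
    rng = random.Random(seed)
    n_states = 5
    values = [0.5] * n_states  # arbitrary initial value estimates

    for _ in range(num_episodes):
        state = n_states // 2
        while True:
            next_state = state + (1 if rng.random() < 0.5 else -1)
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error

            if done:
                break
            state = next_state

    return values


estimates = td0_random_walk()
# For this particular chain the exact values are 1/6, 2/6, ..., 5/6.
true_values = [(i + 1) / 6 for i in range(5)]
```

With a constant step size the estimates keep fluctuating around the true values rather than converging exactly; a decaying step size would be the usual alternative if exact convergence matters.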