Overview

Reinforcement learning (RL) is concerned with learning to take actions that maximize rewards, by trial and error, in environments that can evolve in response to those actions. A Markov decision process (MDP) [6] is a popular framework for modeling decision making in RL environments. In an MDP, starting from an initial observed state, an agent repeatedly (a) takes an action, (b) receives a reward, and (c) observes the next state of the MDP. The traditional objective in RL is a search goal: find a policy (a rule for selecting an action in each state) with high total reward using as few interactions with the environment as possible. The number of interactions required is known as the sample complexity of the RL problem [7]. This is, however, quite different from the corresponding optimization goal, where the learner seeks to maximize the total reward earned from all its decisions or, equivalently, to minimize the regret, that is, the shortfall in total reward compared to that of an optimal policy [8]. This objective is relevant in many practical sequential decision-making settings in which every decision carries utility or value: recommendation systems (clicks by consumers translate into revenue), sequential investment and portfolio allocation (financial holdings make profits or losses), and dynamic resource allocation in communication systems (scheduling decisions affect data throughput), to name a few.

Full Product Details

Author: Sayak Ray Chowdhury
Publisher: Classichouse
Imprint: Classichouse
Dimensions: Width: 21.60cm, Height: 1.00cm, Length: 27.90cm
Weight: 0.454kg
ISBN: 9798224721306
Pages: 190
Publication Date: 29 March 2024
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: In Print
Countries Available: All regions