Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd Edition (Wiley Series in Probability and Statistics) by Warren B. Powell (PDF)

Ebook Info

Published: 2011
Number of pages: 606
Format: PDF
File Size: 3.63 MB
Authors: Warren B. Powell

Description

Praise for the First Edition

“Finally, a book devoted to dynamic programming and written using the language of operations research (OR)! This beautiful book fills a gap in the libraries of OR specialists and practitioners.” (Computing Reviews)

This new edition showcases a focus on modeling and computation for complex classes of approximate dynamic programming problems.

Understanding approximate dynamic programming (ADP) is vital in order to develop practical and high-quality solutions to complex industrial problems, particularly when those problems involve making decisions in the presence of uncertainty. Approximate Dynamic Programming, Second Edition uniquely integrates four distinct disciplines (Markov decision processes, mathematical programming, simulation, and statistics) to demonstrate how to successfully approach, model, and solve a wide range of real-life problems using ADP.

The book continues to bridge the gap between computer science, simulation, and operations research and now adopts the notation and vocabulary of reinforcement learning as well as stochastic search and simulation optimization. The author outlines the essential algorithms that serve as a starting point in the design of practical solutions for real problems. The three curses of dimensionality that impact complex problems are introduced, and detailed coverage of implementation challenges is provided.

The Second Edition also features:

- A new chapter describing four fundamental classes of policies for working with diverse stochastic optimization problems: myopic policies, look-ahead policies, policy function approximations, and policies based on value function approximations
- A new chapter on policy search that brings together stochastic search and simulation optimization concepts and introduces a new class of optimal learning strategies
- Updated coverage of the exploration-exploitation problem in ADP, now including a recently developed method for doing active learning in the presence of a physical state, using the concept of the knowledge gradient
- A new sequence of chapters describing statistical methods for approximating value functions, estimating the value of a fixed policy, and value function approximation while searching for optimal policies

The presented coverage of ADP emphasizes models and algorithms, focusing on related applications and computation while also discussing the theoretical side of the topic that explores proofs of convergence and rate of convergence. A related website features an ongoing discussion of the evolving fields of approximate dynamic programming and reinforcement learning, along with additional readings, software, and datasets.

Requiring only a basic understanding of statistics and probability, Approximate Dynamic Programming, Second Edition is an excellent book for industrial engineering and operations research courses at the upper-undergraduate and graduate levels. It also serves as a valuable reference for researchers and professionals who utilize dynamic programming, stochastic programming, and control theory to solve problems in their everyday work.

User’s Reviews

Editorial Reviews

From the Author

This book is a major revision of the first edition, with seven new or heavily revised chapters. This edition starts clearly from a foundation in reinforcement learning (using classical RL notation and concepts), but continues to build a bridge to the types of high-dimensional problems familiar to operations research. The book uses “a” for discrete actions, but switches to “x” for vector-valued decisions, common in the operations research community but largely ignored in computer science.

The book begins with an overview of a wide array of problems (in chapter 2), with an introduction to classical Markov decision processes in chapter 3 (minor changes from the first edition). Chapter 4 is a completely rewritten introduction to reinforcement learning using classical concepts, with one major exception. It now provides an extended overview of the concept of the post-decision state variable, which is used throughout the book (because it avoids the embedded expectation within the min/max operator). The RL community is very familiar with the use of state-action pairs (in Q-factors), which is a clumsy form of post-decision state. Chapter 4 gives a series of examples where the post-decision state can (but does not always) provide significant benefits.

Chapter 5 is a minor revision of the old chapter 5, providing an in-depth discussion of how to model a dynamic program.

The book brings together different fields within stochastic optimization by identifying (in chapter 6) four fundamental classes of policies: 1) myopic policies (which ignore the future), 2) lookahead policies (which optimize over a short horizon to determine the decision to be made now), 3) policy function approximations (analytic functions that return an action given a state), and 4) policies based on value function approximations. Policy function approximations (a little-used term that I am promoting) and value function approximations both require some sort of method for approximating a function, of which there are three basic classes: lookup tables, parametric models, and nonparametric models. Of course, you can build hybrid policies by mixing and matching.

Chapter 7 provides an in-depth treatment of methods for optimizing policy function approximations (widely known as “policy search” in the RL literature). In addition to classical material from stochastic search, it also includes a description of the knowledge gradient concept, which we developed as part of research to develop insights into the exploration-exploitation problem.

Chapters 8, 9 and 10 provide a layered presentation of how to go about designing policies based on value function approximations. Chapter 8 is an overview of popular methods for approximating functions. Chapter 9 discusses methods for obtaining an update to a value function approximation for a fixed policy, and Chapter 10 describes the complex problem of approximating a value function while simultaneously optimizing over policies.

Chapter 11 provides an introduction to the fundamentals of updating value function approximations based on stochastic approximation methods, with an overview of stepsize formulas. This is a revision of the old chapter 6, somewhat streamlined but with new insights into how to design stepsize rules for different algorithmic strategies (Q-learning, approximate value iteration, LSTD, LSPE) and a new, optimal stepsize rule designed specifically for methods based on bootstrapping (TD(0), Q-learning, approximate value iteration).

Chapter 12 is a completely rewritten chapter on the exploration vs. exploitation problem, including a new algorithm for using the knowledge gradient idea (designed originally for pure learning problems) in the presence of a physical state.

Chapters 13 and 14 retain the material from the first edition on designing value function approximations in the context of high-dimensional resource allocation problems which exploit convexity.

Warren Powell

About the Author

WARREN B. POWELL, PhD, is Professor of Operations Research and Financial Engineering at Princeton University, where he is founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. The recipient of the 2004 INFORMS Fellow Award, Dr. Powell has authored more than 160 published articles on stochastic optimization, approximate dynamic programming, and dynamic resource management.
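The author's note says the post-decision state avoids the expectation embedded within the min/max operator. The worked equations below are a sketch of what that means, not a quotation from the book. The symbols are assumptions chosen for illustration: S_t is the state before a decision, S_t^x the state immediately after decision x but before new information arrives, C the one-period contribution, and gamma a discount factor.

```latex
% Standard (pre-decision) form: the expectation sits inside the max,
% so every candidate decision x requires evaluating an expectation.
V_t(S_t) = \max_{x} \Big( C(S_t, x)
         + \gamma \, \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t, x \big] \Big)

% Post-decision form: the expectation is folded into a value function
% defined around the post-decision state S_t^x.
V_t(S_t) = \max_{x} \Big( C(S_t, x) + \gamma \, V_t^{x}(S_t^{x}) \Big),
\qquad
V_t^{x}(S_t^{x}) = \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t^{x} \big]
```

In the second form the maximization over x is deterministic once an approximation of V_t^x is available, which is the benefit the note alludes to.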

Reviews from Amazon users, collected at the time this book was published on the website:

⭐A good primer on ADP. The book is written clearly and the concepts are easy to understand. However, the notation is different from that of the two-volume DP books by Bertsekas, so it might take some time to get used to. Personally, I prefer Bertsekas’s notation.

⭐I used this book to teach myself reinforcement learning and found it very nice to read. The concepts are explained very clearly, and it really helped me grasp the big picture of this topic.

⭐Get the hardcopy version, not the Kindle edition. I got the Kindle version for portability, but it’s not worth the trade-off. This edition ought to be recalled. The equations are rendered in a fuzzy image font that doesn’t align with the surrounding text, and certain math symbols (such as the times symbol) are not rendered correctly.

⭐Very good

⭐Well written, ties together concepts very well, provides good examples and implementation caveats.

⭐Positive: useful. Negatives: plodding, narrow, and the notation is confusing and (from my logician/programmer perspective) imprecise: to be precise, often referentially opaque. For example, most people would assume that expectation distributes over sums, but not in Powell’s world. The book ends with a weird exhortation to the reader to patent any models that he or she comes up with (and become rich?). I would have thought/hoped that the book itself represented prior art. It certainly should.

[Note added later] I’d be inclined to take a star off this review, now that I’ve spent longer with the book. When I get into the detail, I encounter too much vagueness: it is often hard to figure out what _precisely_ the idea is. An awful lot of flicking back and forward in the text is often involved, not to mention a lot of googling, and useful or necessary detail is sometimes simply missing. A simple case in point: in the discussion of the model in 2.2.5 / 14.1.1, he asserts that the model is analytically solvable using Bellman’s equation and converges in practice for a numerical simulation [for the record, I have no problem believing this]. But he never actually tells you what distributions are driving the implementation he discusses (the distributions from which D_t and cap R_t are drawn) or the model for the valuation functions that he is actually using, so it is difficult (read: impossible) to compare an independent implementation with his. So yes, I have learned from this book, but too much of what I have learned came from following up externally on Google to figure out what exactly the author might have in mind.

⭐I have been trying to find new books on reinforcement learning (RL), but there are very few. The best book seems to be the original classic by Richard Sutton and Andrew Barto [1998].

I just learned that “reinforcement learning” is just the terminology used by the artificial intelligence community, but it is basically synonymous with “dynamic programming”, on which there are many books and much new research.

Dynamic programming can be defined as any optimization problem whose main objective can be stated by a recursive optimality condition known as “Bellman’s equation”. The equation can also be generalized to a differential form known as the Hamilton-Jacobi-Bellman (HJB) equation.

The recursive equation naturally lends itself to a solution via “backward induction”, i.e., starting from the “last” time period, which can be solved trivially, and moving backwards in time, one step at a time.

If the time horizon is infinite, the “last step” does not exist, but the optimization problem can still be solved by repeated iteration of an operator. The optimal value function is a fixed point of the operator.

The main methods for solving dynamic programs are: 1) value iteration and 2) policy iteration.

Even though the recursive relation greatly reduces the problem space, general dynamic programming problems can still be intractable due to:

1) large state space
2) large outcome space
3) large action space

These are the “3 curses of dimensionality”. This book, “Approximate Dynamic Programming”, proposes solution strategies for dealing with these problems.
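The backward induction and fixed-point iteration described in this review can be sketched on a toy problem. The following is a minimal illustration, not code from the book: the two-state, two-action Markov decision process, its transition table `P`, its reward table `R`, and the function names are all hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
# Hypothetical toy MDP (not from the book): 2 states, 2 actions.
# P[s][a] is a list of (probability, next_state) pairs;
# R[s][a] is the one-step reward for taking action a in state s.
P = {
    0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 3.0},
}


def q_value(s, a, v_next, gamma):
    """One-step reward plus discounted expected value of the next state."""
    return R[s][a] + gamma * sum(p * v_next[s2] for p, s2 in P[s][a])


def backward_induction(horizon, gamma=1.0):
    """Finite horizon: start from the last period and step backwards."""
    v = {s: 0.0 for s in P}  # value after the final period is zero
    policy = []              # policy[t][s] = best action in period t
    for _ in range(horizon):
        q = {s: {a: q_value(s, a, v, gamma) for a in P[s]} for s in P}
        policy.insert(0, {s: max(q[s], key=q[s].get) for s in P})
        v = {s: max(q[s].values()) for s in P}
    return v, policy


def value_iteration(gamma=0.9, tol=1e-10):
    """Infinite horizon: iterate the Bellman operator to its fixed point."""
    v = {s: 0.0 for s in P}
    while True:
        v_new = {s: max(q_value(s, a, v, gamma) for a in P[s]) for s in P}
        if max(abs(v_new[s] - v[s]) for s in P) < tol:
            return v_new
        v = v_new


if __name__ == "__main__":
    print(backward_induction(horizon=2))
    print(value_iteration())
```

Both routines enumerate every state, action, and outcome, which is exactly what stops scaling once any of the three spaces becomes large; that is the point at which the approximation strategies discussed in the book become relevant.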

⭐Wonderful book. Very well written. Good book to start learning dynamic programming.

