Distance-based data-driven robust Markov decision processes

Prof. Archis Ghate
University of Houston

In this talk, I will discuss Markov decision processes (MDPs) where the decision-maker does not know the true state-transition probabilities. The decision-maker assumes that they belong to certain ambiguity sets, and chooses actions that maximize the worst-case expected total discounted reward. I will work under a rectangular setup wherein the ambiguity set for the whole problem is a Cartesian product of ambiguity sets for individual state-action pairs. Specifically, the ambiguity set for any state-action pair is a ball: it includes all probability mass functions (pmfs) within a certain distance of an empirical transition pmf. I will show that the optimal values of the resulting robust MDPs (RMDPs) converge to the optimal value of the true MDP if the radii of the ambiguity balls shrink to zero as the sample size grows to infinity. A rate of convergence will be derived. I will also establish that, with high probability, the robust optimal value provides a lower bound on the value of the robust optimal policy in the true MDP. These results rely on a generalized Pinsker's inequality and a concentration inequality, both of which hold for several well-known distances. Finally, I will extend this framework and its theoretical results to a broader family of rectangular ambiguity sets, and characterize the relative conservativeness of RMDPs from this family.
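To make the rectangular, ball-shaped ambiguity sets concrete, here is a minimal sketch of robust value iteration. It is an illustration under stated assumptions, not the speaker's method: the distance is taken to be the L1 norm (the abstract only says "several well-known distances"), rewards are assumed to depend on the state-action pair only, all balls share one radius, and the names `worst_case_l1` and `robust_value_iteration` are hypothetical.

```python
import numpy as np


def worst_case_l1(p_hat, v, radius):
    """Pmf within L1 distance `radius` of `p_hat` that minimizes p @ v.

    Assumption: the ambiguity ball uses the L1 distance. Mass of up to
    radius / 2 is moved onto the lowest-value state and taken away from
    the highest-value states first.
    """
    p = np.asarray(p_hat, dtype=float).copy()
    i_min = int(np.argmin(v))
    eps = min(radius / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    for j in np.argsort(v)[::-1]:  # highest-value states first
        if j == i_min:
            continue
        take = min(eps, p[j])
        p[j] -= take
        eps -= take
        if eps <= 0:
            break
    return p


def robust_value_iteration(p_hat, rewards, radius, gamma=0.9,
                           tol=1e-8, max_iter=10_000):
    """Robust value iteration for a rectangular ambiguity set.

    p_hat:   empirical pmfs, shape (n_states, n_actions, n_states)
    rewards: assumed to have shape (n_states, n_actions)
    Returns the robust value function and a greedy policy.
    """
    n_states, n_actions, _ = p_hat.shape
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        for s in range(n_states):
            for a in range(n_actions):
                # Rectangularity: each (s, a) pair has its own ball,
                # so the worst case is computed pair by pair.
                p = worst_case_l1(p_hat[s, a], v, radius)
                q[s, a] = rewards[s, a] + gamma * (p @ v)
        v_new = q.max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, q.argmax(axis=1)
```

Setting `radius=0` recovers ordinary value iteration on the empirical MDP, and a larger radius can only lower the computed values, which is the sense in which a larger ball is more conservative.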


Archis is a Professor of Industrial & Systems Engineering at the University of Washington in Seattle, where he also held a College of Engineering Endowed Professorship for five years. He joined the University of Washington as an Assistant Professor in 2006, after receiving a PhD in Industrial and Operations Engineering from the University of Michigan that same year and an MS in Management Science and Engineering from Stanford in 2003. He completed his undergraduate education at the Indian Institute of Technology, Bombay, India, in 2001. Archis is a recipient of the NSF CAREER award and the award for excellence in teaching operations research from IISE. He has also received multiple teaching accolades from the University of Washington. His students have won the Dantzig dissertation award and the Bonder scholarship in healthcare operations research from INFORMS. Archis has served on the editorial boards of several journals. He was the General Chair of the INFORMS 2019 Annual Meeting and a Program Co-Chair of the 2021 IISE Annual Conference.