Markov Decision Process - Wiley
Date: Oct. 15, 2005
This text is based on a course given for Ph.D. systems engineers at the University of Virginia in the autumn of 1987. The approach, and the level of mathematics used, is intended for mathematically minded postgraduate students in such areas as systems engineering, industrial engineering, management science and operations research, and for final-year students of mathematics and statistics.
It is not intended to be a research reference text, although references will be given to some key texts and papers for those wishing to pursue research in this area. It is intended as a basic text, covering some of the fundamentals of how Markov decision problems may be properly formulated and how their solutions, or properties of such solutions, may be determined.
There are three key texts influencing the format of this text, viz. those of Howard [23], van der Wal [53] and Kallenberg [24]. Howard [23], for stationary Markov decision processes, uses what he calls the 'z-transform' method, which is essentially the method of 'generating functions'. This allows expected total discounted, or non-discounted, rewards over a residual n-time unit horizon to be easily determined from the coefficients of the z-transforms, for any given policy. It also allows one to see how these performance measures depend upon the number of time units, n, of the time horizon, and leads to asymptotic results as n tends to infinity. In principle the z-transforms may be found for each of a set of policies, and the appropriate decision rule selected on the basis of this analysis.
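The quantities that the z-transform coefficients deliver for a fixed policy can equally be generated by the underlying recursion v_n = r + βPv_{n-1}, where P is the policy's transition matrix, r its expected one-step reward vector and β the discount factor. The following sketch (the two-state chain and its numbers are invented purely for illustration, not taken from Howard [23]) computes these n-horizon values and checks the asymptotic limit (I − βP)^{-1} r that the generating-function analysis yields as n tends to infinity:

```python
import numpy as np

# Hypothetical two-state chain under a fixed stationary policy.
P = np.array([[0.9, 0.1],      # transition probabilities from state 0
              [0.4, 0.6]])     # transition probabilities from state 1
r = np.array([5.0, -1.0])      # expected one-step reward in each state
beta = 0.9                     # discount factor

def finite_horizon_value(P, r, beta, n):
    """Expected total discounted reward over a residual n-step horizon.

    Implements v_n = r + beta * P v_{n-1}, v_0 = 0; the sequence (v_n)
    is exactly what the coefficients of the z-transform expansion encode.
    """
    v = np.zeros(len(r))
    for _ in range(n):
        v = r + beta * P @ v
    return v

# Asymptotic value as n -> infinity: v = (I - beta P)^{-1} r.
v_inf = np.linalg.solve(np.eye(2) - beta * P, r)
```

Running `finite_horizon_value` for increasing n shows the geometric approach of v_n to v_inf, which is the dependence-on-n behaviour the z-transform method makes visible in closed form.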
However, this can be impracticable, and an alternative approach, based upon so-called 'optimality (functional) equations', is then used. Even so, the natural insight gained from the use of z-transform analysis is very helpful, particularly when the form of the dependence of the n-time unit performance on n is needed.
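The optimality-equation approach replaces policy-by-policy analysis with a single fixed-point equation, v(i) = max_a [ r(i,a) + β Σ_j p(j|i,a) v(j) ], solved for all policies at once. A minimal value-iteration sketch, on an invented two-state, two-action problem (the numbers are illustrative assumptions, not from the text), is:

```python
import numpy as np

# Hypothetical problem: P[a] is the transition matrix under action a,
# R[a, i] the expected one-step reward for taking action a in state i.
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
R = np.array([[4.0, -1.0],
              [6.0, -2.0]])
beta = 0.9  # discount factor, beta < 1 makes the update a contraction

def value_iteration(P, R, beta, tol=1e-8):
    """Iterate v <- max_a (R[a] + beta * P[a] v) until convergence.

    Returns the (approximate) optimal value vector and a greedy
    stationary policy, one action index per state.
    """
    v = np.zeros(P.shape[1])
    while True:
        q = R + beta * P @ v          # q[a, i]: value of action a in state i
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)
        v = v_new
```

Because the update is a β-contraction, the iterates converge geometrically to the unique solution of the optimality equation, which is why this route remains practicable when enumerating policies via separate z-transforms is not.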