Optimal Control of Distributed Markov Decision Processes with Network Delays

We consider the problem of finding an optimal feedback controller for a network of interconnected subsystems, each of which is a Markov decision process. Each subsystem is coupled to its neighbors via communication links over which signals are delayed but otherwise transmitted noise-free. One of the subsystems receives input from a controller, and the controller receives delayed state measurements from all of the subsystems. We show that an optimal controller requires only a finite amount of memory that does not grow with time, and we obtain a bound on the memory the controller requires for each subsystem. This makes the computation of an optimal controller through dynamic programming tractable. We illustrate our result with a numerical example and show that it generalizes previous results on Markov decision processes with delayed state measurements.
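As context for the delayed-measurement setting the abstract builds on, the following is a minimal sketch of the classical augmented-state reduction for a single MDP whose state is observed with a d-step delay: the controller's sufficient statistic is the last observed state together with the actions applied since, so ordinary value iteration runs on this finite augmented space. The transition probabilities, rewards, and delay below are a hypothetical toy instance chosen only for illustration, not the construction of the paper.

```python
import itertools

# Toy MDP (hypothetical): 2 states, 2 actions.
# P[s][a] maps next state -> probability; R[s][a] is the immediate reward.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
STATES, ACTIONS = [0, 1], [0, 1]
GAMMA, DELAY = 0.9, 1  # discount factor; measurement delay in steps

# Augmented state for a DELAY-step delay: (state observed DELAY steps
# ago, actions applied since). The delayed problem is an ordinary MDP
# over this finite augmented space, so memory does not grow with time.
AUG = [(s,) + acts for s in STATES
       for acts in itertools.product(ACTIONS, repeat=DELAY)]

def step_dist(s, acts):
    """Belief over the current (unobserved) state, given the state
    observed DELAY steps ago and the actions applied since."""
    dist = {s: 1.0}
    for a in acts:
        nxt = {}
        for x, p in dist.items():
            for y, q in P[x][a].items():
                nxt[y] = nxt.get(y, 0.0) + p * q
        dist = nxt
    return dist

def value_iteration(tol=1e-8):
    """Standard value iteration on the augmented MDP."""
    V = {z: 0.0 for z in AUG}
    while True:
        delta, Vn = 0.0, {}
        for z in AUG:
            s, acts = z[0], z[1:]
            belief = step_dist(s, acts)
            best = -float("inf")
            for a in ACTIONS:
                # Expected immediate reward under the belief.
                r = sum(p * R[x][a] for x, p in belief.items())
                # Next observation reveals the successor of s under the
                # oldest pending action; the action window shifts by one.
                q = r + GAMMA * sum(
                    p * V[(s2,) + acts[1:] + (a,)]
                    for s2, p in P[s][acts[0]].items())
                best = max(best, q)
            Vn[z] = best
            delta = max(delta, abs(best - V[z]))
        V = Vn
        if delta < tol:
            return V
```

The augmented space has |S| * |A|^d elements, which bounds the controller's memory per subsystem in this single-subsystem toy; the paper's contribution is an analogous finite bound for a network of coupled subsystems with link delays.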