Networked Markov Decision Processes with Delays

We consider a networked control system in which each subsystem evolves as a Markov decision process (MDP) with additional inputs from other subsystems. Each subsystem is coupled to its neighbors via communication links over which signals are delayed but otherwise transmitted noise-free. A centralized controller receives delayed state information from each subsystem, and the control action applied to each subsystem takes effect only after a delay. We give an explicit bound on the length of the finite history of measurements and controls that is required for the optimal control of such networked MDPs. We also show that these bounds depend only on the underlying graph structure and the associated delays. Consequently, the partially observed Markov decision process (POMDP) associated with a networked MDP can be converted into an information-state MDP whose state does not grow with time.
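To illustrate why a finite history can serve as a time-invariant information state, the sketch below covers only the simplest special case: a single MDP with a constant observation delay `obs_delay` and a constant action delay `action_delay`. This is a standard state-augmentation construction, not the networked bound derived in this work; that bound depends on the graph and its delays and is not reproduced here. In this special case, the information state is the most recently delivered state together with the last `obs_delay + action_delay` chosen actions. That is enough to compute the distribution of the state at the moment a newly chosen action takes effect. The class name, the transition layout `P[u][x][y]`, and the default pre-filled action are all hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, List, Sequence

# Hypothetical model layout: P[u][x][y] = Pr(x_{k+1} = y | x_k = x, u_k = u).
Transition = Sequence[Sequence[Sequence[float]]]


class DelayedInfoState:
    """Fixed-size information state for ONE MDP with constant delays (illustrative only).

    At time t the controller has seen x_{t - obs_delay}; an action chosen at time t
    is applied at time t + action_delay. The window keeps the obs_delay + action_delay
    most recently chosen actions, so its size never grows with t.
    """

    def __init__(self, obs_delay: int, action_delay: int, x0: int, u_default: int = 0):
        self.window = obs_delay + action_delay
        self.last_obs = x0  # most recent (delayed) observed state
        # Assumption: before control starts, a default action is treated as pending.
        self.pending: Deque[int] = deque([u_default] * self.window, maxlen=self.window)

    def update(self, new_obs: int, chosen_action: int) -> None:
        """Advance one step: store the action chosen now and the newly delivered state."""
        self.pending.append(chosen_action)  # maxlen discards the oldest action
        self.last_obs = new_obs

    def predictive(self, P: Transition) -> List[float]:
        """Distribution of the state at which the next chosen action will be applied."""
        n = len(P[0])
        dist = [0.0] * n
        dist[self.last_obs] = 1.0
        for u in self.pending:  # propagate through pending actions, oldest first
            dist = [sum(dist[x] * P[u][x][y] for x in range(n)) for y in range(n)]
        return dist


# Toy model: two states; action 0 keeps the state, action 1 flips it.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
info = DelayedInfoState(obs_delay=1, action_delay=1, x0=0)
print(info.predictive(P))  # [1.0, 0.0]
info.update(new_obs=0, chosen_action=1)
print(info.predictive(P))  # [0.0, 1.0]
```

The action window is a bounded `deque`, so the memory used by the information state stays constant however long the process runs. This mirrors, in a much simpler setting, the claim that the information state does not grow with time.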