Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning


In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero-sum multi-player game problems. Unlike existing Q-learning methods that solve the Riccati equation via on-policy learning for multi-player games, an off-policy game Q-learning method is developed to achieve the Nash equilibrium of multiple players. To this end, a non-zero-sum game problem is first formulated, and the value function and the Q-function, defined according to each player's individual performance index, are rigorously proved to be linear quadratic forms.
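
To make the claimed structure concrete, a sketch of the standard linear-quadratic non-zero-sum game setup is given below. The symbols used here (system matrices A and B_j, weights Q_i and R_ij, kernels P_i and H_i) are illustrative assumptions, since the abstract does not reproduce the paper's exact notation.

```latex
% N-player linear dynamics and per-player quadratic performance index
x_{k+1} = A x_k + \sum_{j=1}^{N} B_j u_k^j, \qquad
J_i = \sum_{k=0}^{\infty} \Big( x_k^{\top} Q_i x_k
      + \sum_{j=1}^{N} (u_k^j)^{\top} R_{ij}\, u_k^j \Big).

% Quadratic structure of the value function and Q-function for player i
V_i(x_k) = x_k^{\top} P_i x_k, \qquad
Q_i(x_k, u_k^1, \dots, u_k^N) = z_k^{\top} H_i z_k, \qquad
z_k = \big[\, x_k^{\top} \;\; (u_k^1)^{\top} \;\cdots\; (u_k^N)^{\top} \,\big]^{\top}.

% Linear state-feedback Nash policies sought by the learning algorithm
u_k^i = -K_i x_k, \quad i = 1, \dots, N.
```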

Then, based on dynamic programming and Q-learning methods, an off-policy game Q-learning algorithm is developed to find the control policies for multi-player games, such that the Nash equilibrium is reached under the learned control policies. The merit of this paper lies in that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, there is no bias in the Nash equilibrium solution when implementing the proposed off-policy game Q-learning algorithm, even though probing noises are added to the control policies to maintain the persistent excitation condition.
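
As a rough illustration of how such a data-driven scheme can operate, here is a minimal Python sketch of off-policy Q-learning for a two-player linear-quadratic game: a fixed behavior policy with probing noise generates measurable data once, and each player's quadratic Q-function kernel and feedback gain are then updated iteratively from that data alone. The toy system matrices, the least-squares fitting, and the specific policy-update formula are assumptions made for illustration, not the paper's exact algorithm.

```python
# Minimal sketch of off-policy game Q-learning for a two-player discrete-time LQ game.
# Assumptions (not from the paper): toy system matrices, a simple least-squares fit of
# each player's quadratic Q-function, and a coupled greedy policy update.
import numpy as np

rng = np.random.default_rng(0)

# System used only to generate measurable data; the learner never reads A, B1, B2.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1, Q2 = np.eye(2), 2.0 * np.eye(2)
R11, R12 = np.eye(1), 0.5 * np.eye(1)
R21, R22 = 0.5 * np.eye(1), np.eye(1)
n, m1, m2 = 2, 1, 1
d = n + m1 + m2

def quad_features(z):
    """Features phi(z) such that z^T H z = phi(z) @ theta for symmetric H."""
    outer = np.outer(z, z)
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)   # off-diagonal entries appear twice
    return scale * outer[rows, cols]

def unvech(theta):
    """Rebuild the symmetric kernel H from its upper-triangular parameters."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

# 1) Collect data once, under a fixed behavior policy plus probing noise.
data, x = [], np.array([1.0, -1.0])
for _ in range(400):
    u1 = -0.3 * x[:1] + 0.1 * rng.standard_normal(m1)
    u2 = -0.3 * x[1:] + 0.1 * rng.standard_normal(m2)
    x_next = A @ x + B1 @ u1 + B2 @ u2
    data.append((x, u1, u2, x_next))
    x = x_next

# 2) Reuse the same data at every iteration to evaluate and improve the target policies.
K1, K2 = np.zeros((m1, n)), np.zeros((m2, n))
for _ in range(20):
    Phi, y1, y2 = [], [], []
    for x, u1, u2, xn in data:
        z = np.concatenate([x, u1, u2])
        zn = np.concatenate([xn, -K1 @ xn, -K2 @ xn])  # target-policy actions, noise-free
        Phi.append(quad_features(z) - quad_features(zn))
        y1.append(x @ Q1 @ x + u1 @ R11 @ u1 + u2 @ R12 @ u2)
        y2.append(x @ Q2 @ x + u1 @ R21 @ u1 + u2 @ R22 @ u2)
    Phi = np.asarray(Phi)
    H1 = unvech(np.linalg.lstsq(Phi, np.asarray(y1), rcond=None)[0])
    H2 = unvech(np.linalg.lstsq(Phi, np.asarray(y2), rcond=None)[0])

    # Each player minimizes its own Q-function given the other player's current gain.
    K1 = np.linalg.solve(H1[n:n+m1, n:n+m1], H1[n:n+m1, :n] - H1[n:n+m1, n+m1:] @ K2)
    K2 = np.linalg.solve(H2[n+m1:, n+m1:], H2[n+m1:, :n] - H2[n+m1:, n:n+m1] @ K1)

print("learned gains:", K1, K2)
```

Note how the behavior policy (with probing noise) only appears in the data-collection step, while the target gains K1 and K2 enter the Bellman regression analytically through the noise-free successor actions.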

In contrast, a bias in the Nash equilibrium solution could be produced if on-policy game Q-learning were employed; this is another contribution of this paper.
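
A schematic way to see where such a bias can come from, using illustrative notation rather than the paper's own equations: with a behavior input u_k = -K x_k + e_k, where e_k is the probing noise, the on-policy evaluation target contains the noisy action that was actually applied, whereas the off-policy target policy enters analytically and is noise-free.

```latex
% On-policy target: the next action is the noisy one generated by the learned policy,
% so noise-dependent terms enter the regression and can bias the solution.
Q(x_k, u_k) = r(x_k, u_k) + Q\big(x_{k+1},\, -K x_{k+1} + e_{k+1}\big)

% Off-policy target: the target policy is evaluated analytically, independent of e_k,
% so the probing noise only affects which states are visited, not the fixed point.
Q(x_k, u_k) = r(x_k, u_k) + Q\big(x_{k+1},\, -K x_{k+1}\big)
```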
