Training a machine to play a simple paddle ball game using the Deep Q-Learning algorithm with Keras

Written on June 4, 2017

This project demonstrates training a machine to play a simple paddle ball game using the Deep Q-Learning algorithm with Keras.

This article is intended for beginners.

Pre-requisites

  • Python (tested on 3.6)
  • Keras
  • Theano/Tensorflow
  • pygame

Source code

https://github.com/azhar2205/paddle-ball-using-dqlearn

How it works (in the context of the paddle ball game)

While playing the game, each action taken in a state (move left, move right, don’t move) affects the total points obtained at the end of the game. The goal is: given a state, select the action that maximizes the total future reward.

Let’s represent the game screen as a 2-D array. The array elements at the positions of the ball and the paddle are set to 1; all other values are 0.
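
As a minimal sketch (the grid size, ball coordinates, paddle position and paddle width below are illustrative names and values, not necessarily those used in the repository), the state could be built like this:

```python
import numpy as np

def make_state(grid_size, ball_row, ball_col, paddle_col, paddle_width=3):
    """Build the 2-D screen array: 1s for the ball and paddle, 0s elsewhere."""
    state = np.zeros((grid_size, grid_size))
    state[ball_row, ball_col] = 1                        # the ball
    state[-1, paddle_col:paddle_col + paddle_width] = 1  # the paddle on the bottom row
    return state
```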

The next step is to decide what action to take. We use a neural network to predict the reward for each action and select the action with the maximum predicted reward. However, if we rely only on the network’s predictions, we will be restricted to those actions alone; there may be an action with a better reward that the machine never predicts. So during game play we sometimes take a random action instead of the predicted one. (This trade-off is known as the Exploration-Exploitation dilemma.)
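
A sketch of the network and the action selection in Keras could look like the following. The screen size, number of actions, layer sizes and the exploration probability epsilon are assumptions made for illustration, not the exact settings of the project:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

num_actions = 3          # move left, don't move, move right
grid_size = 10           # assumed screen size
epsilon = 0.1            # probability of taking a random (exploratory) action

# A small fully connected network mapping the flattened screen
# to one predicted reward (Q-value) per action.
model = Sequential()
model.add(Dense(100, input_shape=(grid_size * grid_size,), activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(num_actions))
model.compile(optimizer='adam', loss='mse')

def choose_action(state):
    # Exploration: occasionally pick a random action.
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    # Exploitation: pick the action with the highest predicted reward.
    q_values = model.predict(state.reshape(1, -1))
    return int(np.argmax(q_values[0]))
```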

Once the action is decided, update the state accordingly, i.e. move the paddle as per the action and advance the ball along its trajectory.

The action will result in a point gain (the ball bounces off the paddle), a point loss (the ball touches the ground) or no change in points (the ball is still in the air).
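
Combining the last two steps, a simplified step function might look like the one below. The movement rules, the 0/1/2 action encoding and the +1/-1/0 reward values are assumptions for illustration; the actual game logic in the repository is implemented with pygame:

```python
def step(ball_row, ball_col, paddle_col, action, grid_size=10, paddle_width=3):
    # action: 0 = move left, 1 = don't move, 2 = move right
    paddle_col = min(max(paddle_col + (action - 1), 0), grid_size - paddle_width)
    ball_row += 1                                    # the ball falls one row per step
    if ball_row == grid_size - 1:                    # the ball reached the bottom row
        if paddle_col <= ball_col < paddle_col + paddle_width:
            reward = 1                               # bounced off the paddle
        else:
            reward = -1                              # the ball touched the ground
    else:
        reward = 0                                   # the ball is still in the air
    return ball_row, ball_col, paddle_col, reward
```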

Store the tuple <current state, action, reward, next state> in a FIFO queue. Neural networks have a tendency to adapt to recent training (and hence forget earlier learnings). To fix this, entries from the queue are used to re-train the neural network. (This process is called Experience Replay.)
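
A minimal replay memory can be built with Python’s deque, as sketched below. The memory and batch sizes are illustrative, and the extra game_over flag is an assumption added here so that terminal states can be handled during training:

```python
import random
from collections import deque

memory = deque(maxlen=500)   # FIFO queue: oldest experiences are dropped first

def remember(state, action, reward, next_state, game_over):
    # Store the <current state, action, reward, next state> tuple
    # (plus a game_over flag so terminal states can be treated specially).
    memory.append((state, action, reward, next_state, game_over))

def sample_batch(batch_size=50):
    # Re-train on a random mix of old and recent experiences so the
    # network does not forget what it learned in earlier games.
    return random.sample(list(memory), min(batch_size, len(memory)))
```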

There is one more important step done during Experience Replay. The target of the neural network (i.e. the expected reward for the action that was taken) is set to the immediate reward plus the discounted maximum predicted reward of the next state (aka the Discounted Future Reward). The neural network thus learns to match its output to the Discounted Future Reward.
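
A sketch of this update with Keras is shown below; it reuses the model and the experience tuples from the sketches above, and the discount factor gamma = 0.9 is an illustrative choice:

```python
import numpy as np

gamma = 0.9   # discount factor for future rewards

def train_on_replay(model, batch):
    states = np.array([s.reshape(-1) for s, a, r, s2, done in batch])
    next_states = np.array([s2.reshape(-1) for s, a, r, s2, done in batch])
    targets = model.predict(states)          # start from the current predictions
    next_q = model.predict(next_states)      # predicted rewards from the next state
    for i, (s, action, reward, s2, done) in enumerate(batch):
        if done:
            targets[i, action] = reward      # no future reward once the game is over
        else:
            # Discounted Future Reward = immediate reward + gamma * best next reward
            targets[i, action] = reward + gamma * np.max(next_q[i])
    model.train_on_batch(states, targets)
```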

For a deeper explanation of Deep Q-Learning, it is highly recommended to read the posts listed in the references below.

References:

Guest Post (Part I): Demystifying Deep Reinforcement Learning - Nervana

Deep Reinforcement Learning: Pong from Pixels

Using Keras and Deep Q-Network to Play FlappyBird - Ben Lau

Keras plays catch, a single file Reinforcement Learning example - Eder Santana

GitHub - asrivat1/DeepLearningVideoGames

Teaching Your Computer To Play Super Mario Bros. – A Fork of the Google DeepMind Atari Machine Learning Project

Deep Reinforcement Learning: Playing a Racing Game - Byte Tank

Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013) - https://arxiv.org/pdf/1312.5602.pdf