Kardi Teknomo
Kardi Teknomo Kardi Teknomo Kardi Teknomo
     
 
Research
Publications
Tutorials
Resume
Personal
Resources
Contact

 

Q-Learning Example: Tower of Hanoi

by Kardi Teknomo

<Contents | Previous | Next >

To wet your appetite more about this Q learning, I make another simple example on solitary game of Tower of Hanoi . Click here if you want to actually play this game

 

We have three towers name A, B and C and 3 disks of different size name S (small), M (medium), and L (large). In the beginning, all the disks are in tower A and the goal is to move the disks, one at a time, so that in the final state all the three disks are in tower C. The constraint is the size of the disk: a smaller disk cannot be put below a larger disk. Thus, you cannot put disk S below M or L, and you cannot put disk M below L.

The game is very simple and the goal is to get the minimum number of disk movement.

 

Can you think of a solution?

 

Graph below is my solution (download the MS Excel companion of this Tower of Hanoi tutorial here). I trace all possible combination of states. In total there are 17 possible states.

 

To use Q learning to solve this problem, I transform the graph above into state-diagram with reward value of zero to all links except the direct link heading to the goal and the loop in the goal. See the state diagram below.

 

 

To solve this problem, we transform the state diagram into R matrix as shown in the picture below (I used MS Excel without any programming, you can download it here).

 

 

Then transform the R matrix into Q matrix and normalized it.

 

The results is given as sequence of states is shown in the figure below

 

Alternatively, you can also see the solution in the state diagram below. The red color of arrow is the optimum path (minimum path) from initial state to the goal state. Actually, the Q values in this diagram will lead to any initial state (not only from state 1) to the goal state using minimum path. Very impression solution, isn't it?

 

 

<Contents | Previous | Next >

 

 

This tutorial is copyrighted.

Preferable reference for this tutorial is

Teknomo, Kardi. 2005. Q-Learning by Examples. http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/index.html

 

 
 
© 2006 Kardi Teknomo. All Rights Reserved.
Designed by CNV Media