Question > A question on the 'learning' example (date: 2012-01-06)
By Jialu Pu
In the Japanese character game, why do we divide the reward by its frequency count to compute the probability of a character, i.e. (h/tao) / sigma(h/tao)? Wouldn't h*tao / sigma(h*tao) make more sense if we regard the reward as a weight?
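
To make the two candidates concrete, here is a minimal sketch in Python with made-up numbers. It assumes h is the reward of a character and tao is the number of times it has occurred, which is my reading of the example; the function names are hypothetical and not taken from the example's code.

```python
def normalize(weights):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def prob_by_ratio(h, tao):
    """Candidate 1: weight each character by reward divided by frequency."""
    return normalize([hi / ti for hi, ti in zip(h, tao)])


def prob_by_product(h, tao):
    """Candidate 2: weight each character by reward times frequency."""
    return normalize([hi * ti for hi, ti in zip(h, tao)])


# Made-up data: two characters with equal reward,
# the second one seen four times as often as the first.
h = [2.0, 2.0]
tao = [1.0, 4.0]

print(prob_by_ratio(h, tao))
print(prob_by_product(h, tao))
```

With equal rewards, the ratio form gives the higher probability to the character that has been seen less often, while the product form gives it to the character that has been seen more often.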
