Dissecting “Reinforcement Learning” by Richard S. Sutton with custom Python implementations, Episode II
In a previous post we started our series about Reinforcement Learning (RL) following Sutton’s great book [1]. In that post we introduced RL in general, and discussed Multi-armed Bandits as a nonassociative toy problem.
Here, we will build on this, but go significantly beyond it. In particular, we will introduce our first associative problem, which might feel much more like “real” RL to many readers, and introduce a simple but general solution method. Furthermore, we will introduce Gymnasium [2], a powerful library providing a multitude of environments (e.g. Atari or MuJoCo games) and allowing us to quickly experiment with solving them.
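To give a first impression of how working with Gymnasium feels, here is a minimal sketch of its standard interaction loop. The environment name “FrozenLake-v1” and the random action choice are assumptions picked purely for illustration; any registered Gymnasium environment exposes the same interface:

```python
import gymnasium as gym

# Create an example environment; "FrozenLake-v1" is an assumption
# for illustration, not necessarily the environment used later on.
env = gym.make("FrozenLake-v1")

# Reset returns the initial observation plus an info dict.
observation, info = env.reset(seed=42)

for _ in range(100):
    # Placeholder policy: sample a random action from the action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```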
The previously mentioned associative setting is the “standard” in RL: as opposed to the previously introduced nonassociative setting, where there is only a single state and we only have to decide which action to take, here we have multiple states, and for each state we might decide on a different best action.
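As a hypothetical mini-example of this difference: in the nonassociative bandit setting a greedy policy boils down to a single chosen action, while in the associative setting a tabular policy is a mapping from states to actions (the state names and action indices below are made up for illustration):

```python
# Nonassociative setting: a single state, so a learned greedy
# policy reduces to one action (e.g. always pulling bandit arm 2).
nonassociative_policy = 2

# Associative setting: multiple states, and the best action may
# differ per state, so the policy maps each state to an action.
associative_policy = {
    "state_a": 1,
    "state_b": 3,
    "state_c": 0,
}

def act(state: str) -> int:
    # Look up the action the policy prescribes for the given state.
    return associative_policy[state]
```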