Files for the practical

To do the practical, you need to download the following archive: code.zip.

The assignment sheet is available here.

Introduction to the package

The practical is divided into two parts: batch learning and online learning.

Batch learning

For the algorithms:

batch_learners.LSPI([gamma]) The LSPI algorithm, with hand-coded features.
batch_learners.fittedQ([regressor, gamma]) The fitted-Q algorithm.
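
As an illustration, the two learners might be constructed as follows. Only the argument names gamma and regressor come from the signatures above; the default values, the use of a scikit-learn regressor and the keyword-argument style are assumptions, not documented behaviour.

    import batch_learners
    from sklearn.ensemble import ExtraTreesRegressor  # assumption: any regressor with fit/predict works

    # LSPI with a discount factor of 0.95; the hand-coded features are internal to the class
    lspi = batch_learners.LSPI(gamma=0.95)

    # Fitted-Q iteration with an extra-trees regressor (a common choice for this algorithm)
    fitted_q = batch_learners.fittedQ(regressor=ExtraTreesRegressor(n_estimators=50), gamma=0.95)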

For the simulator:

mountain_car.simulator([max_step]) The mountain-car simulator, as described by Sutton & Barto.
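
A minimal sketch of collecting a batch of transitions with a random policy is given below; the reset() and step() method names, the action encoding and the (next_state, reward, done) return values are assumptions about the simulator's interface.

    import random
    import mountain_car

    env = mountain_car.simulator(max_step=500)

    # Gather (state, action, reward, next_state) tuples with a uniformly random policy.
    states, actions, rewards, next_states = [], [], [], []
    state = env.reset()                       # assumed method
    for _ in range(1000):
        action = random.choice([0, 1, 2])     # assumed encoding: left, none, right
        next_state, reward, done = env.step(action)  # assumed method and return values
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        next_states.append(next_state)
        state = env.reset() if done else next_state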

For the plotting functions:

plot_tools.plot_transitions(states, actions, ...) Plot the transitions, with a different color for each action (blue for left, green for no action, red for right).
plot_tools.plot_val_pol(Q, name) Plot the value function (max_a Q(s,a)) and the greedy policy for a Q-function.
plot_tools.plot_traj(traj, name) Plot the trajectory.
plot_tools.plot_perf(res, name) Plot the performance.
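
Continuing the data-collection sketch above, a batch-learning experiment might then look like this; the fit() method on the learner and the exact arguments expected by the plotting helpers are assumptions.

    import plot_tools

    # Visualise the collected batch, colour-coded by action.
    plot_tools.plot_transitions(states, actions)

    # Learn a Q-function from the batch; fit() is an assumed method name.
    Q = fitted_q.fit(states, actions, rewards, next_states)

    # Inspect the learnt value function and the corresponding greedy policy.
    plot_tools.plot_val_pol(Q, "fitted-Q on mountain car")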

Online learning

For the algorithms:

online_learners.SARSA([width, height, ...]) A SARSA agent for the cliff-walking problem, with a tabular representation of the Q-function.
online_learners.Qlearning([width, height, ...]) A Q-learning agent for the cliff-walking problem, with a tabular representation of the Q-function.
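
As a sketch, the two agents could be created as shown below; the 12 x 4 grid matches the classical cliff-walking layout of Sutton & Barto, but the constructor defaults and any further arguments (learning rate, exploration rate) are assumptions.

    import online_learners

    # Tabular agents for a 12 x 4 cliff-walking grid (dimensions are an assumption)
    sarsa_agent = online_learners.SARSA(width=12, height=4)
    qlearning_agent = online_learners.Qlearning(width=12, height=4)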

For the simulator:

cliff_walking.simulator([width, height]) The cliff-walking simulator.
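
The environment would be built with the same grid size as the agents, along the lines of the sketch below (again, 12 x 4 is the classical choice, not a documented default).

    import cliff_walking

    env = cliff_walking.simulator(width=12, height=4)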

For the experiment and plotting functions:

tools_exp.learn_episode_sarsa(env, agent) The SARSA agent learns for one episode on the environment.
tools_exp.learn_episode_qlearning(env, agent) The Q-learning agent learns for one episode on the environment.
tools_exp.test_episode(env, agent) The agent tests the greedy policy of the learnt Q-function.
tools_exp.plot_traj(set_states, set_actions, ...) Plot a trajectory of the agent.
tools_exp.plot_res(res_sarsa, res_qlearning, ...) Plot the results.
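
Putting the online part together and continuing the sketches above, an experiment might alternate learning and test episodes roughly as follows; whether test_episode returns a scalar performance measure, as assumed here, is not documented.

    import tools_exp

    n_episodes = 500
    res_sarsa, res_qlearning = [], []
    for episode in range(n_episodes):
        # One learning episode per agent.
        tools_exp.learn_episode_sarsa(env, sarsa_agent)
        tools_exp.learn_episode_qlearning(env, qlearning_agent)
        # Evaluate the current greedy policies (assumed to return a performance score).
        res_sarsa.append(tools_exp.test_episode(env, sarsa_agent))
        res_qlearning.append(tools_exp.test_episode(env, qlearning_agent))

    # Compare the two learning curves.
    tools_exp.plot_res(res_sarsa, res_qlearning)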