

Reinforcement Learning

What you will learn

Define what Reinforcement Learning is

Apply everything you learn using state-of-the-art libraries like OpenAI Gym, Stable Baselines, Keras-RL and TensorFlow Agents

Describe the application domains and success stories of RL

Explain the differences between Reinforcement Learning and Supervised Learning

Define the main components of an RL problem setup

Describe the main ingredients of an RL agent and their taxonomy

Define the Markov Reward Process (MRP) and the Markov Decision Process (MDP)

Define the solution space of RL using the MDP framework

Solve RL problems using planning with Dynamic Programming algorithms, like Policy Evaluation, Policy Iteration and Value Iteration

Solve RL problems using model-free algorithms like Monte-Carlo, TD learning, Q-learning and SARSA

Differentiate On-policy and Off-policy algorithms

Master Deep Reinforcement Learning algorithms like Deep Q-Networks (DQN), and apply them to Large Scale RL

Master Policy Gradient algorithms and Actor-Critic methods (AC, A2C, A3C)

Master advanced DRL algorithms like DDPG, TRPO and PPO

Define model-based RL, differentiate it from planning, and describe their main algorithms and applications

Description

Hello and welcome to our course: Reinforcement Learning.

Reinforcement Learning is a very exciting and important field of Machine Learning and AI. Some call it the crown jewel of AI.

In this course, we will cover all the aspects of Reinforcement Learning, or RL. We will start by defining the RL problem, comparing it to the Supervised Learning problem, and discovering the areas of application where RL can excel. This includes the problem formulation, starting from the very basics up to the advanced use of Deep Learning, leading to the era of Deep Reinforcement Learning.

In our journey, we will cover, as usual, both the theoretical and practical aspects, where we will learn how to implement the RL algorithms and apply them to famous problems using libraries like OpenAI Gym, Keras-RL, TensorFlow Agents (TF-Agents) and Stable Baselines.

The course is divided into 6 main sections:

1- We start with an introduction to the RL problem definition, mainly comparing it to the Supervised learning problem, and discovering the application domains and the main constituents of an RL problem. We describe here the famous OpenAI Gym environments, which will be our playground when it comes to practical implementation of the algorithms that we learn about.
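
To give a flavor of that playground, here is a minimal sketch of the agent-environment interaction loop in OpenAI Gym, using CartPole-v1 as an assumed example and a purely random policy (classic Gym API, where step() returns 4 values; newer Gymnasium releases return 5):

```python
# Minimal sketch: one episode of random play in a Gym environment.
import gym

env = gym.make("CartPole-v1")      # a simple control environment, assumed as an example
obs = env.reset()                  # initial observation (state)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()           # random policy: sample any valid action
    obs, reward, done, info = env.step(action)   # apply the action, observe reward and next state
    total_reward += reward

print("Episode return with a random policy:", total_reward)
env.close()
```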


2- In the second part we discuss the main formulation of an RL problem as a Markov Decision Process or MDP, with simple solutions to the most basic problems using Dynamic Programming.
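
As a taste of those Dynamic Programming solutions, here is a minimal Value Iteration sketch on a small hand-written MDP (a hypothetical two-state example, not one of the course environments), repeatedly applying the Bellman optimality backup V(s) <- max_a sum_{s'} P(s'|s,a) [R(s,a,s') + gamma * V(s')]:

```python
# Minimal Value Iteration sketch on a toy, hand-written MDP (hypothetical example).
# P[state][action] = list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
V = {s: 0.0 for s in P}            # state-value function, initialized to zero

for _ in range(1000):              # sweep until the values stop changing
    delta = 0.0
    for s in P:
        q_values = [
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        ]
        best = max(q_values)       # Bellman optimality backup
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-8:
        break

# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print("Optimal values:", V)
print("Greedy policy:", policy)
```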

3- After being armed with an understanding of MDP, we move on to explore the solution space of the MDP problem and the different solutions beyond DP, which include model-based and model-free solutions. We will focus in this part on model-free solutions, and defer model-based solutions to the last part. In this part, we describe the Monte-Carlo and Temporal-Difference sampling-based methods, including the famous and important Q-learning algorithm, and SARSA. We will describe the practical usage and implementation of Q-learning and SARSA on tabular maze control problems from the OpenAI Gym environments.
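
For illustration, here is a minimal tabular Q-learning sketch on a Gym toy-text environment (FrozenLake-v1 is an assumed stand-in for the maze problems; the classic Gym step/reset API is assumed):

```python
# Minimal tabular Q-learning sketch (off-policy TD control).
import gym
import numpy as np

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))   # tabular action-value function
alpha, gamma, epsilon = 0.1, 0.99, 0.1                         # learning rate, discount, exploration

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy behaviour policy
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done, _ = env.step(a)
        # Q-learning target uses the max over next actions (off-policy);
        # SARSA would instead use the action actually taken in s2 (on-policy).
        target = r + gamma * (0.0 if done else np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print("Greedy policy:", np.argmax(Q, axis=1))
```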

4- To move beyond simple tabular problems, we will need to learn about function approximation in RL, which leads to the mainstream RL methods of today using Deep Learning, or Deep Reinforcement Learning (DRL). We describe here DeepMind's breakthrough algorithm that solved the Atari games, Deep Q-Networks or DQN, which paved the way for later successes such as AlphaGo. We also discuss how to solve Atari game problems with DQN in practice using Keras-RL and TF-Agents.
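
As a small taste of the practical side, here is a hedged sketch of a DQN agent built with Keras-RL, using CartPole-v1 instead of Atari to keep it short (an Atari agent would add a convolutional network and frame preprocessing; recent TensorFlow versions may require the keras-rl2 fork, and exact optimizer arguments vary between releases):

```python
# Minimal DQN sketch with Keras-RL (assumed keras-rl / keras-rl2 installation).
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make("CartPole-v1")
nb_actions = env.action_space.n

# Small fully connected Q-network: state in, one Q-value per action out.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(24, activation="relu"),
    Dense(24, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

memory = SequentialMemory(limit=50000, window_length=1)   # experience replay buffer
policy = EpsGreedyQPolicy(eps=0.1)                        # epsilon-greedy exploration
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=100, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(learning_rate=1e-3), metrics=["mae"])
dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)
```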

5- In the fifth part, we move to advanced DRL algorithms, mainly under the family called Policy-based methods. We discuss here Policy Gradients, DDPG, Actor-Critic, A2C, A3C, TRPO and PPO methods. We also discuss the important Stable Baselines library and use it to implement all those algorithms on different environments in OpenAI Gym, like Atari and others.
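
As a preview of how compact these libraries make things, here is a minimal PPO training sketch with Stable Baselines3 (the maintained successor of the Stable Baselines library; CartPole-v1 stands in for the Atari examples, and a Gym/Gymnasium version compatible with your SB3 release is assumed):

```python
# Minimal PPO sketch with Stable Baselines3 (assumed example, not the course's exact code).
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)   # actor-critic agent with MLP policy/value networks
model.learn(total_timesteps=20_000)        # collect rollouts and run PPO updates

# Roll out the trained policy for one episode.
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    episode_return += reward
print("Episode return:", episode_return)
```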

6- Finally, we explore the model-based family of RL methods and, importantly, differentiate model-based RL from planning, surveying the whole spectrum of RL methods.
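
To illustrate the idea of combining learning and planning, here is a minimal Dyna-Q sketch (a deterministic FrozenLake-v1 is an assumed example environment, with the classic Gym API): real transitions both update Q directly and populate a learned table model, which is then replayed for extra planning updates.

```python
# Minimal Dyna-Q sketch: direct RL + model learning + planning from the model.
import gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)   # deterministic, so a table model suffices
Q = np.zeros((env.observation_space.n, env.action_space.n))
model = {}                                           # learned model: (s, a) -> (reward, s', done)
alpha, gamma, epsilon, planning_steps = 0.1, 0.95, 0.1, 10

for episode in range(2000):
    s = env.reset()
    done = False
    while not done:
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s2, r, done, _ = env.step(a)
        # (a) direct RL: Q-learning update from the real transition
        Q[s, a] += alpha * (r + gamma * (0.0 if done else np.max(Q[s2])) - Q[s, a])
        # (b) model learning: remember the observed transition
        model[(s, a)] = (r, s2, done)
        # (c) planning: replay random remembered transitions through the same update
        for _ in range(planning_steps):
            (ps, pa), (pr, ps2, pdone) = list(model.items())[np.random.randint(len(model))]
            Q[ps, pa] += alpha * (pr + gamma * (0.0 if pdone else np.max(Q[ps2])) - Q[ps, pa])
        s = s2
```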

We hope you enjoy this course and find it useful.

Language: English

Content

Introduction

Course introduction
Course overview

Introduction to Reinforcement Learning

Module intro and roadmap
What is RL?
What can RL do?
The RL problem setup (AREA)
Reward
RL vs. Supervised Learning
State
AREA examples and quizzes
Gym Environments
Inside RL agent – RL agent ingredients
Policy
Value
Model
RL agents taxonomy
Prediction vs Control

Markov Decision Process (MDP)

Module intro and roadmap
Markov Chain and Markov Process (MP)
Markov Reward Process (MRP)
Markov Decision Process (MDP)
Prediction
Bellman Equations with action-value function Q
Control

MDP solutions spaces

Module intro and roadmap
Planning with Dynamic Programming (DP)
Prediction with DP – Policy Evaluation
Control with DP – Policy Iteration and Value Iteration
Value Iteration example
Prediction with Monte-Carlo – MC Policy Evaluation
Prediction with Temporal-Difference (TD)
TD Lambda
Control with Monte-Carlo – MC Policy Iteration
Control with TD – SARSA
On-policy vs. Off-policy
Q-learning
MDP solutions summary

Deep Reinforcement Learning (DRL)

Module intro and roadmap
Large Scale Reinforcement Learning
DNN as function approximator
Value Function Approximation
DNN policies
Value function approximation with DL encoder-decoder pattern
Deep Q-Networks (DQN)
DQN Atari Example with Keras-RL and TF-Agents

Advanced DRL

Module intro and roadmap
Value-based vs. Policy-based vs. Actor-Critic
Policy Gradients (PG)
REINFORCE – Monte-Carlo PG
AC – Actor-Critic
A2C – Advantage Actor-Critic
A3C – Asynchronous Advantage Actor-Critic
TRPO – Trust Region Policy Optimization
PPO – Proximal Policy Optimization
DDPG – Deep Deterministic Policy Gradients
Stable Baselines library overview
Atari example with stable-baselines
Mario example with stable-baselines
StreetFighter example with stable-baselines

Model-based Reinforcement Learning

Module intro and roadmap
Model learning methods
Model learning with Supervised Learning and Function Approximation
Sample based planning
Dyna – Integrating Planning and Learning

Conclusion

Conclusion

Material

Slides