Approximate Methods - Deep RL

https://www.youtube.com/watch?v=lvoHnicueoE

Previously - Approaches to RL:

  • we can learn the model explicitly (dynamics model and reward model)

  • we can learn the action-value function Q

  • we can learn the policy pi

Even if the agent has a complete and accurate environment model, it is typically unable to perform enough computation per time step to fully use it. The available memory is also an important constraint: memory may be required to build up accurate approximations of value functions, policies, and models. In most cases of practical interest there are far more states than could possibly be entries in a table, so approximations must be made. That is why the talk I gave was on deep reinforcement learning and not just reinforcement learning. The deep part is important: that is how you tackle real-world problems. Previously, when deep learning wasn't around or popular, we were using methods such as policy gradients.
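To make the table-size problem concrete, here is a minimal tabular Q-learning update, with one table entry per (state, action) pair. This is an illustrative sketch; the sizes and hyperparameters are assumptions, not from the talk:

```python
import numpy as np

# Tabular Q-learning: the value "function" is literally a table.
# With n_states * n_actions entries, this stops scaling as soon as
# the state space gets large (images, robot sensors, ...).
n_states, n_actions = 500, 4          # toy sizes; real problems dwarf this
alpha, gamma = 0.1, 0.99              # step size and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)   # one transition's worth of learning
```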

Scaling up Reinforcement Learning to Real-World Problems (using deep learning):

  • Value-Based Deep RL

  • Policy-Based Deep RL

  • Model-Based Deep RL

Key Idea: Use deep neural networks as function approximators to represent:

  • Value function

  • Policy

  • Model
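For example, the model itself can be a network that maps a state-action pair to a predicted next state and reward. A minimal sketch of that idea, assuming continuous states and actions (the sizes and names here are illustrative, not from the talk):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2          # illustrative sizes

# A learned dynamics-and-reward model: (s, a) -> (s', r).
dynamics_net = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, state_dim + 1),     # next state (state_dim) + reward (1)
)

def predict(s, a):
    """Predict (next_state, reward); trained by regression on observed transitions."""
    out = dynamics_net(torch.cat([s, a], dim=-1))
    return out[..., :-1], out[..., -1]
```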

Optimise the loss function by stochastic gradient descent. Note: the idea is simple, but once you bring in neural networks it becomes messy. There are no theoretical guarantees that this will converge; you have to understand and master the behaviour of neural networks, and then use tricks.
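As a concrete instance of the value-based case, here is a minimal DQN-style sketch: a network represents Q(s, a) and is trained by stochastic gradient descent on the squared TD error. All shapes and hyperparameters are illustrative assumptions, and a real implementation would also need tricks such as a replay buffer and a target network:

```python
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4           # illustrative sizes
gamma = 0.99

# Q-network: maps a state to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def sgd_step(s, a, r, s_next, done):
    """One SGD step on the squared TD error for a batch of transitions."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) actually taken
    with torch.no_grad():                                  # bootstrapped target
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# one step on a random batch of 32 transitions
s, s_next = torch.randn(32, state_dim), torch.randn(32, state_dim)
a = torch.randint(n_actions, (32,))
r, done = torch.randn(32), torch.zeros(32)
sgd_step(s, a, r, s_next, done)
```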

A Brief Survey of Deep Reinforcement Learning: https://arxiv.org/pdf/1708.05866.pdf
