Open Problems

Challenges: exploration vs exploitation, scalability, convergence guarantees, and cases where the Markov assumption doesn't hold (multi-agent systems)

There are two broad strategies for solving reinforcement-learning problems. The first is to search in the space of behaviours in order to find one that performs well in the environment. The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in states of the world.
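
Below is a minimal sketch of the second approach: estimating the utility of taking actions in states with a tabular Q-learning update. The environment interface (`reset`, `step`, `actions`) is an assumption made here for illustration, not a specific library's API.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: estimate Q[(state, action)] from interaction."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly exploit the current estimates, occasionally explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference update toward the Bellman target.
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```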

  1. Balancing Exploration vs Exploitation (see the bandit sketch after this list)

  2. Weak Learning Signal: the optimal policy must be inferred by trial-and-error interaction with the environment, and the only learning signal the agent receives is the reward.

     • The observations of the agent depend on its actions and can contain strong temporal correlations.

     • Agents must deal with long-range time dependencies: often the consequences of an action only materialise after many transitions of the environment. This is known as the (temporal) credit assignment problem (see the discounted-return sketch after this list).

  3. Scalability to Large State Spaces

  4. Theoretical Guarantees of Convergence with Function-Approximation Methods

  5. The Markov assumption, relied on by the majority of RL algorithms, is somewhat unrealistic, as it requires states to be fully observable. In multi-agent systems the assumption is violated, and these algorithms tend to perform poorly there; a dynamic environment also means the state-transition probabilities themselves may be changing.
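
As an illustration of item 1, here is a minimal sketch of the exploration/exploitation trade-off: an epsilon-greedy strategy on a toy multi-armed bandit. The bandit setup and payout probabilities are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(payout_probs, steps=10_000, epsilon=0.1):
    """Pull arms of a Bernoulli bandit, balancing exploration and exploitation."""
    n_arms = len(payout_probs)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        reward = 1.0 if random.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: update the estimate without storing history.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, total_reward

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With epsilon = 0 the agent can lock onto a suboptimal arm forever; with epsilon too large it wastes pulls on arms it already knows are poor. That tension is the trade-off in item 1.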
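
For the credit assignment problem in item 2, a common mechanism is the discounted return: a reward that arrives many steps later is propagated back to the earlier actions that led to it. The reward sequence below is a made-up example.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A single reward at the end of the episode still credits step 0:
# [0.9606, 0.9703, 0.9801, 0.99, 1.0]
print(discounted_returns([0.0, 0.0, 0.0, 0.0, 1.0]))
```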

https://youtu.be/fIKkhoI1kF4
