An Alternative to Expected Reward Maximization

Name
Andre Litvin
Abstract
Reinforcement learning problems can generally be described as follows. The user quantifies how good each state of some system would be according to their preferences, and some agent, e.g. a robot, must choose actions that lead to the states the user defined as good. More formally, for each state and action the user picks a real-valued reward, and the goal of reinforcement learning is to automatically find a strategy, called a policy, that leads to a high reward sum. However, actions often do not determine states, but only make some states likelier than others. In this case, the policy is usually chosen by maximizing the expected reward. In this thesis, I prove that for every probability p < 1 and constant c > 0, there exists a reinforcement learning problem where the policy maximizing expected reward gives reward sum Z°, but another policy would give reward sum Z, where P[Z > Z° + c] > p. In other words, the policy maximizing expected reward can obtain an arbitrarily smaller reward sum with arbitrarily high probability (short of 1) compared to another policy. This might not be a desirable property for a policy to have. In this thesis, I define the smoothened median of a random variable and prove that any policy maximizing the smoothened median of the reward sum (instead of its expectation) does not have this property.
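To make the kind of problem the theorem asserts concrete, here is a minimal illustrative construction of my own (not taken from the thesis, whose actual construction and parameters may differ): a one-step problem with two actions. Action A pays a very large reward M with small probability 1 - q and 0 otherwise, so its expectation M(1 - q) can be made arbitrarily large; action B pays c + 1 deterministically. Maximizing expected reward selects A, yet B exceeds A by more than c with probability q, which can be chosen above any p < 1. The short Python sketch below simulates this; all names and values are hypothetical.

```python
import random

# Hypothetical one-step problem illustrating the abstract's claim.
# Action A: reward M with probability (1 - q), else 0  -> maximizes expectation
# Action B: reward c + 1 deterministically             -> beats A by > c with probability q
p, c = 0.99, 10.0    # target probability bound p < 1 and margin c > 0
q, M = 0.999, 1e6    # choose q > p and M large enough that E[A] = M * (1 - q) > c + 1

def reward_A():
    return M if random.random() < 1 - q else 0.0

def reward_B():
    return c + 1

trials = 100_000
wins = sum(reward_B() > reward_A() + c for _ in range(trials))
print(f"E[A] = {M * (1 - q):.1f}  >  E[B] = {c + 1:.1f}")
print(f"Empirical P[B > A + c] = {wins / trials:.4f}  (> p = {p})")
```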
Graduation Thesis language
Estonian
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Raul Vicente, PhD
Defence year
2023