Mathematical Reinforcement Learning

Definition. Mathematical Reinforcement Learning is an approach to the study of the Reinforcement Learning problem and its associated artifacts (e.g. agents, policies, learning algorithms, and value functions) which is concerned with the mathematical structure of the relationships between the objects of Reinforcement Learning.

I have selected the term “Mathematical Reinforcement Learning” for my work to differentiate it from the work of many other mathematicians in Reinforcement Learning, commonly known as Reinforcement Learning theory, which is chiefly focused on analyzing what is possible within the Reinforcement Learning problem. Work in this field often begins with many assumptions which are rarely satisfied in practice and draws conclusions about the difficulty of such problems, often in information-theoretic terms.

That research is important, but it is not my area of interest. It is my observation and opinion that modern methods of machine learning are capable of performance far beyond that which is possible under these analyses. This discrepancy is, in my view, caused by a significant difference between learning and optimization: in practice, the optimal policy is usually not much better than the suboptimal solutions found by modern learning methods.

My emphasis is thus on understanding how and why modern methods can find such effective suboptimal policies, and on discovering and describing novel structures which elucidate the Reinforcement Learning problem and may improve the performance of these methods.

In brief, I believe that there are non-heuristic reasons for the success of modern RL methods, and that there is room to improve these methods by accepting them as objects of study in and of themselves. Among these reasons, I believe, is that one of the properties which make a problem important to people is akin to mathematical beauty: the kinds of problems we seek to solve in Reinforcement Learning are often concisely describable, yet require a complex set of logical, symbolic, physical, and common-sense priors. For these problems, the difficulty lies in producing computational systems which understand these priors.

Once we figure out how to produce such a system, I believe we will be faced with many apparently free lunches.