My research is on reinforcement learning in brains and machines. I am particularly interested in the process of learning representations that support generalization to promote flexible, speedy RL.
Some background: Deep RL is a highly general and powerfully expressive framework, with impressive victories to its name (Atari, Go, more Go, cooling centers, to list a few of my employer's faves). Nevertheless, deep RL has a big problem: data efficiency. A mountain of data is required to train a deep RL agent, and the agent will remain highly specialized for the task it was trained to perform. This is because agents trained with reinforcement alone do not encode anything about the structure of the environment unless its usefulness is immediately apparent. When reward is sparse, as is usually the case, agents learn slowly and discard information that could have been useful later on.
The human mind, and the mouse mind for that matter, is comparatively frugal. We are constantly stashing information that might be useful for unknowable future problems and identifying patterns in the information we store. For instance, if you spot sugar while looking for salt to season your eggs, you can still recall the steps that led you to the sugar when you later want to sweeten your coffee. And even a mediocre human chef understands that stirring coffee is fundamentally similar to whisking eggs and can recycle shared machinery across these tasks. When we learn about the similarity structure of the world even before it is obvious what that structure might be useful for, we prepare ourselves to plan rapidly in the future by analogy with relevant episodes from the past. Endowing machines with this capability remains largely an open problem, which is one reason we solved Go before we solved amateur cooking.
So I study what kinds of representations in the brain support these analogies and work on expressing them in a mathematical form for machine learning purposes. My doctoral research has investigated how the hippocampus and entorhinal cortex jointly support this type of flexible learning and planning. In addition to following up on this research, I am now working at DeepMind to port this representational capacity over to machines.