The interaction of performance and learning in action selection
Reinforcement learning models of behaviour separate the learning of actions from their performance. In these models, appropriate actions are learnt through prediction-error feedback from their consequences. Actions are then chosen according to their learnt values, modulated by the current balance between exploiting existing knowledge and exploring new options. But because it controls which actions are chosen, this exploration-exploitation trade-off must alter the course of learning. This project will explore how this interaction between performance and learning works when the explore-exploit trade-off is itself a function of the rate of learning.
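As an illustration of the separation described above, the following is a minimal sketch of a standard textbook arrangement, not the specific models this project will use. Values are learnt with a prediction-error (delta rule) update, and actions are chosen by a softmax over those values. The function names, the two-armed bandit task, the learning rate alpha and the inverse temperature beta are all assumptions introduced for this example; a larger beta favours exploitation and a smaller beta favours exploration.

```python
import math
import random


def softmax(values, beta):
    """Return choice probabilities for each action.

    beta is the inverse temperature (an assumed parameter name):
    large beta -> exploit the highest-valued action,
    small beta -> explore more uniformly.
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def run_bandit(reward_probs, beta, alpha=0.1, n_trials=1000, seed=0):
    """Simulate a hypothetical bandit task with fixed beta.

    Performance: softmax choice over learnt values q.
    Learning: delta-rule update driven by the prediction error.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)      # learnt action values
    counts = [0] * len(reward_probs)   # how often each action was chosen
    for _ in range(n_trials):
        probs = softmax(q, beta)
        action = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        delta = reward - q[action]     # prediction error
        q[action] += alpha * delta     # only the chosen action is updated
        counts[action] += 1
    return q, counts


if __name__ == "__main__":
    for beta in (0.5, 5.0):
        q, counts = run_bandit([0.2, 0.8], beta=beta)
        print(f"beta={beta}: values={q}, choices={counts}")
```

Because only the chosen action is updated, the value of beta determines which values get the chance to be learnt at all. This is the sense in which performance feeds back into learning.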
We have good reason to believe that learning and exploration are coupled in the brain. A longstanding theory holds that phasic dopamine signals a prediction error. New evidence and models suggest that tonic dopamine controls the exploration-exploitation trade-off. Because tonic dopamine is, to a first approximation, the time integral of phasic dopamine, the two signals are coupled.
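One way to make this coupling concrete is sketched below, under several loudly stated assumptions. The prediction error stands in for the phasic signal, a leaky running sum of it stands in for the tonic signal (a discrete approximation to a time integral, with an assumed decay), and the tonic level sets the softmax inverse temperature. The mapping beta_from_tonic, its direction (higher tonic level gives more exploitation), and all parameter values are illustrative choices, not results or claims taken from the cited work.

```python
import math
import random


def softmax(values, beta):
    """Choice probabilities with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_tonic(tonic, phasic, decay=0.95):
    """Leaky running sum of the phasic signal.

    With decay=1.0 this is a plain cumulative sum; the leak is an
    assumption added so the tonic level stays bounded.
    """
    return decay * tonic + phasic


def beta_from_tonic(tonic, beta_min=0.5, gain=2.0, beta_max=10.0):
    """ASSUMED monotonic map from tonic level to inverse temperature,
    clipped to [beta_min, beta_max]. Purely illustrative."""
    return max(beta_min, min(beta_max, beta_min + gain * tonic))


def run_coupled(reward_probs, alpha=0.1, n_trials=500, seed=0):
    """Hypothetical bandit simulation in which exploration depends on
    the recent history of prediction errors."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    tonic = 0.0
    betas = []
    for _ in range(n_trials):
        beta = beta_from_tonic(tonic)      # performance set by tonic level
        betas.append(beta)
        probs = softmax(q, beta)
        action = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        delta = reward - q[action]         # phasic signal: prediction error
        q[action] += alpha * delta         # learning
        tonic = update_tonic(tonic, delta) # tonic integrates phasic
    return q, betas


if __name__ == "__main__":
    q, betas = run_coupled([0.2, 0.8])
    print("final values:", q)
    print("first and last beta:", betas[0], betas[-1])
```

In this sketch the exploration level is no longer a free parameter: it is driven by the same prediction errors that drive learning, so the rate of learning and the choice policy evolve together.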
We will use both algorithmic and neural models to study this interaction and the role of dopamine in it. One goal will be to determine whether the classic distinction between habitual and goal-directed instrumental behaviour is actually a performance effect rather than a distinction between learning systems. Another goal will be to seek ideas for forms of directed exploration that could advance machine learning.
This project has a Band 1 fee. Details of our different fee bands can be found on our website. For information on how to apply for this project, please visit the Faculty of Biology, Medicine and Health Doctoral Academy website. Informal enquiries may be made directly to the primary supervisor.
Humphries, M. D., Khamassi, M. & Gurney, K. (2012) Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Frontiers in Neuroscience, 6, 9.
Khamassi, M. & Humphries, M. D. (2012) Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Frontiers in Behavioral Neuroscience, 6, 79.
Wunderlich, K., Smittenaar, P. & Dolan, R. J. (2012) Dopamine Enhances Model-Based over Model-Free Choice Behavior. Neuron, 75, 418-424.