
  Reinforcement Learning in the Brain and Behaviour


   Department of Psychology

Supervisors: Prof Elliot Ludvig, Dr Emmanouil Konstantinidis
Funding: Competition Funded PhD Project (Students Worldwide)
Status: No more applications being accepted

About the Project

This project is available through the MIBTP programme on a competition basis. The successful applicant will join the MIBTP cohort and will take part in all of the training offered by the programme.  Please visit the MIBTP website for further details and to submit an application.

Project Detail:

Humans and other animals are very efficient at learning from rewards in their environment and choosing accordingly. A popular approach for understanding how creatures make such reward-based decisions is Reinforcement Learning (RL), a formalism from computer science that has been used to create artificially intelligent agents that have proven remarkably successful at games, such as Chess and Go, and at real-world problems, like navigation and protein folding.

RL models learn through trial and error. In animals, this error term is thought to be encoded by dopamine neurons, which project widely through the brain. Through this error signal, animals can learn the values of different options and outcomes, which are then encoded in the striatum and several areas of the frontal cortex. These RL models are particularly effective at predicting reward-based behaviours, including the time course of associative learning, the transition from goal-directed to habitual behaviour, and exploration during probabilistic reward learning (e.g., Miller et al., 2019).
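
As an illustration, the core trial-and-error update can be written in a few lines. The Python sketch below (with illustrative parameters, not a model from this project) nudges a value estimate toward each observed reward, with the prediction error playing the role attributed to dopamine above:

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = 0.1      # learning rate (illustrative)
    value = 0.0      # learned value estimate for a single option

    for trial in range(200):
        reward = rng.binomial(1, 0.7)       # option pays off with probability 0.7
        prediction_error = reward - value   # the error signal dopamine is thought to encode
        value += alpha * prediction_error   # trial-and-error update toward the observed reward

    print(f"Learned value is approximately {value:.2f}; the true expected reward is 0.7")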

This project will apply some of the newest developments in RL to understand new aspects of human decision-making. For example, in distributional RL, animals learn about the full distribution of possible outcomes, rather than simply the average value. The dopamine system does indeed seem to encode a variety of prediction errors that span the expected distribution (Dabney et al., 2020). The full implications of such a distributional code for human behaviour, however, have not yet been assessed. Another potential angle is recent work enhancing RL models with episodic memories; such models allow individual instances of past outcomes to greatly influence ongoing behaviour.
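
As a rough sketch of the distributional idea (assumed parameters, in the spirit of Dabney et al., 2020, rather than their exact model), a population of value "channels" that scale positive and negative prediction errors asymmetrically will settle at different points of the reward distribution instead of all converging on the mean:

    import numpy as np

    rng = np.random.default_rng(1)

    taus = np.linspace(0.1, 0.9, 9)   # per-channel asymmetry (illustrative)
    values = np.zeros_like(taus)      # one value estimate per channel
    alpha = 0.05

    for trial in range(5000):
        # bimodal reward source: a large reward on 20% of trials, a small one otherwise
        reward = 10.0 if rng.random() < 0.2 else 1.0
        errors = reward - values
        # positive errors are scaled by tau, negative errors by (1 - tau)
        learning_rates = np.where(errors > 0, taus, 1.0 - taus)
        values += alpha * learning_rates * errors

    print("Channel estimates spanning the distribution:", np.round(values, 2))
    print("Mean reward:", 0.2 * 10.0 + 0.8 * 1.0)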

On the behavioural side, most work examining human decision-making focuses on situations where people are explicitly told the possible odds, outcomes, and delays for the rewarding options. When people instead learn about the rewarding options from experience, as an RL model would, their behaviour often differs substantially from when they are told explicitly. For example, when making risky choices from experience, people are less sensitive to rare events but more sensitive to extreme outcomes (e.g., Madan et al., 2019; Wulff et al., 2018). This fundamental gap between choices based on described outcomes and choices based on personal experience has thus far eluded explanation with RL models.
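
A toy simulation (hypothetical numbers, not data from the studies cited above) illustrates one reason experience-based choice can underweight rare events: with only a handful of samples, many learners simply never encounter the rare outcome, so the value they experience understates the described expected value:

    import numpy as np

    rng = np.random.default_rng(2)

    p_rare, rare_outcome, common_outcome = 0.05, 32.0, 0.0
    described_value = p_rare * rare_outcome       # expected value if told the odds: 1.6
    n_samples, n_learners = 10, 10_000            # ten experienced draws per learner

    saw_rare = rng.random((n_learners, n_samples)) < p_rare
    experienced_value = np.where(saw_rare, rare_outcome, common_outcome).mean(axis=1)

    print("Described expected value:", described_value)
    print("Median experienced value:", np.median(experienced_value))
    print("Share of learners who never saw the rare outcome:",
          (saw_rare.sum(axis=1) == 0).mean().round(2))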

This project will entail computational modelling, including the simulation of existing RL models, fitting those models to behavioural and neural data (from human or other animals), creating and refining new RL models, and potentially creating and running behavioural experiments with human participants to test those models. The exact set of behaviours in question will be driven by student interest, but recent work in the lab has focused on modelling habits, curiosity, exploration, and memory-based choice.
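
To give a flavour of the modelling workflow (an illustrative sketch only, not the lab's actual analysis pipeline), the snippet below simulates choices from a two-parameter Q-learning model on a two-armed bandit and then recovers the learning rate and softmax temperature by maximum likelihood:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)

    def simulate(alpha, beta, p_reward=(0.3, 0.7), n_trials=300):
        """Generate choices and rewards from a softmax Q-learning agent."""
        q = np.zeros(2)
        choices, rewards = [], []
        for _ in range(n_trials):
            probs = np.exp(beta * q) / np.exp(beta * q).sum()
            c = rng.choice(2, p=probs)
            r = float(rng.random() < p_reward[c])
            q[c] += alpha * (r - q[c])
            choices.append(c)
            rewards.append(r)
        return np.array(choices), np.array(rewards)

    def negative_log_likelihood(params, choices, rewards):
        """Negative log-likelihood of the observed choices under the model."""
        alpha, beta = params
        q = np.zeros(2)
        nll = 0.0
        for c, r in zip(choices, rewards):
            logits = beta * q
            log_probs = logits - np.log(np.exp(logits).sum())
            nll -= log_probs[c]
            q[c] += alpha * (r - q[c])
        return nll

    choices, rewards = simulate(alpha=0.2, beta=4.0)
    fit = minimize(negative_log_likelihood, x0=[0.5, 1.0],
                   args=(choices, rewards), bounds=[(0.01, 1.0), (0.1, 20.0)])
    print("Recovered (learning rate, temperature):", np.round(fit.x, 2))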

References:

  1. Dabney, W., Kurth-Nelson, Z., Uchida, N., et al. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature, 577, 671–675.
  2. Madan, C. R., Ludvig, E. A., & Spetch, M. L. (2019). Comparative inspiration: From puzzles with pigeons to novel discoveries with humans in risky choice. Behavioural Processes, 160, 10–19.
  3. Miller, K., Shenhav, A., & Ludvig, E. A. (2019). Habits without values. Psychological Review, 126, 292–311.
  4. Wulff, D. U., Mergenthaler-Canseco, M., & Hertwig, R. (2018). A meta-analytic review of two modes of learning and the description-experience gap. Psychological Bulletin, 144(2), 140.

BBSRC Strategic Research Priority: Understanding the Rules of Life: Neuroscience and behaviour

Techniques that will be undertaken during the project:

  • Behavioural Testing with Humans
  • Computational Modelling
  • Developing and Coding Behavioural Experiments
  • Statistical analysis of behavioural data

Contact: Professor Elliot Ludvig, University of Warwick

