
  Reinforcement Learning in the Brain and Behaviour


   Department of Psychology

Supervisors: Prof Elliot Ludvig, Dr Emmanouil Konstantinidis
Funding: Competition Funded PhD Project (Students Worldwide)
Status: No more applications being accepted

About the Project

This project is available through the MIBTP programme on a competition basis. The successful applicant will join the MIBTP cohort and will take part in all of the training offered by the programme.  Please visit the MIBTP website for further details and to submit an application.

Project Detail:

Humans and other animals are very efficient at learning from rewards in their environment and choosing accordingly. A popular approach for understanding how creatures make such reward-based decisions is Reinforcement Learning (RL), a formalism from computer science that has been used to create artificially intelligent agents that have proven remarkably successful at games, such as Chess and Go, and at real-world problems, like navigation and protein folding.

RL models learn through trial and error. In animals, this error term is thought to be encoded by dopamine neurons, which project widely through the brain. Through this error signal, animals can learn the values of different options and outcomes, which are then encoded in the striatum and several areas of the frontal cortex. These RL models are particularly effective at predicting reward-based behaviours, including the time course of associative learning, the transition from goal-directed to habitual behaviour, and exploration during probabilistic reward learning (e.g., Miller et al., 2019).
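
As an illustration, the core trial-and-error update can be written in a few lines. The Python sketch below (with illustrative parameters, not a model from this project) nudges a value estimate toward each observed reward, with the prediction error playing the role attributed to dopamine above:

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = 0.1      # learning rate (illustrative)
    value = 0.0      # learned value estimate for a single option

    for trial in range(200):
        reward = rng.binomial(1, 0.7)       # option pays off with probability 0.7
        prediction_error = reward - value   # the error signal dopamine is thought to encode
        value += alpha * prediction_error   # trial-and-error update toward the observed reward

    print(f"Learned value is approximately {value:.2f}; the true expected reward is 0.7")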

This project will apply some of the newest developments in RL to understand new aspects of human decision-making. For example, in distributional RL, animals learn about the full distribution of possible outcomes, rather than simply the average value. The dopamine system does indeed seem to encode a variety of prediction errors that span the expected distribution (Dabney et al., 2020). The full implications of such a distributional code for human behaviour, however, have not yet been assessed. Another potential angle is recent work enhancing RL models with episodic memories; such models allow individual instances of past outcomes to greatly influence ongoing behaviour.
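
As a rough sketch of the distributional idea (assumed parameters, in the spirit of Dabney et al., 2020, rather than their exact model), a population of value "channels" that scale positive and negative prediction errors asymmetrically will settle at different points of the reward distribution instead of all converging on the mean:

    import numpy as np

    rng = np.random.default_rng(1)

    taus = np.linspace(0.1, 0.9, 9)   # per-channel asymmetry (illustrative)
    values = np.zeros_like(taus)      # one value estimate per channel
    alpha = 0.05

    for trial in range(5000):
        # bimodal reward source: a large reward on 20% of trials, a small one otherwise
        reward = 10.0 if rng.random() < 0.2 else 1.0
        errors = reward - values
        # positive errors are scaled by tau, negative errors by (1 - tau)
        learning_rates = np.where(errors > 0, taus, 1.0 - taus)
        values += alpha * learning_rates * errors

    print("Channel estimates spanning the distribution:", np.round(values, 2))
    print("Mean reward:", 0.2 * 10.0 + 0.8 * 1.0)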

On the behavioural side, most work examining human decision-making focuses on situations where people are explicitly told the possible odds, outcomes, and delays for the rewarding options. When people instead learn about the rewarding options from experience, as an RL model would, their behaviour often differs substantially from when they are told explicitly. For example, when making risky choices from experience, people are less sensitive to rare events but more sensitive to extreme outcomes (e.g., Madan et al., 2019; Wulff et al., 2018). This fundamental gap between choices based on described outcomes and choices based on personal experience has thus far eluded explanation with RL models.
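
A toy simulation (hypothetical numbers, not data from the studies cited above) illustrates one reason experience-based choice can underweight rare events: with only a handful of samples, many learners simply never encounter the rare outcome, so the value they experience understates the described expected value:

    import numpy as np

    rng = np.random.default_rng(2)

    p_rare, rare_outcome, common_outcome = 0.05, 32.0, 0.0
    described_value = p_rare * rare_outcome       # expected value if told the odds: 1.6
    n_samples, n_learners = 10, 10_000            # ten experienced draws per learner

    saw_rare = rng.random((n_learners, n_samples)) < p_rare
    experienced_value = np.where(saw_rare, rare_outcome, common_outcome).mean(axis=1)

    print("Described expected value:", described_value)
    print("Median experienced value:", np.median(experienced_value))
    print("Share of learners who never saw the rare outcome:",
          (saw_rare.sum(axis=1) == 0).mean().round(2))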

This project will entail computational modelling, including the simulation of existing RL models, fitting those models to behavioural and neural data (from human or other animals), creating and refining new RL models, and potentially creating and running behavioural experiments with human participants to test those models. The exact set of behaviours in question will be driven by student interest, but recent work in the lab has focused on modelling habits, curiosity, exploration, and memory-based choice.
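
To give a flavour of the modelling workflow (an illustrative sketch only, not the lab's actual analysis pipeline), the snippet below simulates choices from a two-parameter Q-learning model on a two-armed bandit and then recovers the learning rate and softmax temperature by maximum likelihood:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)

    def simulate(alpha, beta, p_reward=(0.3, 0.7), n_trials=300):
        """Generate choices and rewards from a softmax Q-learning agent."""
        q = np.zeros(2)
        choices, rewards = [], []
        for _ in range(n_trials):
            probs = np.exp(beta * q) / np.exp(beta * q).sum()
            c = rng.choice(2, p=probs)
            r = float(rng.random() < p_reward[c])
            q[c] += alpha * (r - q[c])
            choices.append(c)
            rewards.append(r)
        return np.array(choices), np.array(rewards)

    def negative_log_likelihood(params, choices, rewards):
        """Negative log-likelihood of the observed choices under the model."""
        alpha, beta = params
        q = np.zeros(2)
        nll = 0.0
        for c, r in zip(choices, rewards):
            logits = beta * q
            log_probs = logits - np.log(np.exp(logits).sum())
            nll -= log_probs[c]
            q[c] += alpha * (r - q[c])
        return nll

    choices, rewards = simulate(alpha=0.2, beta=4.0)
    fit = minimize(negative_log_likelihood, x0=[0.5, 1.0],
                   args=(choices, rewards), bounds=[(0.01, 1.0), (0.1, 20.0)])
    print("Recovered (learning rate, temperature):", np.round(fit.x, 2))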

References:

  1. Dabney, W., Kurth-Nelson, Z., Uchida, N., et al. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature, 577, 671–675.
  2. Madan, C. R., Ludvig, E. A., & Spetch, M. L. (2019). Comparative inspiration: From puzzles with pigeons to novel discoveries with humans in risky choice. Behavioural Processes, 160, 10–19.
  3. Miller, K., Shenhav, A., & Ludvig, E. A. (2019). Habits without values. Psychological Review, 126, 292–311.
  4. Wulff, D. U., Mergenthaler-Canseco, M., & Hertwig, R. (2018). A meta-analytic review of two modes of learning and the description-experience gap. Psychological Bulletin, 144(2), 140.

BBSRC Strategic Research Priority: Understanding the Rules of Life: Neuroscience and behaviour

Techniques that will be undertaken during the project:

  • Behavioural Testing with Humans
  • Computational Modelling
  • Developing and Coding Behavioural Experiments
  • Statistical analysis of behavioural data

Contact: Professor Elliot Ludvig, University of Warwick

