
Supervisor: Dr Yali Du
Applications accepted all year round
Funded PhD Project (UK Students Only)

About the Project

This project is part of a unique cohort-based, 4-year, fully-funded PhD programme at the UKRI Centre for Doctoral Training in Safe and Trusted AI.

The UKRI Centre for Doctoral Training in Safe and Trusted AI brings together world-leading experts from King’s College London and Imperial College London to train a new generation of researchers, and is focused on the use of symbolic artificial intelligence for ensuring the safety and trustworthiness of AI systems.

Project Description:

Reinforcement learning (RL) has become a new paradigm for solving complex decision-making problems. However, it raises numerous safety concerns in real-world decision making, such as unsafe exploration and unrealistic reward functions. Because RL agents are typically evaluated in terms of reward, it is easy to overlook that designing AI agents capable of pursuing arbitrary objectives can be problematic: such systems are intrinsically unpredictable and may lead to negative, irreversible outcomes for humans. Since humans understand these dangers, involving humans in the agent’s learning process is a promising way to boost AI safety and keep agents better aligned with human values [1].

Dr Du’s earlier research [2] shows that human preferences can serve as an effective replacement for reward signals. A recent attempt [1] also adopted human preferences in place of reward signals to guide the training of agents in safety-critical environments; however, agents there query humans only with a fixed probability, and how an agent should actively decide when to query humans and adapt its knowledge to the task and query is not considered.
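To make the underlying idea concrete, below is a minimal sketch of how pairwise human preferences over trajectory segments can be turned into a learned reward model via the standard Bradley-Terry formulation commonly used in preference-based RL. This is purely illustrative: it is not the method of [1] or [2], and the network sizes, data shapes, and hyperparameters are assumptions.

```python
# Minimal sketch of preference-based reward learning (Bradley-Terry model).
# Illustrative only: shapes and hyperparameters are assumptions, not taken
# from the referenced papers.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (state, action) pair to a scalar reward estimate."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def segment_return(model: RewardModel, seg_obs, seg_act) -> torch.Tensor:
    """Sum of predicted rewards over one trajectory segment of length T."""
    return model(seg_obs, seg_act).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, human_prefers_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: P(a preferred over b) = sigmoid(R(a) - R(b)),
    trained to match the human's binary preference label."""
    logits = segment_return(model, *seg_a) - segment_return(model, *seg_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, human_prefers_a)

# Toy usage with random tensors standing in for logged trajectory segments.
obs_dim, act_dim, T = 4, 2, 10
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seg_a = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
seg_b = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
label = torch.tensor(1.0)  # the human judged segment A to be safer/better

loss = preference_loss(model, seg_a, seg_b, label)
opt.zero_grad()
loss.backward()
opt.step()
```

The learned reward model can then be used in place of a hand-designed reward when training the policy, which is the sense in which human preferences "replace" reward signals.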

This project considers how to build safe RL agents that leverage human feedback, and aims to address two challenges: 1) how to enable agents to query humans actively and efficiently, thus minimising the burden placed on them; and 2) how to improve the robustness of algorithms in large state spaces and even on unseen tasks. The goal of this project is to realise human-value-aligned safe RL in a way that is scalable (in terms of task size) and efficient (in terms of human involvement).

To address these challenges, this research will leverage the principles of the abstract interpretation framework [3], a theory that dictates how to obtain sound, computable, and precise finite approximations of potentially infinite sets of behaviours. Based on abstractions of states, we aim to enable agents to build a knowledge base of (un)safe behaviours, and thus construct a scheme for deciding when to actively query humans. Because of the sequential nature of decision making, this project will also consider temporal abstractions of behaviours and feedback to improve consistency in safety control. Furthermore, through effective abstractions, we aim to make neural-network-based agents invariant to task-irrelevant details, and thus generalisable to new downstream tasks.
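As a purely illustrative sketch of the querying idea, an agent could map concrete states to abstract states, keep a small knowledge base of human safety labels per abstract state, and query the human only when the current abstract state is unfamiliar or its label is still ambiguous. The abstraction function and uncertainty rule below are assumptions for demonstration; in this project, abstractions would instead be derived via abstract interpretation [3] rather than simple rounding.

```python
# Illustrative sketch of abstraction-based active querying (assumed design,
# not the project's actual algorithm).
from collections import defaultdict
from typing import Callable, Hashable, Sequence

class SafetyKnowledgeBase:
    """Stores human safety feedback per abstract state and decides when to query."""

    def __init__(self, abstract: Callable[[Sequence[float]], Hashable],
                 min_labels: int = 3, uncertainty_band: tuple = (0.3, 0.7)):
        self.abstract = abstract          # maps a concrete state to an abstract state
        self.safe_counts = defaultdict(int)
        self.unsafe_counts = defaultdict(int)
        self.min_labels = min_labels
        self.uncertainty_band = uncertainty_band

    def should_query(self, state) -> bool:
        """Query the human if the abstract state is barely seen or its label is ambiguous."""
        a = self.abstract(state)
        n_safe, n_unsafe = self.safe_counts[a], self.unsafe_counts[a]
        total = n_safe + n_unsafe
        if total < self.min_labels:
            return True                   # not enough feedback for this abstract state yet
        p_safe = n_safe / total
        lo, hi = self.uncertainty_band
        return lo < p_safe < hi           # feedback so far is ambiguous

    def record_feedback(self, state, is_safe: bool) -> None:
        a = self.abstract(state)
        if is_safe:
            self.safe_counts[a] += 1
        else:
            self.unsafe_counts[a] += 1

# Toy abstraction: round each state dimension to one decimal place.
kb = SafetyKnowledgeBase(abstract=lambda s: tuple(round(x, 1) for x in s))

state = [0.12, -0.48]
if kb.should_query(state):
    # In a real loop, the label would come from a human interface, not a constant.
    kb.record_feedback(state, is_safe=True)
```

Because feedback is stored per abstract state rather than per concrete state, labels generalise to all states sharing an abstraction, which is what keeps the number of human queries low.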

How to Apply:

The deadline for Round A for entry in October 2023 is Monday 23rd November. See here for further round deadlines.

Committed to providing an inclusive environment in which diverse students can thrive, we particularly encourage applications from women, disabled and Black, Asian and Minority Ethnic (BAME) candidates, who are currently under-represented in the sector.

We encourage you to contact Dr Yali Du ([Email Address Removed]) to discuss your interest before you apply, quoting the project code: STAI-CDT-2023-KCL-4.

When you are ready to apply, please follow the application steps on our website here. Our 'How to Apply' page offers further guidance on the PhD application process.

Subject area: Computer Science

Funding Notes

For more information, see the Fees and Funding section here: https://safeandtrustedai.org/apply-now/

References

[1] Ilias Kazantzidis, Tim Norman, Yali Du, Christopher Freeman. How to train your agent: Active learning from human preferences and justifications in safety-critical environments. AAMAS 2022.
[2] Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang. Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning. NeurIPS 2022.
[3] Patrick Cousot, Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. POPL 1977.