
  Improvements of Monte Carlo Tree Search for Real Applications


   School of Electrical Engineering, Electronics and Computer Science

This project is no longer listed on FindAPhD.com and may not be available.

Dr F Oliehoek, Prof V-W Soo
No more applications being accepted
Funded PhD Project (Students Worldwide)

About the Project

In recent years, techniques for sequential decision making have taken flight, as exemplified by the AlphaGo program, which beat an 18-time world champion at the game of Go [1]. Many of the recent advances are based on simulation-based (also ‘sample-based’) planning and, in particular, on so-called Monte Carlo Tree Search (MCTS) methods [2,3].
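
In UCT, the MCTS variant of Kocsis and Szepesvári [2], choosing which action to try at a tree node is treated as a multi-armed bandit problem and resolved with the UCB1 rule: pick the action that maximizes its mean simulated return plus c·√(ln N / n), where N is the number of visits to the node and n is the number of times that action has been tried. A minimal Python sketch of this selection step follows. The function name `ucb1_select` and the default c = √2 (the constant from the UCB1 analysis for returns in [0, 1]) are illustrative assumptions, not part of this project description.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Return the index of the action with the highest UCB1 score.

    counts[i] is how often action i has been tried at this node and
    means[i] is its average simulated return. Untried actions are
    returned first, as if their score were infinite.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)  # number of visits to the node
    scores = [
        means[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `ucb1_select([100, 1], [0.6, 0.5])` returns 1: the rarely tried action wins on its exploration bonus despite its lower mean return.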

These methods are very flexible: as long as one can specify a sufficiently accurate (stochastic) simulator of the system to be controlled, they enable principled decision making. In the limit, they converge to the optimal policy, which accounts for execution uncertainty (e.g., the stochastic outcomes of actions). Despite this flexibility, however, some domain properties remain challenging for MCTS:

1. It is difficult to deal with large numbers of actions, such as those found in the control of multiagent teams, where the number of joint actions grows exponentially with the number of agents [4].
2. In practice, performance depends heavily on the quality of the rollout policy, so it is important that a ‘reasonable’ policy can be provided for this purpose.
3. Standard MCTS methods re-plan from scratch at every stage, which is a problem when the amount of computation available for each decision epoch is limited.
4. MCTS plans for the given simulator, so performance suffers if this simulator is incorrect (e.g., because sampling from the true transition probabilities may be inherently too complex) or if the environment changes over time.
5. MCTS plans at a single time resolution, but many real-world applications require different types of interdependent decisions to be made at different time scales.
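
To make these difficulties concrete, a compact UCT-style planner helps. The Python sketch below is a hypothetical illustration under stated assumptions, not the project's method. It assumes a generative simulator `simulate(state, action) -> (next_state, reward)` with hashable states, a finite action set per state, a fixed planning horizon, and a user-supplied `rollout_policy` (the component that difficulty 2 refers to). All names and default parameters (c = 1.4, γ = 0.95) are illustrative. Note that it builds a fresh tree on every call, which is exactly the re-planning behaviour described in difficulty 3.

```python
import math
import random


class Node:
    """Per-state search statistics (a hypothetical structure for this sketch)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.visits = 0
        self.counts = {a: 0 for a in self.actions}    # N(s, a)
        self.values = {a: 0.0 for a in self.actions}  # mean return Q(s, a)
        self.children = {}                            # (a, s') -> Node


def uct_search(root_state, simulate, actions, rollout_policy,
               horizon=10, iterations=1000, c=1.4, gamma=0.95):
    """Choose an action at root_state using UCT over a sample-based simulator.

    simulate(state, action) -> (next_state, reward) samples one transition;
    actions(state) lists the available actions; rollout_policy(state) picks
    an action once the search leaves the tree. States must be hashable.
    """
    root = Node(actions(root_state))

    def rollout(state, depth):
        # Estimate the value of a new leaf by following the rollout policy.
        total, discount = 0.0, 1.0
        for _ in range(depth):
            state, reward = simulate(state, rollout_policy(state))
            total += discount * reward
            discount *= gamma
        return total

    def search(node, state, depth):
        if depth == 0:
            return 0.0
        untried = [a for a in node.actions if node.counts[a] == 0]
        if untried:
            action = random.choice(untried)
        else:
            # UCB1: mean return plus a bonus for rarely tried actions.
            log_n = math.log(node.visits)
            action = max(node.actions, key=lambda a: node.values[a]
                         + c * math.sqrt(log_n / node.counts[a]))
        next_state, reward = simulate(state, action)
        key = (action, next_state)
        if key in node.children:
            future = search(node.children[key], next_state, depth - 1)
        else:
            # Expand at most one new node per iteration, then roll out from it.
            node.children[key] = Node(actions(next_state))
            future = rollout(next_state, depth - 1)
        ret = reward + gamma * future
        node.visits += 1
        node.counts[action] += 1
        node.values[action] += (ret - node.values[action]) / node.counts[action]
        return ret

    for _ in range(iterations):
        search(root, root_state, horizon)
    # Recommend the most-visited root action, a common robust choice.
    return max(root.actions, key=lambda a: root.counts[a])


if __name__ == "__main__":
    # Toy stochastic corridor (illustrative only): states 0..5, "right"
    # succeeds with probability 0.8; reward is the new position divided by 5.
    def simulate(s, a):
        if a == "right" and random.random() < 0.8:
            s = min(s + 1, 5)
        elif a == "left":
            s = max(s - 1, 0)
        return s, s / 5

    random.seed(0)
    best = uct_search(0, simulate, lambda s: ["left", "right"],
                      rollout_policy=lambda s: random.choice(["left", "right"]))
    print(best)
```

Because returns in this sketch are not normalised to [0, 1], c acts as a tuning constant rather than the theoretical UCB1 value. Replacing the uniform random rollout policy in the demo with a domain heuristic is the simplest way to act on difficulty 2, and the single `Node` per sampled next state shows why large action sets (difficulty 1) quickly dilute the visit counts.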

The goal of this project is to explore advances in MCTS that address these difficulties in the context of two application settings: nano-grids [5] and multi-robot teams [6,7].

For academic enquiries, please contact Dr Frans Oliehoek [Email Address Removed]

For enquiries about the application process, or to find out more about the Dual programme, please contact Miss Hannah Fosh [Email Address Removed]


Funding Notes

This project is a part of a 4-year dual PhD programme between National Tsing Hua University (NTHU), Taiwan and the University of Liverpool, England. It is planned that students will spend time studying in each institution.

Both the University of Liverpool and NTHU have agreed to waive the tuition fees for the duration of the project. Moreover, a stipend of TWD 10,000/month will be provided to cover part of the living costs.

When applying, please quote the supervisor and project title you wish to apply for, and write ‘NTHU-UoL Dual Scholarship’ when asked how you plan to finance your studies.

References

1. Silver, D., et al. “Mastering the game of Go with deep neural networks and tree search.” Nature, 529:484–489, 2016.
2. Kocsis, L., and Szepesvári, C. “Bandit based Monte-Carlo planning.” In Machine Learning: ECML 2006, volume 4212 of Lecture Notes in Computer Science, pages 282–293, 2006.
3. Silver, D., and Veness, J. “Monte-Carlo planning in large POMDPs.” In Advances in Neural Information Processing Systems 23, pages 2164–2172, 2010.
4. Amato, C., and Oliehoek, F. A. “Scalable planning and learning for multiagent POMDPs.” In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
5. Yu, W. Y., Soo, V. W., and Tsai, M. S. “Power distribution system service restoration based on a committee-based intelligent agent architecture.” Engineering Applications of Artificial Intelligence, 41:92–102, 2015.
6. Claes, D., et al. “Effective approximations for multi-robot coordination in spatially distributed tasks.” In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015.
7. Claes, D., et al. “Decentralised online planning for multi-robot warehouse commissioning.” In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (to appear).
