Supervisor: Prof Andrew Brown
Co-supervisors: Dr Mark Vousden, Dr Graeme Bragg
POETS (Partial Ordered Event Triggered Systems) technology is based on the idea of an extremely large number (millions) of small cores embedded in a fast, bespoke, hardware parallel communications infrastructure: the core mesh. Inter-core communication is asynchronous and is carried out by small, fixed-size hardware data packets (a few bytes each) called messages. For an important set of industrial problems, POETS architectures are capable of delivering orders-of-magnitude speed increases at significantly lower power levels. This project is about accelerating a simulation application using POETS.
In large computing systems, system management is a vital part of the overall platform. A system management suite must provide process monitoring, hardware diagnostics, low-level debug facilities, performance monitoring, and operator control. For conventional large-scale synchronous systems, the problem has been well studied and numerous tools exist. However, when the system is partially or completely asynchronous, a range of further considerations arise. For example, the concept of a global 'breakpoint' does not apply, because different parts of the system may independently be at different points in the computation. Control messages must account for the fact that the communication delay is significant and that the exact point at which control is injected is nondeterministic. Even a seemingly trivial problem such as halting an application can be complex when there may be no way of determining when a process can safely be stopped.
One possible solution is to distribute independent control processes throughout the system, each of which assumes local responsibility for a subset of the hardware and acts autonomously to provide a form of asynchronous hardware management. However, this consumes resources and additional communication bandwidth, so careful design will be required to ensure that system management does not interfere with the main computation. Another solution would be to make each parallel process responsible for its own management and to implement a messaging system that allows a central operator to query each process. In this scenario, however, the question of how to combine a series of local views into a global view of the system will require investigation.
In addition to being useful for asynchronous hardware, such tools may be useful for distributed systems such as sensor networks, IoT-like environments, and swarm robotics. Indeed, existing techniques from these domains might be adapted for use with asynchronous hardware.
This project will investigate suitable models for real-time management of an asynchronous parallel hardware platform. The project will work with a generic, massively parallel hardware substrate being developed at the University of Cambridge and with configuration tools already under development at Southampton. The goal will be to determine which methods are suitable for managing asynchronous systems in real time and to demonstrate effective techniques through the implementation of a management layer for the hardware.
This 3.5 year studentship covers UK tuition fees and provides an annual tax-free stipend at the standard EPSRC rate, which is £15,009 for 2019/20.
Applicants must be UK residents with no restrictions on how long they can stay in the UK, and must have lived in the UK for at least 3 years prior to the start of the studentship. This residence cannot have been mainly for the purpose of receiving full-time education.
For further guidance on funding, please contact [email protected]