Obtaining performance portability via Domain Specific Languages (DSLs) and MLIR


School of Informatics
Dr Tobias Grosser
Monday, October 31, 2022
Competition Funded PhD Project (European/UK Students Only)

About the Project

Writing efficient parallel code for current-generation supercomputers is difficult and remains the domain of relatively few experts. This situation is set to become even more challenging as heterogeneity (i.e. the use of accelerators) and scale both increase significantly with next-generation exascale machines. Put simply, the sequential languages that we have relied upon for so long do not provide the abstractions needed to write parallel codes. As a community we have worked around this by making it the programmer's job to determine all aspects of parallelism for their code and to provide their own parallel abstractions (e.g. by explicitly designing at the code level for geometric decomposition, divide-and-conquer, or pipeline parallelism). Determining this low-level, tricky detail is time consuming and requires significant expertise, and the approach will not scale to future supercomputers that are much larger and more complex.

There is, however, another way: the use of Domain Specific Languages (DSLs). These are languages which, out of the box, provide specific abstractions that the programmer can use as a basis for writing their code. The idea is that, by encouraging the programmer to work within the rules and restrictions of a specific domain, the compiler gains a significant amount of information from which it can determine details that the programmer traditionally had to specify manually. In fact, the word language is a bit of a misnomer here: the key is the abstractions, as many of these technologies are embedded within existing languages such as Python.
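As a rough illustration of this idea, the sketch below shows a toy DSL embedded in Python. Every name in it (Stencil, decompose, apply) is hypothetical and invented for this example; it is not part of any existing DSL. The programmer states only what is computed, and because the access pattern is declared explicitly, the tool can derive the halo width and split the domain into independent chunks itself.

```python
# Hypothetical mini-DSL: every name here is illustrative, not a real library.
from dataclasses import dataclass


@dataclass
class Stencil:
    """A 1D stencil described declaratively as {relative offset: weight}."""
    weights: dict

    def halo(self):
        # The access pattern is explicit, so the halo width can be derived
        # by the tool rather than specified by the programmer.
        return max(abs(offset) for offset in self.weights)


def decompose(n, parts):
    """Split indices 0..n-1 into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def apply(stencil, data, parts=1):
    """Apply the stencil to interior points, chunk by chunk.

    Each chunk reads only data[start - halo : stop + halo], so the chunks are
    independent and could be handed to separate workers; here they run
    sequentially to keep the sketch simple.
    """
    h = stencil.halo()
    out = list(data)  # boundary points are copied unchanged
    for start, stop in decompose(len(data), parts):
        for i in range(max(start, h), min(stop, len(data) - h)):
            out[i] = sum(w * data[i + off] for off, w in stencil.weights.items())
    return out


# The programmer states only *what* is computed: a three-point weighted average.
average = Stencil({-1: 0.25, 0: 0.5, 1: 0.25})
field = [0.0, 4.0, 8.0, 4.0, 0.0, 4.0]
result = apply(average, field, parts=3)
```

Changing parts alters how the work is divided but not the result, which is the kind of decision a DSL compiler could take on the programmer's behalf.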

DSLs have demonstrated their potential to play an important role in programming future exascale simulation codes, but there is a big problem. The issue lies in the underlying compilation stack: DSLs are often siloed and tend to share very little, if any, underlying infrastructure. This means that new DSLs can be costly to develop, the underlying technology stack can be brittle, and third-party tools such as debuggers and profilers can be lacking. There is also a potential solution: Multi-Level Intermediate Representation (MLIR), a framework for IRs that enables source code to be lowered effectively, through a series of pre-built abstractions, to the general representation required by the LLVM compiler. Many MLIR dialects already exist, and new ones can be written too, enabling many different languages, abstractions, and domains to integrate more readily with the existing, mature LLVM tooling without losing information in the translation process.
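The idea of lowering through several levels of abstraction can be sketched in a few lines of Python. This is a toy model only: the Op class, the dialect names, and the pass functions are assumptions made for illustration and are not the MLIR or xDSL API. Each pass rewrites an operation from one level to the next, and the domain-level information (the stencil offsets) is used to derive the loop bounds before it is lowered away.

```python
# Toy illustration only: this is NOT the MLIR or xDSL API.
from dataclasses import dataclass, field


@dataclass
class Op:
    dialect: str                               # abstraction level of the op
    name: str
    attrs: dict = field(default_factory=dict)
    body: list = field(default_factory=list)   # nested operations


def lower_stencil(op):
    """High-level 'stencil.apply' -> a 'loop.for' over the interior points."""
    if (op.dialect, op.name) != ("stencil", "apply"):
        return op
    # Domain knowledge (the offsets) determines the loop bounds.
    halo = max(abs(o) for o in op.attrs["offsets"])
    body = [Op("arith", "weighted_sum", {"offsets": op.attrs["offsets"]})]
    return Op("loop", "for", {"lo": halo, "hi": op.attrs["size"] - halo}, body)


def lower_loop(op):
    """Structured 'loop.for' -> flat, branch-based control flow."""
    if (op.dialect, op.name) != ("loop", "for"):
        return op
    return Op("cf", "branch_loop", dict(op.attrs), op.body)


def run_pipeline(op, passes):
    """Apply each lowering pass in order, recording the dialect at each level."""
    trace = [op.dialect]
    for lowering in passes:
        op = lowering(op)
        trace.append(op.dialect)
    return op, trace


source = Op("stencil", "apply", {"offsets": [-1, 0, 1], "size": 8})
lowered, trace = run_pipeline(source, [lower_stencil, lower_loop])
```

Here trace records the sequence of levels the operation passes through, and a pass that does not recognise an operation leaves it untouched, so passes can be composed in a pipeline.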

Overview of research area:

DSLs sit across numerous research communities, including programming language design, compilers, and HPC. We have just started a project called xDSL, a collaboration between Informatics and EPCC at Edinburgh, and Imperial College London. xDSL aims to develop a unified DSL ecosystem based upon MLIR, the idea being that DSL front-ends will be able to integrate readily with our ecosystem and the appropriate MLIR dialects. Having done so, a DSL will benefit from the mature and well-supported LLVM tooling whilst still being able to exploit the high-level domain-specific information provided by the programmer when making important decisions about how to map to the hardware (e.g. choices around parallelism and specific accelerators). Ultimately this will significantly reduce the effort required to develop DSLs and provide a rich, well-supported compilation stack with a large variety of third-party tooling.

Potential research question(s)

A key potential benefit of such a DSL ecosystem is performance portability, where a single source code can, to some extent at least, run across many different hardware platforms with minimal changes required on behalf of the programmer. Whilst this has been demonstrated to some degree across CPU families for MLIR, the objective is significantly more challenging when considering accelerators such as GPUs, the Cerebras CS-1, FPGAs, and AI accelerators, or even novel CPUs such as RISC-V. A key question is therefore whether one can, using MLIR and the rich domain-specific information encoded within a single source code, target many different types of accelerator with minimal changes to the code whilst obtaining good performance. Furthermore, which programmer-driven optimisations are still required in the code, and how can these best be expressed within a DSL to achieve this objective?

Due to the large scope here, there is flexibility for the student to work at different levels of the computing stack, and the project can be driven largely by their interests. For instance, this ranges from optimising the generation of binaries for specific hardware, to the compiler support required to enable this portability, to the design of language-level abstractions, and also support for third-party tooling. EPCC hosts a wide range of exciting next-generation hardware that the student will be given access to as part of this project.

Student Requirements:

Note that these are the minimum requirements to be considered for admission.

A UK 2:1 honours degree, or its international equivalent, in computer science/informatics, physics, mathematics, or engineering.

You must be a strong programmer with some experience of C or C++. You must be comfortable learning new languages and concepts, as this will form a significant part of the first year.

English language requirements as set by the University of Edinburgh.

Student Recommended/Desirable Skills and Experience

Experience in parallel programming. Experience with MLIR and/or LLVM.


Funding Notes

EPCC currently holds the following funding opportunities across its PhD projects, for which this project is one of many eligible (i.e. funding is competitive):
For entry during academic year 2022-23:
One EPSRC studentship with standard EPSRC eligibility.
We also welcome applications for these projects from students with their own source(s) of funding.

